DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/10/21 has been entered.
 
Status of the Claims
Claims 1-20 of U.S. Application No. 15/943,223 filed on 4/2/18 were examined. Examiner filed a non-final rejection.
Applicant filed amendments on 1/31/2020. Claims 1, 4, 8, 11, 15, and 18 were amended. Claims 2-3, 9-10, and 16-17 were canceled. Claims 1, 4-8, 11-15, and 18-20 were examined. Examiner filed a final rejection.
Applicant filed an RCE on 7/6/20. Claims 1, 8 and 15 were amended and claim 21 was newly added. Claims 1, 4-8, 11-15, and 18-21 were examined. Examiner filed a non-final rejection.
Applicant filed amendments on 11/11/20. Claims 1, 8 and 15 were amended. Claim 21 was canceled. Claim 22 was newly added. Claims 1, 4-8, 11-15, 18-20 and 22 were examined. Examiner filed a final rejection.
Applicant filed an RCE on 3/10/21. Claims 1, 8 and 15 were amended. Claims 1, 4-8, 11-15, 18-20 and 22 are presently pending and presented for examination.

Response to Arguments
Regarding claim objections: claims 1, 8, and 15 were originally objected to for minor informalities. Applicant’s amendments have resolved these informalities, and the objections are therefore withdrawn.
Regarding 35 USC 103: Applicant's arguments filed 3/10/21 have been fully considered but they are not persuasive. For reference, the relevant references for this discussion are Lakshmanan (US 20190050729 A1) in view of Giering et al. (US 20180129974 A1) in further view of Jain et al. (US 9598076 B1) in further view of Izhikevich et al. (US 20140277718 A1), hereinafter referred to as Lakshmanan, Giering, Jain, and Izhikevich, respectively.
Regarding claims 1, 8, and 15, applicant argues that, “paragraph 33 of Giering does not mention optimizing a set of local policies for specific instances of a task” (See at least page 7 of applicant’s remarks). However, this argument is not persuasive since Giering does teach a method for deep reinforcement learning wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task (Giering teaches that a reinforcement learning system may utilize an iterative guided policy search approach to iteratively change which regions are prioritized by the reinforcement learning algorithm based on a training set [See at least Giering, 0033]). Anyone of ordinary skill in the art will appreciate that all types of machine learning, including but not limited to reinforcement learning as taught in [Giering, 0033] and claimed by applicant, utilize specific sets of data (“specific instances of a task”) to optimize an algorithm used to predict instances of a task. As the optimization taught in [Giering, 0033] utilizes a particular training set, it will therefore be appreciated by anyone of ordinary skill in the art that the policies of the reinforcement learning algorithm, which may be regarded as “local policies” since they are being optimized based on a specific set, are optimized for the training set, which may be regarded as applicant’s “specific instances of a task”.
Applicant further argues, with regard to the Giering reference, that, “there is still no teaching of the claimed limitation of using the local policies to train a general global policy usable across task instances” (See at least page 7 of applicant’s remarks). However, this argument is not persuasive since Giering does teach a method for deep reinforcement learning wherein the reinforcement learning controller uses the local policies to train a general global policy usable across task instances (Giering teaches, “Learning hierarchical structures in control and reinforcement learning can improve generalization” [See at least Giering, 0033]). Anyone of ordinary skill in the art would interpret [Giering, 0033] to therefore teach that the hierarchical structures of the reinforcement learning algorithm can be used to improve the generalizability of the algorithm to contexts other than the specific training set used to train the algorithm in [Giering, 0033]. Giering’s utilization of the structure in control and reinforcement 
If applicant wishes to overcome this interpretation and the Giering reference, applicant must amend the claims in light of the specification in such a way that this or any other broadest reasonable interpretation of Giering no longer reads on the claims. Until applicant makes such an amendment, the rejection of these limitations in view of Giering will be maintained.
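For illustration only, the claim language at issue (local policies optimized for specific instances of a task, then used to train a general global policy usable across task instances) can be pictured with the following minimal sketch. It is a toy example prepared for this discussion and is not taken from Giering or from applicant's specification. The one-dimensional task, the function names, the goal values, and the learning rate are all assumptions.

```python
from typing import List, Tuple


def optimize_local_policy(goal: float, steps: int = 40, lr: float = 0.25) -> float:
    """Iteratively optimize a local policy for ONE task instance.

    Assumption: the local policy is a single action, and the cost for this
    instance is (action - goal) ** 2. Plain gradient descent is used here.
    """
    action = 0.0
    for _ in range(steps):
        gradient = 2.0 * (action - goal)  # derivative of (action - goal)^2
        action -= lr * gradient
    return action


def fit_global_policy(goals: List[float], actions: List[float]) -> Tuple[float, float]:
    """Train a global linear policy by supervised least-squares regression
    on the actions produced by the local policies."""
    n = len(goals)
    mean_g = sum(goals) / n
    mean_a = sum(actions) / n
    var_g = sum((g - mean_g) ** 2 for g in goals)
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(goals, actions))
    weight = cov / var_g
    bias = mean_a - weight * mean_g
    return weight, bias


# Hypothetical task instances, each defined only by its goal value.
TRAIN_GOALS = [1.0, 2.0, 4.0]
local_actions = [optimize_local_policy(g) for g in TRAIN_GOALS]
weight, bias = fit_global_policy(TRAIN_GOALS, local_actions)


def global_policy(goal: float) -> float:
    """Global policy usable on task instances not seen during training."""
    return weight * goal + bias


unseen_action = global_policy(3.0)  # goal 3.0 was not in TRAIN_GOALS
```

In this sketch each local policy is tuned on a single instance, and the global policy is obtained by ordinary supervised regression over the local results, which is why it produces a sensible action for an instance that was never optimized directly.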
Applicant further argues, with regard to Izhikevich, that, “Pointing a laser at a potential target to limit exploration in Izhikevich is not the same as the presently claimed invention limitation [wherein limiting] the search space further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options” (See at least page 7 in applicant’s remarks). However, this argument is not persuasive since Izhikevich does teach a system for training a robot wherein a supervised learning procedure to limit a search space further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options (Izhikevich teaches that a training signal used by a supervised learning model for a robot may limit exploration by pointing to a region within the state space where the target solution may reside [See at least Izhikevich, 0293]). The fact that [Izhikevich, 0293] teaches that a laser pointer or other training signals may be used to limit a search space to a region where solutions are more likely to reside does not change the fact that the quoted portion of [Izhikevich, 0293] does read on the quoted portion of applicant’s claims: the limiting of the set of options, and thus the removal of less likely options which are not the desired solution, still occurs. If applicant wishes for the limiting of the set of options to occur by some means other than a laser pointer or other 

Regarding claim 15, applicant argues that, “the independent Claim 15 has been amended to include: wherein removing less likely options from the full set of options to generate the limited set of options which focuses on more likely options includes removing the negative options. Lakshmanan, Giering, Jain, Izhikevich and their combination do not teach this limitation.” However, this argument is not persuasive since Izhikevich does teach a supervised learning controller (note that Giering teaches that “A guided policy search approach transforms a policy search into a supervised learning problem” [See at least Giering, 0033]) wherein removing less likely options from the full set of options to generate the limited set of options which focuses on more likely options includes removing the negative options (Izhikevich teaches that a training signal used by a supervised learning model for a robot may limit exploration by pointing to a region within the state space where the target solution may reside [See at least Izhikevich, 0293]. It will be appreciated by anyone of ordinary skill in the art that the options that are not part of the indicated region of Izhikevich may be regarded as negative options, since they deviate significantly from the region containing positive option(s)).
Examiner’s note to potentially overcome the prior art: Based on examiner’s reading of applicant’s specification, examiner believes that the basis for the newly amended claim language in claim 15, “wherein removing less likely options from the full set of options to generate the limited set of options which focuses on more likely options includes removing the negative options”, is the passage in applicant’s specification disclosing, “instead of utilizing all options of what to do when approaching an object (including hitting the object), guidance is utilized such that options including speeding up and hitting the object are excluded, and more likely options such as braking, slowing down and avoiding the object are focused on” (emphasis added). For convenience, this passage can be found in paragraph [0014] of Chiang et al. (US 20190302785 A1), which is the publication of applicant’s application. Examiner would like to note that based on this passage, it appears that applicant’s limiting of the search space comprises completely eliminating the ability of the vehicle to speed up. This is not something that is taught in Izhikevich or any of the prior art of record. Therefore, amending the bolded portion of the quoted passage into the claims could potentially allow applicant to overcome the prior art of record. However, this is merely a suggestion from the examiner to help further prosecution if this limitation is of interest to the applicant.
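For illustration only, the distinction drawn in the note above (removing negative options outright, as opposed to merely pointing toward a likely region) can be pictured with the following minimal sketch. The option names echo the quoted passage of the specification, but the likelihood values, the threshold, and the function name are hypothetical and do not come from the record.

```python
from typing import Dict, List, Set

# Hypothetical full set of options with assumed likelihood scores.
FULL_OPTIONS: Dict[str, float] = {
    "brake": 0.45,
    "slow_down": 0.30,
    "avoid_object": 0.20,
    "speed_up": 0.04,
    "hit_object": 0.01,
}

# Options assumed to have been labeled negative by the guidance.
NEGATIVE_OPTIONS: Set[str] = {"speed_up", "hit_object"}


def limit_options(
    options: Dict[str, float],
    negative: Set[str],
    min_likelihood: float = 0.05,
) -> List[str]:
    """Return the limited set of options, most likely first.

    An option is removed if it is labeled negative or if its assumed
    likelihood falls below the (hypothetical) threshold.
    """
    kept = [
        name
        for name, likelihood in options.items()
        if name not in negative and likelihood >= min_likelihood
    ]
    return sorted(kept, key=lambda name: options[name], reverse=True)


limited = limit_options(FULL_OPTIONS, NEGATIVE_OPTIONS)
```

Under these assumptions the limited set contains only braking, slowing down and avoiding the object; speeding up is not merely deprioritized but is unavailable to the controller.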

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 7-8, 11-12, 14-15, 18, 20 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Lakshmanan (US 20190050729 A1) in view of Giering et al. (US 20180129974 A1) in further view of Jain et al. (US 9598076 B1) in further view of Izhikevich et al. (US 20140277718 A1), hereinafter referred to as Lakshmanan, Giering, Jain, and Izhikevich, respectively.
Regarding claim 1, Lakshmanan discloses A method comprising: 
training a reinforcement learning controller for autonomous driving (Lakshmanan discloses, “a deep network can be trained for a number of maneuvers…without any change in the software/logic architecture” [See at least Lakshmanan, 0018]. Also see Fig. 2 in Lakshmanan: Lakshmanan discloses that as part of a behavior planning logic 106, “an individual deep network may be trained with WGM [weather, geographic, and maneuver data] as well as camera, LIDAR, and/or radar input” [See at least Lakshmanan, 0032]) utilizing a vision model, wherein training the reinforcement learning controller for autonomous driving utilizes…a second aspect to learn how to react based on vision information from the vision model (See Fig. 3 in Lakshmanan: Lakshmanan discloses a more in-depth embodiment of the behavior planning logic 106 wherein “a reinforcement learning logic 302 may generate the M [maneuver] data based on camera, radar, and LIDAR sensor input” [See at least Lakshmanan, 0031]. It will be appreciated by anyone of ordinary skill in the art that the learning logic implements a model); and 
deploying the reinforcement learning controller for autonomous driving utilizing the vision model (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]), wherein the vision model includes images, videos, calculations, depth information (See at least Fig. 3 in Lakshmanan: Lakshmanan discloses that the vision model may utilize camera, radar, and LIDAR information as part of the computer vision system [See at least Lakshmanan, 0031]. Lakshmanan further discloses that a video camera in particular may be used [See at least Lakshmanan, 0034]), classification information (Lakshmanan discloses that the system may implement a deep neural network which performs mapping of features, a type of classification [See at least Lakshmanan, 0070]) and label information (Lakshmanan discloses that inputs and outputs may be labeled during training of the neural network [See at least Lakshmanan, 0060]).
However, Lakshmanan does not explicitly disclose the method wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task and uses the local policies to train a general global policy usable across task instances and limiting a search space,
and wherein the reinforcement learning controller learns based on the vision model, one or more options to take and one or more outcomes of the options to take, including negative options and positive options.
However, Giering does teach a method wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task (Giering teaches that a reinforcement learning system may utilize an iterative guided policy search approach to iteratively change which regions are prioritized by the reinforcement learning algorithm based on a training set [See at least Giering, 0033]) and uses the local policies to train a general global policy usable across task instances and limiting a search space (Giering teaches, “Learning hierarchical structures in control and reinforcement learning can improve generalization” [See at least Giering, 0033]),
and wherein the reinforcement learning controller learns based on the vision model (See at least Fig. 2 in Giering: Giering teaches that the sensor data used as inputs for the deep reinforcement learning system 100 of Fig. 1 may be obtained via sensors 205, particularly cameras 204 [See at least Giering, 0042]. Also see at least Fig. 3 in Giering: While Giering discloses that the deep reinforcement learning vision system is applied to observing and refining a coldspray application, it may also be broadly applied to other applications, such as robotics [See at least Giering, 0046]), one or more options to take and one or more outcomes of the options to take, including negative options and positive options (See at least Fig. 3 in Giering: Giering discloses that the system may use image data 320 and a reward token 308 to take action 304 to adjust actuators [See at least Giering, 0045]. Giering further discloses that, as a specific application of the general model of Fig. 1 and general vision system of Fig. 2, Fig. 3 discloses that the reward token 308 is an example of reward 110, the action 304 is an example of action 112 that may be controlled by actuators 207, and the surface 310 is a feature of interest 212 [See at least Giering, 0045]. It will be appreciated that the scenario in which the outcome is similar to the reward token may be regarded as a positive option and the scenario in which the outcome deviates significantly from the reward token may be regarded as a negative option). Both Giering and Lakshmanan teach methods for training vision-based reinforcement learning systems. However, only Giering teaches where a guided policy search may be used to guide which options are explored by the reinforcement learning algorithm.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the vision-based reinforcement learning system training method of Lakshmanan to also include a guided policy search component, as in Giering. Doing so reduces the risk of compounding errors (With regard to this reasoning, Giering teaches, “Training data from the policy's own state distribution helps to reduce the risk of compounding errors” [See at least Giering, 0033]).
However, Lakshmanan does not explicitly disclose the method wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras.
However, Jain does teach a vehicle computer vision method wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras (Jain teaches that the computer vision method may be trained using audio data which teaches it to correlate images and audio in order to detect surrounding hazards [See at least Jain, Col 6, line 46-Col 7, line 22]). Both Lakshmanan and Jain teach vehicle computer vision methods. However, only Jain explicitly teaches where the computer vision and object detection system of the vehicle may be trained using audio data.
It would have been obvious to anyone of ordinary skill prior to the effective filing date of the claimed invention to modify the computer vision system of Lakshmanan to also include audio sensors which are used to train the computer vision model. Doing so improves safety of the vehicular system by providing another means to detect hazardous situations (See at least [Jain, Col 6, line 46-Col 7, line 22]).
However, Lakshmanan in view of Giering does not explicitly teach wherein the guided policy search further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options.
However, Izhikevich does teach a system for training a robot wherein a supervised learning procedure further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options (Izhikevich teaches that a training signal used by a supervised learning model for a robot may limit exploration by pointing to a region within the state space where the target solution may reside [See at least Izhikevich, 0293]). Both Izhikevich and Lakshmanan in view of Giering teach methods for training robots using supervised learning (Anyone of ordinary skill in the art will appreciate that a guided policy search comprises supervised learning). However, only Izhikevich explicitly teaches where a state space may be limited so that the state space explored is where the solution is more likely to reside.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the learning procedure of Lakshmanan in view of Giering to also limit a state space so that the state space explored is where the solution is more likely to reside. Doing so improves the speed of the learning algorithm by making the learning algorithm converge on the target solution more quickly (See at least [Izhikevich, 0293]).
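For illustration only, the stated motivation (a limited state space lets the learner reach the target solution more quickly) can be pictured with the following minimal sketch. It is a toy exhaustive search and does not reproduce Izhikevich's system; the size of the state space, the target state, and the bounds of the indicated region are all assumed values.

```python
from typing import Iterable


def evaluations_to_find(target: int, region: Iterable[int]) -> int:
    """Scan candidate states in order and return how many states were
    evaluated up to and including the target."""
    for count, state in enumerate(region, start=1):
        if state == target:
            return count
    raise ValueError("target is not inside the searched region")


TARGET_STATE = 730                 # assumed location of the target solution
full_space = range(0, 1000)        # assumed full state space
limited_region = range(700, 760)   # assumed region indicated by a training signal

full_cost = evaluations_to_find(TARGET_STATE, full_space)
limited_cost = evaluations_to_find(TARGET_STATE, limited_region)
```

Under these assumptions the full scan evaluates 731 states before reaching the target, while the scan confined to the indicated region evaluates 31.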

Regarding claim 4, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The method of claim 1 wherein deploying the reinforcement learning controller for autonomous driving utilizes the second aspect to learn how to react based on the vision information from the vision model (Lakshmanan discloses, “an individual deep network may be trained with…camera…input” [See at least Lakshmanan, 0032]).

Regarding claim 5, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The method of claim 1 wherein the vision model is trained via images and/or videos acquired using one or more vehicle cameras (Lakshmanan discloses that “an individual deep network may be trained with…camera…input” [See at least Lakshmanan, 0032]).

	Regarding claim 7, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses The method of claim 1 further comprising autonomously driving a vehicle using the reinforcement learning controller by sending a signal to at least one of a driving mechanism, a braking mechanism and an acceleration mechanism (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]. It will be appreciated by anyone of ordinary skill in the art that this necessitates sending a signal to at least one driving mechanism of the vehicle).

	Regarding claim 8, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses A system comprising: 
a non-transitory memory for storing an application (Lakshmanan discloses that the logic of the embodiments may be stored in a System on Chip (SOC) device [See at least Lakshmanan, 0019]), the application for: 
training a reinforcement learning controller for autonomous driving (Lakshmanan discloses, “a deep network can be trained for a number of maneuvers…without any change in the software/logic architecture” [See at least Lakshmanan, 0018]. Also see Fig. 2 in Lakshmanan: Lakshmanan discloses that as part of a behavior planning logic 106, “an individual deep network may be trained with WGM [weather, geographic, and maneuver data] as well as camera, LIDAR, and/or radar input” [See at least Lakshmanan, 0032]) utilizing a vision model, wherein training the reinforcement learning controller for autonomous driving utilizes…a second aspect to learn how to react based on vision information from the vision model (See Fig. 3 in Lakshmanan: Lakshmanan discloses a more in-depth embodiment of the behavior planning logic 106 wherein “a reinforcement learning logic 302 may generate the M [maneuver] data based on camera, radar, and LIDAR sensor input” [See at least Lakshmanan, 0031]. It will be appreciated by anyone of ordinary skill in the art that the learning logic implements a model), wherein the vision model includes images, videos, calculations, depth information (See at least Fig. 3 in Lakshmanan: Lakshmanan discloses that the vision model may utilize camera, radar, and LIDAR information as part of the computer vision system [See at least Lakshmanan, 0031]. Lakshmanan further discloses that a video camera in particular may be used [See at least Lakshmanan, 0034]), classification information (Lakshmanan discloses that the system may implement a deep neural network which performs mapping of features, a type of classification [See at least Lakshmanan, 0070]) and label information (Lakshmanan discloses that inputs and outputs may be labeled during training of the neural network [See at least Lakshmanan, 0060]); and 
utilizing the reinforcement learning controller for autonomous driving utilizing the vision model (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]); and 
a processor coupled to the memory (See Fig. 5 in Lakshmanan: Lakshmanan discloses, “SOC package 502 is coupled to a memory 560 via the memory controller 542” [See at least Lakshmanan, 0035]), the processor configured for processing the application (Lakshmanan discloses, “embodiments may be applied in computing devices that include one or more processors” [See at least Lakshmanan, 0033]).
However, Lakshmanan does not explicitly disclose the system wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task and uses the local policies to train a general global policy usable across task instances and limiting a search space,
and wherein the reinforcement learning controller learns based on the vision model, one or more options to take and one or more outcomes of the options to take, including negative options and positive options.
However, Giering does teach a method wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task (Giering teaches that a reinforcement learning system may utilize an iterative guided policy search approach to iteratively change which regions are prioritized by the reinforcement learning algorithm based on a training set [See at least Giering, 0033]) and uses the local policies to train a general global policy usable across task instances and limiting a search space (Giering teaches, “Learning hierarchical structures in control and reinforcement learning can improve generalization” [See at least Giering, 0033]),
and wherein the reinforcement learning controller learns based on the vision model (See at least Fig. 2 in Giering: Giering teaches that the sensor data used as inputs for the deep reinforcement learning system 100 of Fig. 1 may be obtained via sensors 205, particularly cameras 204 [See at least Giering, 0042]. Also see at least Fig. 3 in Giering: While Giering discloses that the deep reinforcement learning vision system is applied to observing and refining a coldspray application, it may also be broadly applied to other applications, such as robotics [See at least Giering, 0046]), one or more options to take and one or more outcomes of the options to take, including negative options and positive options (See at least Fig. 3 in Giering: Giering discloses that the system may use image data 320 and a reward token 308 to take action 304 to adjust actuators [See at least Giering, 0045]. Giering further discloses that, as a specific application of the general model of Fig. 1 and general vision system of Fig. 2, Fig. 3 discloses that the reward token 308 is an example of reward 110, the action 304 is an example of action 112 that may be controlled by actuators 207, and the surface 310 is a feature of interest 212 [See at least Giering, 0045]. It will be appreciated that the scenario in which the outcome is similar to the reward token may be regarded as a positive option and the scenario in which the outcome deviates significantly from the reward token may be regarded as a negative option). Both Giering and Lakshmanan teach methods for training vision-based reinforcement learning systems. However, only Giering teaches where a guided policy search may be used to guide which options are explored by the reinforcement learning algorithm.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the vision-based reinforcement learning system training method of Lakshmanan to also include a guided policy search component, as in Giering. Doing so reduces the risk of compounding errors (With regard to this reasoning, Giering teaches, “Training data from the policy's own state distribution helps to reduce the risk of compounding errors” [See at least Giering, 0033]).
However, Lakshmanan does not explicitly disclose the method wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras.
However, Jain does teach a vehicle computer vision method wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras (Jain teaches that the computer vision method may be trained using audio data which teaches it to correlate images and audio in order to detect surrounding hazards [See at least Jain, Col 6, line 46-Col 7, line 22]). Both Lakshmanan and Jain teach vehicle computer vision methods. However, only Jain explicitly teaches where the computer vision and object detection system of the vehicle may be trained using audio data.
It would have been obvious to anyone of ordinary skill prior to the effective filing date of the claimed invention to modify the computer vision system of Lakshmanan to also include audio sensors which are used to train the computer vision model. Doing so improves safety of the vehicular system by providing another means to detect hazardous situations (See at least [Jain, Col 6, line 46-Col 7, line 22]).
However, Lakshmanan in view of Giering does not explicitly teach wherein the guided policy search further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options.
However, Izhikevich does teach a system for training a robot wherein a supervised learning procedure further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options (Izhikevich teaches that a training signal used by a supervised learning model for a robot may limit exploration by pointing to a region within the state space where the target solution may reside [See at least Izhikevich, 0293]). Both Izhikevich and Lakshmanan in view of Giering teach methods for training robots using supervised learning (Anyone of ordinary skill in the art will appreciate that a guided policy search comprises supervised learning). However, only Izhikevich explicitly teaches where a state space may be limited so that the state space explored is where the solution is more likely to reside.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the learning procedure of Lakshmanan in view of Giering to also limit a state space so that the state space explored is where the solution is more likely to reside. Doing so improves the speed of the learning algorithm by making the learning algorithm converge on the target solution more quickly (See at least [Izhikevich, 0293]).

	Regarding claim 11, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The system of claim 8 wherein utilizing the reinforcement learning controller for autonomous driving utilizes the second aspect to learn how to react based on the vision information from the vision model (Lakshmanan discloses, “an individual deep network may be trained with…camera…input” [See at least Lakshmanan, 0032]).
	
Regarding claim 12, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses The system of claim 8 wherein the vision model is trained via images and/or videos acquired using one or more vehicle cameras (Lakshmanan discloses that “an individual deep network may be trained with…camera…input” [See at least Lakshmanan, 0032]).

	Regarding claim 14, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses The system of claim 8 wherein the reinforcement learning controller is further configured for autonomously driving a vehicle by sending a signal to at least one of a driving mechanism, a braking mechanism and an acceleration mechanism (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]. It will be appreciated by anyone of ordinary skill in the art that this necessitates sending a signal to at least one driving mechanism of the vehicle).

	Regarding claim 15, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses A vehicle (Lakshmanan discloses that the System on Chip (SOC) is physically coupled to a vehicle) comprising: 
one or more cameras configured for acquiring vision information (See Fig. 2 in Lakshmanan: Lakshmanan discloses that a camera may be used to acquire vision information [See at least Lakshmanan, 0031-0032]. Lakshmanan further discloses that four cameras may provide a 360 degree view of the vehicle [See at least Lakshmanan, 0022-0023]. It will be appreciated by anyone of ordinary skill in the art that for cameras to provide such a view, they must be part of the vehicle); and 
one or more computing devices (Lakshmanan discloses that the logic of the embodiments may be stored in a System on Chip (SOC) device [See at least Lakshmanan, 0019]. Also see Fig. 5 in Lakshmanan: Lakshmanan further discloses, “SOC package 502 is coupled to a memory 560 via the memory controller 542” [See at least Lakshmanan, 0035]. The computing device may therefore comprise the SOC package 502 and the memory device 560) configured for: 
training a reinforcement learning controller for autonomous driving (Lakshmanan discloses, “a deep network can be trained for a number of maneuvers…without any change in the software/logic architecture” [See at least Lakshmanan, 0018]. Also see Fig. 2 in Lakshmanan: Lakshmanan discloses that as part of a behavior planning logic 106, “an individual deep network may be trained with WGM [weather, geographic, and maneuver data] as well as camera, LIDAR, and/or radar input” [See at least Lakshmanan, 0032]) utilizing a vision model including the vision information wherein training the reinforcement learning controller for autonomous driving utilizes…a second aspect to learn how to react based on vision information from the vision model (See Fig. 3 in Lakshmanan: Lakshmanan discloses a more in-depth embodiment of the behavior planning logic 106 wherein “a reinforcement learning logic 302 may generate the M [maneuver] data based on camera, radar, and LIDAR sensor input” [See at least Lakshmanan, 0031]. It will be appreciated by anyone of ordinary skill in the art that the learning logic implements a model); and 
utilizing the reinforcement learning controller for autonomous driving utilizing the vision model including the vision information (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]), wherein the vision model includes images, videos, calculations, depth information (See at least Fig. 3 in Lakshmanan: Lakshmanan discloses that the vision model may utilize camera, radar, and LIDAR information as part of the computer vision system [See at least Lakshmanan, 0031]. Lakshmanan further discloses that a video camera in particular may be used [See at least Lakshmanan, 0034]), classification information (Lakshmanan discloses that the system may implement a deep neural network which performs mapping of features, a type of classification [See at least Lakshmanan, 0070]) and label information (Lakshmanan discloses that inputs and outputs may be labeled during training of the neural network [See at least Lakshmanan, 0060]).
However, Lakshmanan does not explicitly disclose wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task and uses the local policies to train a general global policy usable across task instances and limiting a search space,
and wherein the reinforcement learning controller learns based on the vision model, one or more options to take and one or more outcomes of the options to take, including negative options and positive options.
However, Giering does teach a reinforcement learning system wherein training the reinforcement learning controller for autonomous driving utilizes a first aspect to provide guidance regarding options to explore when making a decision, wherein the first aspect implements a guided policy search which iteratively optimizes a set of local policies for specific instances of a task (Giering teaches that a reinforcement learning system may utilize an iterative guided policy search approach to iteratively change which regions are prioritized by the reinforcement learning algorithm based on a training set [See at least Giering, 0033]) and uses the local policies to train a general global policy usable across task instances and limiting a search space (Giering teaches, “Learning hierarchical structures in control and reinforcement learning can improve generalization” [See at least Giering, 0033]),
and wherein the reinforcement learning controller learns based on the vision model (See at least Fig. 2 in Giering: Giering teaches that the sensor data used as inputs for the deep reinforcement learning system 100 of Fig. 1 may be obtained via sensors 205, particularly cameras 204 [See at least Giering, 0042]. Also see at least Fig. 3 in Giering: While Giering discloses that the deep reinforcement learning vision system is applied to observing and refining a coldspray application, it may also be broadly applied to other applications, such as robotics [See at least Giering, 0046]), one or more options to take and one or more outcomes of the options to take, including negative options and positive options (See at least Fig. 3 in Giering: Giering discloses that the system may use image data 320 and a reward token 308 to take action 304 to adjust actuators [See at least Giering, 0045]. Giering further discloses that, as a specific application of the general model of Fig. 1 and general vision system of Fig. 2, Fig. 3 discloses that the reward token 308 is an example of reward 110, the action 304 is an example of action 112 that may be controlled by actuators 207, and the surface 310 is a feature of interest 212 [See at least Giering, 0045]. It will be appreciated that the scenario in which the outcome is similar to the reward token may be regarded as a positive option and the scenario in which the outcome deviates significantly from the reward token may be regarded as a negative option). Both Giering and Lakshmanan teach methods for training vision-based reinforcement learning systems. However, only Giering teaches where a guided policy search may be used to guide which options are explored by the reinforcement learning algorithm.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the vision-based reinforcement learning system training method of Lakshmanan to also include a guided policy search component, as in Giering. Doing so reduces the risk of compounding errors (With regard to this reasoning, Giering teaches, “Training data from the policy's own state distribution helps to reduce the risk of compounding errors” [See at least Giering, 0033]).
However, Lakshmanan does not explicitly disclose the vehicle wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras.
However, Jain does teach a vehicle computer vision method wherein the vision model includes audio and the vision model is trained via the audio acquired using one or more vehicle cameras (Jain teaches that the computer vision method may be trained using audio data which teaches it to correlate images and audio in order to detect surrounding hazards [See at least Jain, Col 6, line 46-Col 7, line 22]). Both Lakshmanan and Jain teach vehicle computer 
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the computer vision system of Lakshmanan to also include audio sensors which are used to train the computer vision model. Doing so improves safety of the vehicular system by providing another means to detect hazardous situations (See at least [Jain, Col 6, line 46-Col 7, line 22]).
However, Lakshmanan in view of Giering does not explicitly teach the vehicle wherein the guided policy search further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options.
However, Izhikevich does teach a system for training a robot wherein a supervised learning procedure further includes removing less likely options from a full set of options to generate a limited set of options which focuses on more likely options, wherein removing less likely options from the full set of options to generate the limited set of options which focuses on more likely options includes removing the negative options (Izhikevich teaches that a training signal used by a supervised learning model for a robot may limit exploration by pointing to a region within the state space where the target solution may reside [See at least Izhikevich, 0293]. It will be appreciated by anyone of ordinary skill in the art that the options that are not part of the indicated region of Izhikevich may be regarded as negative options, since they deviate significantly from the region containing positive option(s)). Both Izhikevich and Lakshmanan in view of Giering teach methods for training robots using supervised learning (Anyone of ordinary skill in the art will appreciate that a guided policy search comprises 
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the learning procedure of Lakshmanan in view of Giering to also limit a state space so that the state space explored is where the solution is more likely to reside, as in Izhikevich. Doing so improves the speed of the learning algorithm by making the learning algorithm converge on the target solution more quickly (See at least [Izhikevich, 0293]).

	Regarding claim 18, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The vehicle of claim 15 wherein utilizing the reinforcement learning controller for autonomous driving utilizes the second aspect to learn how to react based on the vision information from the vision model (Lakshmanan discloses, “an individual deep network may be trained with…camera…input” [See at least Lakshmanan, 0032]).

Regarding claim 20, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich discloses The vehicle of claim 15 wherein the reinforcement learning controller is further configured for autonomously driving the vehicle by sending a signal to at least one of a driving mechanism, a braking mechanism and an acceleration mechanism (Lakshmanan discloses that the behavior planning logic may be used to actuate or control the movement of the vehicle [See at least Lakshmanan, 0096]. It will be appreciated by anyone of ordinary skill in the art that this necessitates sending a signal to at least one driving mechanism of the vehicle).

Regarding claim 22, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The vehicle of claim 15 wherein the full set of options include driving options selected from the group consisting of braking, slowing down, avoiding an object, accelerating, and turning (Lakshmanan discloses that movement of the vehicle executed based on outputs of the deep learning network may include steering and braking [See at least Lakshmanan, 0026-0027]).

Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lakshmanan (US 20190050729 A1) in view of Giering (US 20180129974 A1) in further view of Jain et al. (US 9598076 B1) in further view of Izhikevich et al. (US 20140277718 A1) in further view of Jordan et al. (US 20170327138 A1), hereinafter referred to as Jordan.
Regarding claim 6, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The method of claim 1.
However, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich does not explicitly teach the method wherein training the reinforcement learning controller for autonomous driving utilizes labeled images which include fully or partially observed states.
However, Jordan does teach a system for training a reinforcement learning controller wherein training the reinforcement learning controller for a vehicle (See Fig. 1 in Jordan: Jordan teaches that the reinforcement learning system is applied to a vehicle [Jordan, 0020]) utilizes labeled images which include fully or partially observed states (Jordan teaches “various objects…may be identified with the use of reinforcement learning of asset camera images…Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data” [Jordan, 0019]). Both Jordan and Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teach reinforcement learning methods for training vehicle vision systems. However, only Jordan explicitly teaches where labeled images may be used for that training.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the method of Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich to explicitly utilize labeled images to train the reinforcement learning controller, as in Jordan. Doing so allows the vehicle to autonomously identify objects within view of the camera (With regard to this reasoning, Jordan teaches “Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data to allow remote and autonomous identification of objects within view of the camera” [Jordan, 0019]).

Regarding claim 13, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The system of claim 8.
However, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich does not explicitly teach the system wherein training the reinforcement learning controller for autonomous driving utilizes labeled images which include fully or partially observed states.
However, Jordan does teach a system for training a reinforcement learning controller wherein training the reinforcement learning controller for a vehicle (See Fig. 1 in Jordan: Jordan teaches that the reinforcement learning system is applied to a vehicle [Jordan, 0020]) utilizes labeled images which include fully or partially observed states (Jordan teaches “various objects…may be identified with the use of reinforcement learning of asset camera images…Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data” [Jordan, 0019]). Both Jordan and Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teach reinforcement learning methods for training vehicle vision systems. However, only Jordan explicitly teaches where labeled images may be used for that training.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the system of Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich to explicitly utilize labeled images to train the reinforcement learning controller, as in Jordan. Doing so allows the vehicle to autonomously identify objects within view of the camera (With regard to this reasoning, Jordan teaches “Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data to allow remote and autonomous identification of objects within view of the camera” [Jordan, 0019]).

	Regarding claim 19, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teaches The vehicle of claim 15.
However, Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich does not explicitly teach the vehicle wherein training the reinforcement learning controller for autonomous driving utilizes labeled images which include fully or partially observed states.
However, Jordan does teach a system for training a reinforcement learning controller wherein training the reinforcement learning controller for a vehicle (See Fig. 1 in Jordan: Jordan teaches that the reinforcement learning system is applied to a vehicle [Jordan, 0020]) utilizes labeled images which include fully or partially observed states (Jordan teaches “various objects…may be identified with the use of reinforcement learning of asset camera images…Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data” [Jordan, 0019]). Both Jordan and Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich teach reinforcement learning methods for training vehicle vision systems. However, only Jordan explicitly teaches where labeled images may be used for that training.
It would have been obvious to anyone of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the vehicle of Lakshmanan in view of Giering in further view of Jain in further view of Izhikevich to explicitly utilize labeled images to train the reinforcement learning controller, as in Jordan. Doing so allows the vehicle to autonomously identify objects within view of the camera (With regard to this reasoning, Jordan teaches “Reinforcement learning utilizes previously labeled data sets defined as ‘training’ data to allow remote and autonomous identification of objects within view of the camera” [Jordan, 0019]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAEEM T ALAM whose telephone number is (571)272-5901.  The examiner can normally be reached on M-F 9:00 am-5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, FADEY JABR can be reached on (571) 272-1516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-

/NAEEM TASLIM ALAM/
Examiner, Art Unit 3668

/YAZAN A SOOFI/
Primary Examiner, Art Unit 3668