DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/19/2022 has been entered.

Status of Claims
Claims 1-17 are presented for examination.
Claims 1-17 are rejected.

Response to Arguments
Applicant's arguments filed 09/08/2021 have been fully considered but they are not persuasive.
Applicant argued that the prior art of record does not explicitly teach “…and some of the training dataset is collected while the expert demonstrations are being conducted, and some of the training dataset is collected after the expert demonstrations have been conducted.” See Remarks/Arguments filed 09/08/2021, pages 1-3.
The Examiner respectfully notes that machine learning networks are designed and programmed to mimic the human brain. Deep neural networks (DNNs) are key in any autonomous vehicle because they collect the input data from sensors, process it, and then produce the correct behavior for the vehicle. The prior art of record does teach machine learning networks. Therefore, it is understood that the neural networks or machine learning networks are programmed to learn from the operational environment (i.e., they are trained to keep on learning even in an operational environment), e.g., from the behaviors of the drivers, the driving situations, accidents, etc. See, e.g., “…the behavior learning unit may include: a general purpose behavior learning unit configured to cause a neural network to learn, for each of a plurality of drivers, the relationship between the vehicle environment state detected by the detector and the behavior of the vehicle implemented after the vehicle environment state; and a dedicated behavior learning unit configured to build a dedicated neural network for a specific driver by transfer learning involving causing the neural network that learned to relearn by using the vehicle environment state detected by the detector for the specific driver and the behavior of the vehicle implemented after the vehicle environment state…”. Hence, the prior art of record clearly teaches the newly added limitations, e.g., “…and some of the training dataset is collected while the expert demonstrations are being conducted, and some of the training dataset is collected after the expert demonstrations have been conducted.”, as disclosed in MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0009]-¶ [0013], ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61.
Therefore, the rejection is maintained with some elucidations to clarify Examiner’s position.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7, 9, 12, 14-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over MOTOMURA in view of MOVERT et al. (US Pub. No.: 2019/0176818 A1: hereinafter “MOVERT”).


         Consider claim 1:
                    MOTOMURA teaches an autonomous driving vehicle (Fig. 1 element vehicle 1) comprising: a body (Fig. 1 element vehicle 1 with a body); a source of motive power operatively coupled to the body (Fig. 1 element vehicle 1 with an engine); and a controller configured to control the source of motive power (Fig. 1 element vehicle 1 equipped with engine being controlled by the controller), wherein the controller includes a storage module in which pre-collecting data is stored (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61); the controller is pre-trained with the pre-collected data, the pre-training being carried out using behavioral cloning before the autonomous driving vehicle has any interaction with an operational environment (it is understood that the neural networks or machine learning networks are pre-programmed to learn (i.e., they are pre-trained to learn before they are placed into an operational environment) from the operational environment, e.g., the behaviors of the drivers, the driving situations, etc. Hence, the prior art of record clearly teaches the newly added limitations), and before the autonomous driving vehicle enters the operational environment (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61).
                     MOTOMURA further teaches that the controller is further trained, after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected while the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver) are being conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected after the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x; therefore, it is understood that the neural networks are pre-programmed, and also programmed to learn as they record the various driving situations) have been conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61).
However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, by adding the above features “wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations”, as taught by MOVERT, so as to provide promising path prediction solutions for taking appropriate driving decisions. 

          Consider claim 2:
                    MOTOMURA teaches a method for learning and/or reinforcement (e.g., “An information processing system that appropriately estimates a driving conduct includes…a behavior learning unit configured to cause a neural network to learn a relationship between the vehicle environment state detected by the detector…estimate a behavior of the vehicle by inputting, into the neural network that learned, the vehicle environment state detected at a current point in time by the detector…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61), the method comprising the acts of: pre-collecting data (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61).
                    MOTOMURA further teaches pre-training an actor with the pre-collected data, the pre-training being carried out using behavioral cloning (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and before the actor has any interaction with an operational environment, and before the actor enters the operational environment (it is understood that the neural networks or machine learning networks are pre-programmed to learn (i.e., they are pre-trained to learn before they are placed into an operational environment) from the operational environment, e.g., the behaviors of the drivers, the driving situations, etc. Hence, the prior art of record clearly teaches the newly added limitations); and after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected while the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver) are being conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected after the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x; therefore, it is understood that the neural networks are pre-programmed, and also programmed to learn as they record the various driving situations) have been conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61).
However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, by adding the above features “wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations”, as taught by MOVERT, so as to provide promising path prediction solutions for taking appropriate driving decisions. 

          Consider claim 3:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the pre-training also includes pre-training a critic (e.g., the neural network) with the pre-collected data (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Fig 40-49, 60-61). 

          Consider claim 4:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the actor is a neural network (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Fig 40-49, 60-61).

          Consider claim 5:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 3. In addition, MOTOMURA teaches wherein the critic is a neural network (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Fig 40-49, 60-61).

          Consider claim 6:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 4. In addition, MOTOMURA teaches wherein the neural network is part of a vehicle (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Fig. 40-49, 60-61). 

          Consider claim 7:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 6. In addition, MOTOMURA teaches wherein the vehicle is an autonomous driving vehicle (Fig. 1 element vehicle 1). 

         Consider claim 9:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 3. In addition, MOTOMURA teaches wherein for the acts of pre-collecting data, the pre-training, and reinforcement learning after pre-training, an arbitrary range of variables are controlled, including a steering angle/torque, a throttle position, a brake, lane change decisions, and a minimum distance to maintain with regard to a preceding vehicle (See MOTOMURA, e.g., “…General purpose behavior estimation unit 412 inputs the input parameters to general purpose behavior estimation NN and outputs the output from the general purpose behavior estimation NN to histogram generator 413 as a tentative behavior estimation result…” of Abstract, ¶ [0269]-¶ [0273], Fig. 1 elements 1-91, Fig. 19).

          Consider claim 12:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the pre-collected data is data collected from a professional driver taken while real-world driving (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61).

          Consider claim 14:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 3. In addition, MOTOMURA teaches wherein the pre-training of the critic and the pre-training of the actor are carried out separately (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61).

          Consider claim 15:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 4. In addition, MOTOMURA teaches wherein the pre-training of the actor and the pre-training of the critic respectively yield an initial actor and an initial critic (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61), and the initial actor and the initial critic are both used by the actor-critic reinforcement learning algorithm to fine-tune and thereby improve a final agent (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61).

          Consider claim 17:
                    MOTOMURA teaches an autonomous driving vehicle (Fig. 1 element vehicle 1) comprising: a body (Fig. 1 element vehicle 1 with a body); a source of motive power operatively coupled to the body (Fig. 1 element vehicle 1 with an engine); and a controller configured to control the source of motive power (Fig. 1 element vehicle 1 equipped with engine being controlled by the controller), wherein the controller includes a storage module in which pre-collecting data is stored (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61); the controller is pre-trained with the pre-collected data (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), the pre-training being carried out using: i) behavioral cloning, and ii) offline TD learning, the pre-training being carried out before the autonomous driving vehicle has any interaction with an operational environment, and before the actor enters the operational environment (it is understood that the neural networks or machine learning networks are pre-programmed to learn (i.e., they are pre-trained to learn before they are placed into an operational environment) from the operational environment, e.g., the behaviors of the drivers, the driving situations, etc. Hence, the prior art of record clearly teaches the newly added limitations) (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61).
                     MOTOMURA further teaches that the controller is further trained, after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected while the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver) are being conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61), and some of the training dataset is collected after the expert demonstrations (e.g., Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x; therefore, it is understood that the neural networks are pre-programmed, and also programmed to learn as they record the various driving situations) have been conducted (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61).
However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, by adding the above features “wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations”,  as taught by MOVERT, so as to provide promising path prediction solutions for taking appropriate driving decisions. 

Claims 8, 10-11, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over MOTOMURA in view of MOVERT, and further in view of Petousis.

          Consider claim 8:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 3. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61. However, the combination of MOTOMURA, MOVERT does not explicitly teach wherein the pre-training of the critic is carried out using the offline TD learning. 
                    In an analogous field of endeavor, Petousis teaches wherein the pre-training of the critic is carried out using the offline TD learning (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, MOVERT by adding the above features, as taught by Petousis, so as to implement seamless, robust, and safe autonomous learning.

          Consider claim 10:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 2. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61.  However, the combination of MOTOMURA, MOVERT does not explicitly teach wherein the actor-critic reinforcement learning algorithm uses deterministic policy gradient algorithms.
                    In an analogous field of endeavor, Petousis teaches wherein the actor-critic reinforcement learning algorithm uses deterministic policy gradient algorithms (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, MOVERT by adding the above features, as taught by Petousis, so as to enhance the learning abilities of the autonomous vehicles.

          Consider claim 11:
                    The combination of MOTOMURA, MOVERT teaches everything claimed as implemented above in the rejection of claim 2. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61.  However, the combination of MOTOMURA, MOVERT does not explicitly teach wherein the actor-critic reinforcement learning algorithm uses stochastic policy gradients.
                    In an analogous field of endeavor, Petousis teaches wherein the actor-critic reinforcement learning algorithm uses stochastic policy gradients (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA, MOVERT by adding the above features, as taught by Petousis, so as to enhance the neural networks' learning capabilities for the autonomous vehicles.

         Consider claim 13:
                    The combination of MOTOMURA, MOVERT, and Petousis teaches everything claimed as implemented above in the rejection of claim 11. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig 16 elements 401-403, and Fig. 40-49, 60-61.  However, the combination of MOTOMURA, MOVERT does not explicitly teach wherein the stochastic policy gradients are selected from the groups consisting of at least A3C, A2C, and PPO.
                    In an analogous field of endeavor, Petousis teaches wherein the stochastic policy gradients are selected from the groups consisting of at least A3C, A2C, and PPO (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to enhance the autonomous vehicles’ functionalities.

         Consider claim 16:
                    The combination of MOTOMURA, MOVERT, and Petousis teaches everything claimed as implemented above in the rejection of claim 8. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49, 60-61.  However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein during the actor-critic reinforcement learning the critic is trained using online TD learning.
                    In an analogous field of endeavor, Petousis teaches wherein during the actor-critic reinforcement learning the critic is trained using online TD learning (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to implement seamless, robust, and safe autonomous learning functions.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

          Palanisamy et al. (US Pub. No.: 2020/0033868 A1) teaches “Systems and methods are provided autonomous driving policy generation. The system can include a set of autonomous driver agents, and a driving policy generation module that includes a set of driving policy learner modules for generating and improving policies based on the collective experiences collected by the driver agents. The driver agents can collect driving experiences to create a knowledge base. The driving policy learner modules can process the collective driving experiences to extract driving policies. The driver agents can be trained via the driving policy learner modules in a parallel and distributed manner to find novel and efficient driving policies and behaviors faster and more efficiently. Parallel and distributed learning can enable accelerated training of multiple autonomous intelligent driver agents.”

          Huber et al. (US Pub. No.: 2020/0050207 A1) teaches “Systems, Apparatuses and Methods for implementing a neural network system for controlling an autonomous vehicle (AV) are provided, which includes: a neural network having a plurality of nodes with context to vector (context2vec) contextual embeddings to enable operations of the of the AV; a plurality of encoded context2vec AV words in a sequence of timing to embed data of context and behavior; a set of inputs which comprise: at least one of a current, a prior, and a subsequent encoded context2vec AV word; a neural network solution applied by the at least one computer to determine a target context2vec AV word of each set of the inputs based on the current context2vec AV word; an output vector computed by the neural network that represents the embedded distributional one-hot scheme of the input encoded context2vec AV word; and a set of behavior control operations for controlling a behavior of the AV.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BABAR SARWAR whose telephone number is (571)270-5584.  The examiner can normally be reached on Mon-Fri 9:00 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faris S. Almatrahi, can be reached at (313)446-4821.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/BABAR SARWAR/Primary Examiner, Art Unit 3667