DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-17 are presented for examination.
Claims 1-17 are rejected.

Response to Arguments
Applicant's arguments filed 09/08/2021 have been fully considered but they are not persuasive.
Applicant argues that the prior art of record does not explicitly teach "…before the autonomous driving vehicle has any interaction with an operational environment, and before the autonomous driving vehicle enters the operational environment" (see Remarks/Arguments filed 09/08/2021, pages 1-3).
The Examiner respectfully notes that machine learning networks are designed and programmed in advance to mimic the human brain. Deep Neural Networks (DNNs) are widely used in autonomous vehicles: they receive input data from the vehicle's sensors, process that data, and produce the appropriate behavior for the vehicle. The prior art of record teaches machine learning networks. Therefore, it is understood that such neural networks or machine learning networks are pre-programmed to learn from the operational environment (i.e., they are pre-trained before they are placed into an operational environment); see Abstract and ¶ [0009]-¶ [0010] of MOTOMURA, and Abstract, ¶ [0038], and Figs. 5a-b elements 700-710 of MOVERT. This is also taught in PETTINGER et al. (US 20210080967 A1), as disclosed in ¶ [0031], ¶ [0412], and Fig. 4 steps 410-440, and in Yao et al. (US 10935982 B2), as disclosed in Claims 1, 10-, and 16-18 and Fig. 3 elements 302-330 (e.g., "…obtain, from an action neural network defining a pre-trained action model, a plurality of predicted subsequent states (s_a′) of the vehicle in the environment that are predicted by the pre-trained action model using the current state of the vehicle in the environment and a plurality of actions...").
Therefore, the rejection is maintained, with additional explanation provided to clarify the Examiner's position.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

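The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.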

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation is the term “module” (e.g., “storage module” in claims 1 and 17), which has been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because it uses the generic placeholder “module” coupled with functional language without reciting sufficient structure to perform the recited function.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The original specification, filed on 01/25/2019, recites at ¶ [0013]: “… need not be limited to such configuration, given that in other embodiments the storage module may be any device that is capable of storing information, as may occur to those having ordinary skill in the art…”.
Therefore, the nonce term “module” is a substitute for “means,” and the specification describes corresponding structure for it (i.e., any device capable of storing information).
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7, 9, 12, 14-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over MOTOMURA in view of MOVERT et al. (US Pub. No.: 2019/0176818 A1; hereinafter “MOVERT”).


         Consider claim 1:
                    MOTOMURA teaches an autonomous driving vehicle (Fig. 1 element vehicle 1) comprising: a body (Fig. 1 element vehicle 1 with a body); a source of motive power operatively coupled to the body (Fig. 1 element vehicle 1 with an engine); and a controller configured to control the source of motive power (Fig. 1 element vehicle 1 equipped with engine being controlled by the controller), wherein the controller includes a storage module in which pre-collected data is stored (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61); the controller is pre-trained with the pre-collected data, the pre-training being carried out using behavioral cloning before the autonomous driving vehicle has any interaction with an operational environment (it is understood that neural networks or machine learning networks are pre-programmed to learn from the operational environment, e.g., from driver behaviors and driving situations; i.e., they are pre-trained before they are placed into an operational environment. Hence, the prior art of record teaches the newly added limitations), and before the autonomous driving vehicle enters the operational environment (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).
                     MOTOMURA further teaches that the controller is further trained, after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA by adding the above features, as taught by MOVERT, so as to provide reliable path prediction for making appropriate driving decisions.

Consider claim 2:
                    MOTOMURA teaches a method for learning and/or reinforcement (e.g., “An information processing system that appropriately estimates a driving conduct includes…a behavior learning unit configured to cause a neural network to learn a relationship between the vehicle environment state detected by the detector…estimate a behavior of the vehicle by inputting, into the neural network that learned, the vehicle environment state detected at a current point in time by the detector…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61), the method comprising the acts of: pre-collecting data (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).
                    MOTOMURA further teaches pre-training an actor with the pre-collected data, the pre-training being carried out using behavioral cloning (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61), before the actor has any interaction with an operational environment, and before the actor enters the operational environment (it is understood that neural networks or machine learning networks are pre-programmed to learn from the operational environment, e.g., from driver behaviors and driving situations; i.e., they are pre-trained before they are placed into an operational environment. Hence, the prior art of record teaches the newly added limitations); and after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the method of MOTOMURA by adding the above features, as taught by MOVERT, so as to provide reliable path prediction for making appropriate driving decisions.

          Consider claim 3:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the pre-training also includes pre-training a critic (e.g., the neural network) with the pre-collected data (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 4:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the actor is a neural network (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 5:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 3. In addition, MOTOMURA teaches wherein the critic is a neural network (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 6:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 4. In addition, MOTOMURA teaches wherein the neural network is part of a vehicle (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 7:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 6. In addition, MOTOMURA teaches wherein the vehicle is an autonomous driving vehicle (Fig. 1 element vehicle 1).

         Consider claim 9:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 3. In addition, MOTOMURA teaches wherein for the acts of pre-collecting data, the pre-training, and reinforcement learning after pre-training, an arbitrary range of variables are controlled, including a steering angle/torque, a throttle position, a brake, lane change decisions, and a minimum distance to maintain with regarding to a preceding vehicle (See MOTOMURA, e.g., “…General purpose behavior estimation unit 412 inputs the input parameters to general purpose behavior estimation NN and outputs the output from the general purpose behavior estimation NN to histogram generator 413 as a tentative behavior estimation result…” of Abstract, ¶ [0269]-¶ [0273], Fig. 1 elements 1-91, and Fig. 19).

          Consider claim 12:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 2. In addition, MOTOMURA teaches wherein the pre-collected data is data collected from a professional driver taken while real-world driving (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 14:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 3. In addition, MOTOMURA teaches wherein the pre-training of the critic and the pre-training of the actor are carried out separately (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 15:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 4. In addition, MOTOMURA teaches wherein the pre-training of the actor and the pre-training of the critic respectively yield an initial actor and an initial critic (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61), and the initial actor and the initial critic are both used by the actor-critic reinforcement learning (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).

          Consider claim 17:
                    MOTOMURA teaches an autonomous driving vehicle (Fig. 1 element vehicle 1) comprising: a body (Fig. 1 element vehicle 1 with a body); a source of motive power operatively coupled to the body (Fig. 1 element vehicle 1 with an engine); and a controller configured to control the source of motive power (Fig. 1 element vehicle 1 equipped with engine being controlled by the controller), wherein the controller includes a storage module in which pre-collected data is stored (Figs. 56, 57, and 60-61, elements “a driver model (situation database) based on the driving environment history illustrated in FIG. 60 is not limited to a clustering driver model or an individual adaptive driver model, and may be, for example, built so as to include the driving environment histories for all drivers” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61); the controller is pre-trained with the pre-collected data (See MOTOMURA, e.g., “…Vehicle controller 7 compares the environment parameters illustrated in (a) in FIG. 61 with the environment parameters in the driving environment histories for the driver models illustrated in (b) in FIG. 61, and the behavior associated with the most similar environment parameters is determined to be the primary behavior. Moreover, other behaviors associated with other similar environment parameters are determined to be secondary behaviors...” of Abstract, ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61), the pre-training being carried out using: i) behavioral cloning, and ii) offline TD learning, the pre-training being carried out before the autonomous driving vehicle has any interaction with an operational environment, and before the actor enters the operational environment (it is understood that neural networks or machine learning networks are pre-programmed to learn from the operational environment, e.g., from driver behaviors and driving situations; i.e., they are pre-trained before they are placed into an operational environment. Hence, the prior art of record teaches the newly added limitations) (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61).
                     MOTOMURA further teaches that the controller is further trained, after the pre-training, using an actor-critic reinforcement learning algorithm to effect final reinforcement (See MOTOMURA, e.g., “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, MOTOMURA does not explicitly teach wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations.
                     In an analogous field of endeavor, MOVERT teaches wherein the behavioral cloning is a supervised machine learning problem including a training dataset that consists of expert demonstrations (See MOVERT, e.g., “…The deep neural network may be trained by supervised learning based on target values and paths recorded from human drivers in traffic or from automated drivers in traffic…Supervised learning may be understood as an imitation learning or behavioral cloning where the target values are taken from demonstrations of preferred behavior (i.e. driving paths in driving situations) and the deep neural network learns to behave as the demonstrations. …” of Abstract, ¶ [0038], Figs. 5a-b elements 700-710).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA by adding the above features, as taught by MOVERT, so as to provide reliable path prediction for making appropriate driving decisions.

Claims 8, 10-11, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over MOTOMURA in view of MOVERT, and further in view of Petousis.

          Consider claim 8:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 3. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” (Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein the pre-training of the critic is carried out using the offline TD learning.
                    In an analogous field of endeavor, Petousis teaches wherein the pre-training of the critic is carried out using the offline TD learning (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to implement seamless, robust, and safe autonomous learning.

          Consider claim 10:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 2. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” (Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein the actor-critic reinforcement learning algorithm uses deterministic policy gradient algorithms.
                    In an analogous field of endeavor, Petousis teaches wherein the actor-critic reinforcement learning algorithm uses deterministic policy gradient algorithms (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to enhance the learning abilities of autonomous vehicles.

          Consider claim 11:
                    The combination of MOTOMURA and MOVERT teaches everything claimed as applied above in the rejection of claim 2. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” (Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein the actor-critic reinforcement learning algorithm uses stochastic policy gradients.
                    In an analogous field of endeavor, Petousis teaches wherein the actor-critic reinforcement learning algorithm uses stochastic policy gradients (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to enhance the learning capabilities of the neural networks used in autonomous vehicles.

         Consider claim 13:
                    The combination of MOTOMURA, MOVERT, and Petousis teaches everything claimed as applied above in the rejection of claim 11. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” (Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61). However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein the stochastic policy gradients are selected from the groups consisting of at least A3C, A2C, and PPO.
                    In an analogous field of endeavor, Petousis teaches wherein the stochastic policy gradients are selected from the groups consisting of at least A3C, A2C, and PPO (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning… gradient boosting machines, etc.)…” of Abstract, ¶ [0023], Figs. 3-4).
                     Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to enhance the functionality of autonomous vehicles.

         Consider claim 16:
                    The combination of MOTOMURA, MOVERT, and Petousis teaches everything claimed as applied above in the rejection of claim 8. In addition, MOTOMURA teaches “…Behavior learning unit 401 builds a neural network for a specific driver (for example, driver x) from a driving history for driver x. Behavior learning unit 401 then outputs the built neural network as a behavior estimation NN to behavior estimation unit 402…” of Abstract, ¶ [0255]-¶ [0263], ¶ [0547]-¶ [0554], Fig. 1 elements 1-91, Fig. 16 elements 401-403, and Figs. 40-49 and 60-61. However, the combination of MOTOMURA and MOVERT does not explicitly teach wherein during the actor-critic reinforcement learning the critic is trained using online TD learning. 
                    In an analogous field of endeavor, Petousis teaches wherein during the actor-critic reinforcement learning the critic is trained using online TD learning (See Petousis, e.g., “…The vehicle system may employ any suitable machine learning… temporal difference learning…” of Abstract, ¶ [0023], Figs. 3-4). 
                    Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of MOTOMURA and MOVERT by adding the above features, as taught by Petousis, so as to implement seamless, robust, and safe autonomous learning functions. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

          Palanisamy et al. (US Pub. No.: 2020/0033868 A1) teaches “Systems and methods are provided autonomous driving policy generation. The system can include a set of autonomous driver agents, and a driving policy generation module that includes a set of driving policy learner modules for generating and improving policies based on the collective experiences collected by the driver agents. The driver agents can collect driving experiences to create a knowledge base. The driving policy learner modules can process the collective driving experiences to extract driving policies. The driver agents can be trained via the driving policy learner modules in a parallel and distributed manner to find novel and efficient driving policies and behaviors faster and more efficiently.”

          Huber et al. (US Pub. No.: 2020/0050207 A1) teaches “Systems, Apparatuses and Methods for implementing a neural network system for controlling an autonomous vehicle (AV) are provided, which includes: a neural network having a plurality of nodes with context to vector (context2vec) contextual embeddings to enable operations of the of the AV; a plurality of encoded context2vec AV words in a sequence of timing to embed data of context and behavior; a set of inputs which comprise: at least one of a current, a prior, and a subsequent encoded context2vec AV word; a neural network solution applied by the at least one computer to determine a target context2vec AV word of each set of the inputs based on the current context2vec AV word; an output vector computed by the neural network that represents the embedded distributional one-hot scheme of the input encoded context2vec AV word; and a set of behavior control operations for controlling a behavior of the AV.”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
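A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.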

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BABAR SARWAR whose telephone number is (571)270-5584.  The examiner can normally be reached on Mon-Fri 9:00 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faris S. Almatrahi can be reached on (313)446-4821.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/BABAR SARWAR/Primary Examiner, Art Unit 3667