DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is made in response to applicant’s arguments filed on 07/26/2022 wherein: independent claims 1, 11, and 19 are being amended, dependent claims 2-8, 10, 12-18, and 20 are being amended, and no new matter is added by the amendments. Accordingly, claims 1-20 are now pending.
Response to Arguments
Applicant’s arguments, filed on 07/26/2022, with respect to the rejection(s) of the claims under 112(b) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of the amended limitations.
Applicant's arguments filed on 07/26/2022 with respect to the 102 and 103 rejection have been fully considered but they are not persuasive. 
The applicant argues that the cited references, either alone or in combination, fail to disclose, teach, or otherwise suggest: “wherein the simulated environment comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle; in response to determining the driving maneuver was beneficial, causing the critic neural network to reward the actor neural network during a training phase for the reinforcement learning framework; in response to determining the driving maneuver was not beneficial, causing the critic neural network to penalize the actor neural network during the training phase for the reinforcement learning framework”. While the examiner acknowledges that Fang does not explicitly state that the simulated environment comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle, the examiner respectfully disagrees with the assertion that Fang does not teach: in response to determining the driving maneuver was beneficial, causing the critic neural network to reward the actor neural network during a training phase for the reinforcement learning framework; in response to determining the driving maneuver was not beneficial, causing the critic neural network to penalize the actor neural network during the training phase for the reinforcement learning framework. In reference to Fang, Page 9, Lines 484-500, and Page 11, Lines 614-620, Fang explicitly recites that the vehicle can sense the parking environment and select the optimal decision action that can reach the target parking space according to the parking environment. Reaching (or failing to reach) the target parking space is interpreted to read on the driving maneuver being beneficial (or not), and the optimal decision action is accordingly taken based on reward and/or punishment.
For example, Fang explicitly recites that when the vehicle performs a decision action in the parking environment, the vehicle is rewarded or punished, and the reward or penalty is used to cause the vehicle to learn to perform the decision action to a new position state. For example, when the vehicle arrives at the target parking space, it is rewarded. However, when the vehicle collides with the obstacle, it is punished or negatively rewarded. Vehicles can self-learn based on rewards in order to get the most out of their subsequent decision-making actions. In addition, Fang recites that the side azimuth parking system includes two neural networks: an action network, which is interpreted to read on the actor neural network, and an evaluation network, which is interpreted to read on the critic neural network. Furthermore, upon further consideration of the amended set of claims, and as shown in the 103 rejection below, the Graepel reference (US 20180032863 A1), cited below, teaches that a simulated environment comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle (Claims 1, 3, and 16). Therefore, modifying the Fang reference in view of the Graepel reference leads to the performance of the reward/penalty learning framework in a simulated environment that comprises a plurality of maneuvering parameters, which may provide a better expectation of the actual environment that the autonomous vehicle may encounter while maneuvering.
The applicant also argues that Fang further fails to disclose the concepts of:
implementing an actor neural network to calculate a driving maneuver: the examiner respectfully refers the applicant to Page 9, Lines 490-492 of Fang, which explicitly recite that the action network generates a decision action u(t) according to the current state quantity X(t), and the decision action u(t) corresponds to a set of action control parameters including the velocity value of the throttle or the brake, the rotation angle of the steering wheel, etc.;
determine whether the driving maneuver was beneficial for accurately maneuvering the autonomous vehicle: the examiner respectfully refers the applicant to Page 9, Lines 508-511 of Fang, which explicitly recite that the enhanced signal may exist in a numerical manner, and different values are used to evaluate the “good” and “bad” of the decision action made by the action network, and the larger the value of the enhanced signal indicates the better the decision action made by the action network, the smaller the value of the enhanced signal, the worse the decision action made by the action network;
using a critic neural network to reward or penalize the actor neural network: the examiner respectfully refers the applicant to Page 9, Lines 497-501 of Fang, which specifically state that the parking environment can input the current state quantity X(t) into the evaluation network, and at the same time, the action network can send the decision action u(t) according to the current state quantity X(t) to the evaluation network, and evaluate the network according to the current state quantity X(t) and the decision action u(t).
The examiner also notes that the actor and critic networks are being interpreted according to their well-known definitions in the art: the actor decides which action should be taken, and the critic informs the actor how good the action was. Therefore, since the action network in Fang is used to output a decision action, and the evaluation network is used to evaluate the decision action, these networks are interpreted to perform the functions of the actor and critic neural networks claimed by the applicant, and are thus interpreted to read on them respectively.
According to the discussion above, the rejections of the claim limitations over Fang are maintained (see below).
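For illustration only, the well-known actor/critic division of labor referred to above can be sketched as follows. This is a minimal sketch and is not taken from Fang, Graepel, or the present application: it assumes small lookup tables in place of neural networks, and the one-dimensional track, the positions OBSTACLE, START, and TARGET, the reward values of +1 and -1, and the learning rates are all hypothetical choices made for the example.

```python
import math
import random

# Hypothetical 1-D track: position 0 is an obstacle, position 4 is the target space.
N_STATES, OBSTACLE, START, TARGET = 5, 0, 2, 4


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Decides which action to take (a table stands in for a network)."""

    def __init__(self, n_states, n_actions):
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, state, rng):
        probs = softmax(self.prefs[state])
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, feedback, lr=0.1):
        # Positive feedback (reward) makes the chosen action more likely;
        # negative feedback (penalty) makes it less likely.
        probs = softmax(self.prefs[state])
        for a in range(len(probs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.prefs[state][a] += lr * feedback * grad


class Critic:
    """Tells the actor how good the action was (a table of state values)."""

    def __init__(self, n_states):
        self.values = [0.0] * n_states

    def evaluate(self, state, reward, next_state, done, gamma=0.9, lr=0.1):
        target = reward + (0.0 if done else gamma * self.values[next_state])
        feedback = target - self.values[state]
        self.values[state] += lr * feedback
        return feedback


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    actor, critic = Actor(N_STATES, 2), Critic(N_STATES)
    for _ in range(episodes):
        state = START
        for _ in range(100):  # step cap per episode
            action = actor.act(state, rng)  # 0 = move left, 1 = move right
            next_state = state + (1 if action == 1 else -1)
            if next_state == TARGET:
                reward, done = 1.0, True    # reached the target: rewarded
            elif next_state == OBSTACLE:
                reward, done = -1.0, True   # collided: penalized
            else:
                reward, done = 0.0, False
            feedback = critic.evaluate(state, reward, next_state, done)
            actor.update(state, action, feedback)
            if done:
                break
            state = next_state
    return actor, critic
```

In this sketch the actor only selects actions and the critic only scores them; the actor's table is changed solely through the feedback value returned by the critic, which mirrors the reward/penalty relationship discussed above.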
Claim Objections
Claims 6 and 16 are objected to for reciting “error signal is calculating using” instead of “error signal is calculated using”.
Claims 18 and 20 are objected to for reciting: “instructions further comprises” instead of “instructions further comprise”.
Appropriate corrections are required.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 11, and 19 recite, “identifying an autonomous vehicle within a simulated environment”. On the other hand, the specification describes the invention with respect to the on-board vehicle sensors’ identification of the “state of the autonomous vehicle”, which is represented by, for example, its location, orientation, and steering angle ([0052]). However, nothing in the disclosure specifies that the “autonomous vehicle” itself has to be identified within the simulated environment before, or in order for, the reinforcement learning algorithm to perform the claimed functions. In other words, the specification describes the training process with the assumption that the autonomous vehicle is already within the simulated environment, and the rest of the training process depends on the identification of the state of the autonomous vehicle (location, orientation, steering angle). Determining the location, orientation, and/or steering angle of the autonomous vehicle within the simulated environment renders a different interpretation from identifying the autonomous vehicle within the simulated environment. Therefore, identifying an autonomous vehicle within a simulated environment is not well-supported in the specification, and the examiner interprets this limitation in view of the specification language (identification of the state of the autonomous vehicle (location, orientation, steering angle)). Claims 2-10, 12-18, and 20 depend from claims 1, 11, and 19, include all of their limitations, and do not cure their deficiencies, rendering them rejected under the same rationale.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 5-9, and 11-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 2 and 12 recite “the critical neural network”. There is insufficient antecedent basis for this term in the claims or in the claims from which they depend, rendering the metes and bounds of the claims indefinite.
Claims 5, 6, 15, and 16 recite “temporal difference error signal”. This term is considered indefinite because it is unclear what this temporal difference error represents. Although the claims recite that the signals are utilized to perform the rewarding or penalizing functions, the claims do not specify whether those signals comprise an evaluation or comparison of two elements, entities, or functions that results in a temporal difference error, which is then relayed as a signal.
Claims 7 and 17 recite “the corresponding state”. There is insufficient antecedent basis for this term in the claims or in the claims from which they depend, rendering the metes and bounds of the claims indefinite.
Claims 8-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being dependent on rejected claim 7.
Independent claim 11 recites “a sensor”, then claim 13 recites “at least one sensor”. Therefore, it is unclear whether the at least one sensor in claim 13 refers to the sensor in claim 11. Accordingly, the metes and bounds of the claims are ill-defined. Claims 12-18 depend from claim 11, include all of its limitations, and do not cure its deficiencies. Thus, they are rejected under the same rationale.
Claims 18 and 20 recite “a plurality of driving maneuvers”. The independent claims from which these claims depend also recite a plurality of maneuvers. Therefore, it is unclear whether the plurality of maneuvers recited in claims 18 and 20 are the same as the plurality of maneuvers recited in the independent claims.
Claim 19 recites “the autonomous vehicle”. There is insufficient antecedent basis for this term in the claim, rendering the metes and bounds of the claim indefinite. Claim 20 depends from claim 19, includes all of its limitations, and does not cure its deficiencies, rendering it rejected under the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-4, 11, 13-14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Fang (CN-105527963-A; Examiner relied on the English translation document provided by the examiner and attached herein) in view of Graepel et al. (US 20180032863 A1).
Regarding claims 1, 11, and 19, Fang discloses a method, comprising: 
identifying an autonomous vehicle within a reinforcement learning framework (P. 8, Lines 431-445; P. 11, Lines 614-615: “the vehicle can sense the parking environment and select the optimal decision action that can reach the target parking space according to the parking environment”; machine learning algorithm);
calculating, with an actor neural network, a driving maneuver for navigating the autonomous vehicle from an initial location to a target destination (P. 3, Lines 154-156: “the starting point of the target parking travel path is an initial position indicated by position information of the vehicle in the parking environment, The end point of the target parking travel route is a target location point indicated by the location information of the target parking space in the parking environment”; P. 9, Lines 490-493: “the action network generates a decision action u(t) according to the current state quantity X(t), and the decision action u(t) corresponds to a set of action control parameters including the velocity value of the throttle or the brake, and the rotation of the steering wheel. Angle, etc. The decision action u(t) changes the current position state of the smart car, causing the smart car to transition from the current position state to a new position state”; P. 11, Lines 614-616; NOTE: the action network is broadly interpreted to read on the actor neural network);
determining, with a critic neural network, whether the driving maneuver was beneficial for accurately maneuvering the autonomous vehicle to the target destination; in response to determining the driving maneuver was beneficial, causing the critic neural network to reward the actor neural network during a training phase for the reinforcement learning framework; and in response to determining the driving maneuver was not beneficial, causing the critic neural network to penalize the actor neural network during the training phase for the reinforcement learning framework (P. 11, Lines 614-626: “the vehicle can sense the parking environment and select the optimal decision action that can reach the target parking space according to the parking environment. When the vehicle performs a decision action in the parking environment, the vehicle is rewarded or punished, and the reward or penalty is used to cause the vehicle to learn to perform the decision action to a new position state”; P. 9, Lines 497-516; NOTE: the evaluation network is broadly interpreted to read on the critic network).
However, Fang does not explicitly state identifying an autonomous vehicle within a simulated environment wherein the simulated environment comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle.
On the other hand, Graepel teaches identifying an autonomous vehicle within a simulated environment wherein the simulated environment comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle (Claim 1: “initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data generated from interactions of the agent with a simulated version of the environment using reinforcement learning to determine trained values of the parameters of the reinforcement learning policy neural network from the initial values”; Claim 3: “the agent is a control system for an autonomous or semi-autonomous vehicle navigating through the real-world environment, wherein the actions in the set of actions are possible control inputs to control the autonomous or semi-autonomous vehicle”).
It would have been obvious for someone with ordinary skill in the art before the effective filing date of the current application to modify the teachings of the Fang reference and include features from the Graepel reference and involve a simulated environment that comprises a plurality of operating parameters for training a reinforcement learning framework to calculate driving maneuvers for the autonomous vehicle. Doing so would provide a better expectation of the actual/real environment that the autonomous vehicle may encounter while maneuvering.

Regarding claims 3 and 13, Fang discloses the autonomous vehicle comprises at least one sensor selected from the group consisting of a camera sensor, a lidar sensor, a radar sensor, a GPS sensor, and an ultrasound sensor (P. 10, Lines 541-545).
Regarding claims 4 and 14, Fang discloses further comprising determining a state of the autonomous vehicle within the simulated environment wherein the state comprises one or more of a location, or an orientation of the autonomous vehicle (P. 8, Lines 427-429).
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Fang and Graepel in further view of Zhang et al. (US-10782694-B2).
Regarding claims 2 and 12, Fang discloses that the sensors that acquire environment information are onboard (P. 8, Lines 484-486; P. 10, Lines 541-545: “a detecting device such as a radar or a camera may be installed on the vehicle, and the side azimuth parking system may detect the parking environment of the vehicle through the detecting device and acquire the parking environment”; P. 9, Lines 486-489: “The two neural networks are forward transport networks using a nonlinear multilayer perceptron structure, each of which contains a hidden layer. The specific learning process is: the smart car itself perceives the current state quantity X(t) of the smart car, X(t) includes the position of the smart car in the parking environment and the position of the target parking space in the parking environment, and the smart car perceives the current state quantity”). However, Fang does not explicitly state that each of the actor neural network and the critical neural network are installed onboard the autonomous vehicle during the training phase for the reinforcement learning framework.
On the other hand, Zhang teaches each of the first neural network and the second neural network is onboard the autonomous vehicle during the training phase (Figures 1 and 2; Lines 32-45).
It would have been obvious for someone with ordinary skill in the art to modify the teachings of the Fang reference and include features from the Zhang reference to have the neural networks onboard the vehicle during the training phase. Doing so would provide a more efficient training algorithm.
Claims 7, 8, 9, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Fang and Graepel in further view of Pietquin et al. (US2020/0151562A1).
Regarding claims 7 and 17, Fang discloses storing one or more of a state of the autonomous vehicle, an action taken at the corresponding state, or a reward and a penalty corresponding to the action (P. 14, Lines 833-838).
However, it does not explicitly state storing in a replay buffer one or more of a state of the autonomous vehicle, an action taken at the corresponding state, or a reward and a penalty corresponding to the action.
On the other hand, Pietquin teaches storing in a replay buffer one or more of a state of the autonomous vehicle, an action taken at the corresponding state, or a reward and a penalty corresponding to the action ([0016]: “The operation transition data is also stored in the replay buffer. The method samples from the replay buffer to train the actor-critic system”; [0017]).
It would have been obvious for someone with ordinary skill in the art to modify the teachings of the Fang reference and include features from the Pietquin reference to have a replay storage for storing data corresponding to the action. Doing so would provide a more efficient training algorithm.
Regarding claim 8, Fang does not explicitly state sampling the replay buffer to train the actor neural network.
On the other hand, Pietquin teaches sampling the replay buffer to train the actor neural network. ([0016]; [0017]).
It would have been obvious for someone with ordinary skill in the art to modify the teachings of the Fang reference and include features from the Pietquin reference to sample the replay storage to train the neural network. Doing so would provide a more efficient training algorithm.
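For illustration only, the store-then-sample arrangement discussed in the two preceding rejections can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Pietquin or of the present application: the names ReplayBuffer, Transition, store, and sample, as well as the fixed capacity and uniform random sampling, are hypothetical choices made for the example.

```python
import random
from collections import deque, namedtuple

# One stored experience: the state, the action taken at that state,
# the reward (positive) or penalty (negative) received, and the next state.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity store of past transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest entry once full.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Draw a uniform random batch without replacement for a training step.
        k = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), k)

    def __len__(self):
        return len(self.memory)
```

A training loop would call store after each decision action and periodically pass the result of sample to whatever update routine trains the actor and critic; that update routine is outside the scope of this sketch.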
Regarding claim 9, Fang discloses iteratively navigating the autonomous vehicle from the initial location to the target destination in accordance with the training (P. 9, Lines 490-493: “The decision action u(t) changes the current position state of the smart car, causing the smart car to transition from the current position state to a new position state”).
Claims 10, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fang and Graepel in further view of Colijn et al. (US-10545510-B2).
Regarding claims 10, 18, and 20, Fang discloses communicating information from a neural network corresponding to each autonomous vehicle to a central master actor (Fig. 2-2 of original FOR reference; P. 9, Lines 514-516: “the side azimuth parking system interacts with the parking environment at each moment, and adjusts the side azimuth parking strategy online by the “good” and “bad” of the enhanced signal feedback from the parking environment, so as to obtain the subsequent decision action”).
However, it does not explicitly state calculating a plurality of maneuvers for a plurality of autonomous vehicles from the initial location to the target location. 
On the other hand, Colijn teaches navigating a plurality of autonomous vehicles from the initial location to the target location (Abstract).
It would have been obvious for someone with ordinary skill in the art to modify the teachings of the Fang reference and include features from the Colijn reference and navigate a plurality of autonomous vehicles from an initial location to a target location. Doing so would provide a further advantage for the vehicle training process such that it could be applicable over a fleet of autonomous vehicles by assigning the vehicles to a plurality of parking locations. 
Allowable Subject Matter
Claims 5, 6, 15, and 16 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(a)/(b) or 35 U.S.C. 112 (pre-AIA ), 1st/2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHIRA BAAJOUR whose telephone number is (313)446-6602. The examiner can normally be reached 7:30 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Olszewski, can be reached on 571-272-2706. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.B./Examiner, Art Unit 3669
/RAMI KHATIB/Primary Examiner, Art Unit 3669