DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The amendment filed on 05/20/2022 has been entered and fully considered.
Claims 1, 4, 9, and 12 have been amended.
Claims 3, 5, 11, 13, and 17-20 have been canceled.
Claims 1-2, 4, 6-10, 12, and 14-16 are pending in the instant application.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1 and 9 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s amendments to claims 1 and 9 have overcome the 35 U.S.C. 101 rejection raised in the previous action; therefore, the 101 rejection is hereby withdrawn.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103(a) are summarized as follows:
1.	Determining the scope and contents of the prior art.
2.	Ascertaining the differences between the prior art and the claims at issue.
3.	Resolving the level of ordinary skill in the pertinent art.
4.	Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4, 6-10, 12, and 14-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over Shalev-Shwartz et al. (USPGPub 2018/0032082), in view of Naghshvar et al. (USPGPub 2020/0150672), and further in view of Angelov et al. (Simpl_eTS: A Simplified Method for Learning Evolving Takagi-Sugeno Fuzzy Models, 2005).
As per claim 1, Shalev-Shwartz discloses a method, comprising:
calculating, via a reinforcement learning agent (RLA) controller, a plurality of state-action values based on sensor data representing an observed state (see at least paragraph 0176; wherein driving policy module 803, which is discussed in more detail below and which may be implemented using processing unit 110, may implement a desired driving policy in order to decide on one or more navigational actions for the host vehicle to take in response to the sensed navigational state…The technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions), wherein the RLA controller utilizes a deep neural network (DNN) (see at least paragraph 0173; wherein any of the modules (e.g., modules 801, 803, and 805) disclosed herein may implement techniques associated with a trained system (such as a deep neural network));
mapping the plurality of state-action values to the sensor data (see at least paragraph 0176; wherein driving policy module 803, which is discussed in more detail below and which may be implemented using processing unit 110, may implement a desired driving policy in order to decide on one or more navigational actions for the host vehicle to take in response to the sensed navigational state);
actuating an agent based on at least one of the plurality of state-action values or the plurality of linear models (see at least paragraph 0176; wherein the technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions);
wherein the agent includes the vehicle and wherein actuating the agent further comprises adjusting a speed of the vehicle based on at least one of the plurality of state-action values or the plurality of linear models (see at least paragraph 0176; wherein the technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions).
Shalev-Shwartz does not explicitly mention wherein the RLA uses, to calculate the state-action values, (i) a first reward that is a function of a user-selected maximum velocity for a vehicle, (ii) a second reward that is a function of a headway parameter representing a distance between the vehicle and a second vehicle, and (iii) a third reward that is a function of a change in acceleration of the vehicle and a predetermined allowable acceleration for the vehicle; and generating, via a fuzzy controller, a plurality of linear models.
However, Naghshvar does disclose:
wherein the RLA uses, to calculate the state-action values, (i) a first reward that is a function of a user-selected maximum velocity for a vehicle, (ii) a second reward that is a function of a headway parameter representing a distance between the vehicle and a second vehicle, and (iii) a third reward that is a function of a change in acceleration of the vehicle and a predetermined allowable acceleration for the vehicle (see at least paragraph 0089; wherein rewards may be determined for actions. A positive reward may be provided for keeping a safe distance, approaching a maximum desired speed, and successfully executing high-level options, such as a lane change or merge. A negative reward may be provided for increasing risk (relative distance to lead vehicle <safe distance) or passenger discomfort (high frequency of lane changes, high jerk, high lateral acceleration, etc.). The negative reward may also be provided if a motion planner cannot execute high-level options).
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings as in Naghshvar with the teachings as in Shalev-Shwartz. The motivation for doing so would have been to improve reinforcement learning systems by considering uncertainty of state-action value functions before determining how to proceed with selecting an action corresponding to a state-action value function, see Naghshvar paragraph 0005.
Shalev-Shwartz and Naghshvar do not explicitly mention generating, via a fuzzy controller, a plurality of linear models.
However, Angelov does disclose:
generating, via a fuzzy controller, a plurality of linear models (see at least page 1068 paragraph 4; wherein the nonlinear system forms a collection of loosely (fuzzily) coupled (blended) multiple linear models).
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings as in Angelov with the teachings as in Shalev-Shwartz and Naghshvar. The motivation for doing so would have been to provide a simplified way to achieve fuzzy linear models, see Angelov Abstract.
As per claims 2 and 10, Shalev-Shwartz discloses wherein the plurality of state-action values correspond to an optimal policy generated during reinforcement learning training (see at least paragraph 0176; wherein driving policy module 803, which is discussed in more detail below and which may be implemented using processing unit 110, may implement a desired driving policy in order to decide on one or more navigational actions for the host vehicle to take in response to the sensed navigational state…The technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions).
As per claims 4 and 12, Shalev-Shwartz discloses wherein the vehicle is an autonomous vehicle (see at least abstract; wherein navigating an autonomous vehicle).
As per claims 6 and 14, Angelov discloses wherein the plurality of linear models comprise a set of IF-THEN rules mapping the plurality of state-action values to the sensor data (see at least page 1068 and equation 1; wherein IF-THEN fuzzy rules are present based on inputs provided).
As per claims 7 and 15, Angelov discloses wherein the fuzzy controller uses an Evolving Takagi-Sugeno (ETS) model to generate the plurality of linear models (see at least page 1068 paragraph 2; wherein the eTS model is a TS fuzzy model…paragraph 4; wherein at the heart of the TS method for fuzzy modeling is the segmentation of the data space into fuzzily defined regions. The fuzzy regions are parameterized and each region is associated with a linear sub-system. As a result, the nonlinear system forms a collection of loosely (fuzzily) coupled (blended) multiple linear models).
As per claims 8 and 16, Angelov discloses further comprising: determining, via the fuzzy controller, one or more data clusters corresponding to the sensor data, wherein each of the one or more data clusters comprises a focal point and a radius (see at least page 1070 paragraph 2; wherein the actual learning algorithm starts (stage 1) with the first standardized data sample that establishes the focal point of the first cluster (i=1). Its coordinates are used to form the antecedent part of the fuzzy rule (1) using for example the Cauchy membership functions (2). Its scatter, S is assumed equal to 0. Equation 14).
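For illustration, the blending of multiple linear models described in the cited passages of Angelov can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of the reference or of the claims: the names Rule, cauchy_membership, and ts_output are hypothetical, and the Cauchy-type membership form used here is a common textbook variant whose parameterization may differ from that of the reference.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rule:
    """One fuzzy IF-THEN rule (hypothetical structure for illustration)."""
    focal_point: Sequence[float]  # cluster center in the input space
    radius: float                 # spread of the cluster, assumed > 0
    coeffs: Sequence[float]       # [bias, a_1, ..., a_n] of the local linear model


def cauchy_membership(x: Sequence[float], rule: Rule) -> float:
    """Cauchy-type membership: 1 at the focal point, decaying with distance."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, rule.focal_point))
    return 1.0 / (1.0 + d2 / rule.radius ** 2)


def ts_output(x: Sequence[float], rules: Sequence[Rule]) -> float:
    """Blend the local linear models, weighted by normalized membership."""
    weights = [cauchy_membership(x, r) for r in rules]
    local = [
        r.coeffs[0] + sum(a * xi for a, xi in zip(r.coeffs[1:], x))
        for r in rules
    ]
    return sum(w * y for w, y in zip(weights, local)) / sum(weights)


# Two rules over a one-dimensional input (values chosen arbitrarily).
rules = [
    Rule(focal_point=[0.0], radius=1.0, coeffs=[0.0, 1.0]),   # y = x
    Rule(focal_point=[2.0], radius=1.0, coeffs=[4.0, -1.0]),  # y = 4 - x
]
blended = ts_output([1.0], rules)  # equidistant from both focal points
```

At an input equidistant from both focal points the two local models receive equal weight, so the output is the plain average of the two linear predictions.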
As per claim 9, Shalev-Shwartz discloses a system, comprising:
at least one processor (see at least Figure 1; item 110); and
at least one memory, wherein the at least one memory stores instructions executable by the at least one processor (see at least paragraph 0072; wherein each memory 140, 150 may include software instructions that when executed by a processor (e.g., applications processor 180 and/or image processor 190), may control operation of various aspects of system 100) such that the at least one processor is programmed to:
calculate, via a deep neural network (see at least paragraph 0173; wherein any of the modules (e.g., modules 801, 803, and 805) disclosed herein may implement techniques associated with a trained system (such as a deep neural network)), a plurality of state-action values based on sensor data representing an observed state (see at least paragraph 0176; wherein driving policy module 803, which is discussed in more detail below and which may be implemented using processing unit 110, may implement a desired driving policy in order to decide on one or more navigational actions for the host vehicle to take in response to the sensed navigational state…The technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions); and
mapping the plurality of state-action values to the sensor data (see at least paragraph 0176; wherein driving policy module 803, which is discussed in more detail below and which may be implemented using processing unit 110, may implement a desired driving policy in order to decide on one or more navigational actions for the host vehicle to take in response to the sensed navigational state);
actuating an agent based on at least one of the plurality of state-action values or the plurality of linear models (see at least paragraph 0176; wherein the technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions);
wherein the agent includes the vehicle and wherein actuating the agent further comprises adjusting a speed of the vehicle based on at least one of the plurality of state-action values or the plurality of linear models (see at least paragraph 0176; wherein the technology used to generate the output of driving policy module 803 may include reinforcement learning. The output of driving policy module 803 may include at least one navigational action for the host vehicle and may include a desired acceleration, a desired yaw rate for the host vehicle, a desired trajectory, among other potential desired navigational actions).
Shalev-Shwartz does not explicitly mention wherein the RLA uses, to calculate the state-action values, (i) a first reward that is a function of a user-selected maximum velocity for a vehicle, (ii) a second reward that is a function of a headway parameter representing a distance between the vehicle and a second vehicle, and (iii) a third reward that is a function of a change in acceleration of the vehicle and a predetermined allowable acceleration for the vehicle; and generate a plurality of linear models.
However, Naghshvar does disclose:
wherein the RLA uses, to calculate the state-action values, (i) a first reward that is a function of a user-selected maximum velocity for a vehicle, (ii) a second reward that is a function of a headway parameter representing a distance between the vehicle and a second vehicle, and (iii) a third reward that is a function of a change in acceleration of the vehicle and a predetermined allowable acceleration for the vehicle (see at least paragraph 0089; wherein rewards may be determined for actions. A positive reward may be provided for keeping a safe distance, approaching a maximum desired speed, and successfully executing high-level options, such as a lane change or merge. A negative reward may be provided for increasing risk (relative distance to lead vehicle <safe distance) or passenger discomfort (high frequency of lane changes, high jerk, high lateral acceleration, etc.). The negative reward may also be provided if a motion planner cannot execute high-level options).
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings as in Naghshvar with the teachings as in Shalev-Shwartz.
The motivation for doing so would have been to improve reinforcement learning systems by considering uncertainty of state-action value functions before determining how to proceed with selecting an action corresponding to a state-action value function, see Naghshvar paragraph 0005.
Shalev-Shwartz and Naghshvar do not explicitly mention generate a plurality of linear models.
However, Angelov does disclose:
generate a plurality of linear models (see at least page 1068 paragraph 4; wherein the nonlinear system forms a collection of loosely (fuzzily) coupled (blended) multiple linear models).
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings as in Angelov with the teachings as in Shalev-Shwartz and Naghshvar. The motivation for doing so would have been to provide a simplified way to achieve fuzzy linear models, see Angelov Abstract.
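For illustration, the three reward terms recited in claims 1 and 9 (maximum velocity, headway, and change in acceleration relative to an allowable acceleration) can be sketched as a single scalar reward. This is a minimal sketch under stated assumptions: the function name composite_reward, its parameters, and the functional form of each term are hypothetical and are taken neither from the claims nor from Naghshvar, which describes such rewards only qualitatively.

```python
def composite_reward(
    v: float,               # current vehicle speed
    v_max: float,           # user-selected maximum velocity, assumed > 0
    headway: float,         # distance to the lead vehicle
    headway_target: float,  # desired headway, assumed > 0
    accel_change: float,    # change in acceleration over one step
    accel_allow: float,     # predetermined allowable acceleration, assumed > 0
) -> float:
    """Sum of three assumed reward terms; every form here is illustrative."""
    # (i) Speed term: largest when the vehicle travels at v_max.
    r_speed = 1.0 - abs(v_max - v) / v_max
    # (ii) Headway term: grows with distance and saturates at the target.
    r_headway = min(headway / headway_target, 1.0)
    # (iii) Comfort term: penalizes only the excess over the allowable value.
    r_comfort = -max(abs(accel_change) - accel_allow, 0.0) / accel_allow
    return r_speed + r_headway + r_comfort


# At maximum speed, at the target headway, with no change in acceleration.
best_case = composite_reward(30.0, 30.0, 50.0, 50.0, 0.0, 2.0)
# At half speed, half the target headway, with twice the allowable change.
poor_case = composite_reward(15.0, 30.0, 25.0, 50.0, 4.0, 2.0)
```

A reinforcement learning agent would use a scalar of this kind as the per-step reward when estimating state-action values; the relative weighting of the three terms is a design choice left open here.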

Relevant Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
USPGPub 2020/0193271 – Provides a system and method for predicting a characteristic of an object in an artificial intelligence system. The method includes evaluating the object using a first model to produce a first prediction of a characteristic of the object. The object is evaluated using a second model to produce a second prediction of the characteristic of the object, the second model being dissimilar to the first model. A final prediction of the characteristic of the object is generated as a function of dynamic weightings of the first prediction and the second prediction.
USPGPub 2020/0065665 – Provides a computing system that can determine a vehicle action based on inputting vehicle sensor data to a first neural network including a first safety agent that can determine a probability of unsafe vehicle operation. The first neural network can be adapted, at a plurality of times, by a periodically retrained deep reinforcement learning agent that includes a second deep neural network including a second safety agent. A vehicle can be operated based on the vehicle action.
CN105189237A – Provides a control device, and more particularly a device for controlling an autonomous or partially autonomous land vehicle.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHMOUD S ISMAIL whose telephone number is (571)272-1326. The examiner can normally be reached M-F, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jelani Smith, can be reached at 571-270-3969. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MAHMOUD S ISMAIL/
Primary Examiner, Art Unit 3662