DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on June 27, 2018.
This office action is in response to Amendments and/or Remarks filed on July 28, 2022. In the current amendment, claims 1, 3-7, and 10-20 are amended. No claims are cancelled. Claims 1-20 are pending. 
Priority
Acknowledgement is made of applicant’s claim for priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed on June 27, 2018 and October 10, 2018. 

Drawings
The drawings filed on June 27, 2018 are accepted.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
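For illustration only, the three-prong test above may be sketched as a simple predicate. This is a minimal sketch and not part of the statutory or MPEP text; the class name `ClaimLimitation`, its fields, and the function `invokes_112f` are hypothetical names introduced here, and each field is assumed to hold a conclusion already reached for the corresponding prong.

```python
from dataclasses import dataclass


@dataclass
class ClaimLimitation:
    """Hypothetical summary of one claim limitation (illustrative fields only)."""
    uses_means_or_placeholder: bool        # prong (A): "means", "step", or a generic placeholder
    modified_by_functional_language: bool  # prong (B): e.g. "for", "configured to", "so that"
    recites_sufficient_structure: bool     # prong (C) is met only when this is False


def invokes_112f(limitation: ClaimLimitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    return (
        limitation.uses_means_or_placeholder
        and limitation.modified_by_functional_language
        and not limitation.recites_sufficient_structure
    )


# A generic placeholder coupled with function and no structure meets all three prongs.
placeholder_only = ClaimLimitation(True, True, False)
# The same limitation reciting sufficient structure fails prong (C).
placeholder_with_structure = ClaimLimitation(True, True, True)
```

Under these assumptions, `invokes_112f(placeholder_only)` evaluates to true and `invokes_112f(placeholder_with_structure)` evaluates to false, mirroring the requirement that all three prongs be satisfied together.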
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are the following (generic placeholder and linking phrase in bold):

Claim 19
	an acquiring unit configured to acquire control information…

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 16-18 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Martinez-Marin et al. (“Navigation of Autonomous Vehicles in Unknown Environments using Reinforcement Learning”).

Claim 16,
Martinez-Marin teaches: 
An information processing apparatus comprising: 
a receiving unit configured to receive an environmental parameter relating to an unlearned environment state; (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller (information processing apparatus); Page 874: “The new algorithm has been implemented as a model-based reinforcement learning method, where the back-ups are made by simulation following the shortest path search backward in time, starting from the goal state. In direct reinforcement learning, the back-ups are only made by experimentation, which is suitable when the back-up time for experimentation compared with simulation is not very high. In general, model-based reinforcement learning find better trajectories and manages changes in the environment (e. g. obstacles) and the goals more efficiently than direct reinforcement learning.” teaches using a model-based reinforcement learning method to find better trajectories for navigating the autonomous vehicle in the environment (environmental model); Table 1 and Page 873: “Although the state space of the system is three-dimensional, some vehicle behaviours, such as wall following, can be specified in a two-dimensional state space. In this case, the problem is simplified by fixing the value of vT . Then, the task is reduced to control the orientation of the vehicle (ς) in a two-dimensional state space (d, β), where d is the distance between the vehicle and the wall, and β is the relative orientation of the vehicle with respect to the wall.” teaches obtaining parameters (environmental parameters) for the reinforcement learning algorithm to navigate the vehicle in the environment; Page 872: “In this paper we propose a generic approach for navigation of nonholonomic vehicles in unknown environments. 
The vehicle model is also unknown, so the path planner uses reinforcement learning to acquire the optimal behavior together with the model, which is estimated by a reduced set of transitions” teaches that the environment is unknown)

and a generating unit configured to generate data relating to behavior of a first control target in a new environmental model generated based on the environmental parameter. (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller (information processing apparatus, generating unit is part of the microcontroller) to control (generate response information) an autonomous vehicle (control target); Page 874: “The new algorithm has been implemented as a model-based reinforcement learning method, where the back-ups are made by simulation following the shortest path search backward in time, starting from the goal state. In direct reinforcement learning, the back-ups are only made by experimentation, which is suitable when the back-up time for experimentation compared with simulation is not very high. In general, model-based reinforcement learning find better trajectories and manages changes in the environment (e. g. obstacles) and the goals more efficiently than direct reinforcement learning.” teaches using a model-based reinforcement learning method to find better trajectories for navigating the autonomous vehicle in the environment (environmental model); Table 1 and Page 873: “Although the state space of the system is three-dimensional, some vehicle behaviours, such as wall following, can be specified in a two-dimensional state space. In this case, the problem is simplified by fixing the value of vT . 
Then, the task is reduced to control the orientation of the vehicle (ς) in a two-dimensional state space (d, β), where d is the distance between the vehicle and the wall, and β is the relative orientation of the vehicle with respect to the wall.” teaches that the reinforcement learning model for navigating the vehicle in the environment is based on parameters (environmental parameters))

wherein the receiving unit and the generating unit are each implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer-based implementation with an MPC555 microcontroller)

Claim 17,
Martinez-Marin teaches: 
The information processing apparatus according to claim 16, 
Martinez-Marin further teaches:
wherein the receiving unit receives at least one of sensor information acquired from one or more sensors, a reward parameter relating to control learning of the first control target, or control information acquired from a second control target. (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle includes infrared sensors, a laser scanner, and a CMOS camera (one or more sensors); Table 1 and Page 874: “The RL algorithm deals with several data structures for organizing the available information and storing the partial results of the learning process. These structures are the following: Q(s, a): is the Q-value table where the accumulated reward for the (s, a)-pair is saved. From this table, the optimal policy is obtained according to 5” teaches that the reinforcement learning algorithm has a reward parameter (reward parameter relating to the machine learning); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses RL controllers (learning unit) for reinforcement learning that are processed by the microcontroller, therefore the reward parameter is sent to the RL controllers; Page 875: “The algorithm proposed has been tested in simulation using the vehicle model given in section II. The state space needed to learn the wall following behaviour is two-dimensional (d, β). 
The variables values and parameters are depicted in Table I.” teaches parameters used to control the autonomous vehicle (control target) in a simulation (second control target); Page 872: “ Implementational aspects of the new algorithm are addressed in section V, tested in section VI and VII through simulations and experimentation on a real vehicle, respectively.” teaches that implementation of the reinforcement learning algorithm to control the autonomous vehicle is done through simulation (second control target) and through a real vehicle)

Claim 18,
Martinez-Marin teaches: 
The information processing apparatus according to claim 17, 
Martinez-Marin further teaches:
wherein the second control target includes at least one of a vehicle which travels in a real world or a virtual vehicle in a game or a simulator. (Page 872: “Implementational aspects of the new algorithm are addressed in section V, tested in section VI and VII through simulations and experimentation on a real vehicle, respectively.” teaches that the autonomous vehicle can be implemented via simulations and through a real vehicle)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martinez-Marin et al. (“Navigation of Autonomous Vehicles in Unknown Environments using Reinforcement Learning”) in view of Luo et al. (“Simulation of Natural Environment Impacts on Intelligent Vehicle Based on a Virtual Reality Platform”).

Claim 1,
Martinez-Marin teaches: 
An information processing apparatus comprising: 
a generating unit configured to generate first response information relating to a control target in a first environmental model generated based on a first environmental parameter; (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller (information processing apparatus, generating unit is part of the microcontroller) to control (generate first response information) an autonomous vehicle (control target); Page 874: “The new algorithm has been implemented as a model-based reinforcement learning method, where the back-ups are made by simulation following the shortest path search backward in time, starting from the goal state. In direct reinforcement learning, the back-ups are only made by experimentation, which is suitable when the back-up time for experimentation compared with simulation is not very high. In general, model-based reinforcement learning find better trajectories and manages changes in the environment (e. g. obstacles) and the goals more efficiently than direct reinforcement learning.” teaches using a model-based reinforcement learning method to find better trajectories for navigating the autonomous vehicle in the environment (first environmental model); Table 1 and Page 873: “Although the state space of the system is three-dimensional, some vehicle behaviours, such as wall following, can be specified in a two-dimensional state space. In this case, the problem is simplified by fixing the value of vT . 
Then, the task is reduced to control the orientation of the vehicle (ς) in a two-dimensional state space (d, β), where d is the distance between the vehicle and the wall, and β is the relative orientation of the vehicle with respect to the wall.” teaches that the reinforcement learning model for navigating the vehicle in the environment is based on parameters (environmental parameters))



and a communication unit configured to transmit the first response information and the first environmental parameter to a learning unit which performs machine learning relating to control of the control target, and  (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses reinforcement learning controllers (learning unit that performs machine learning) and that the microcontroller processes data from all sensors and the RL controllers (response information from the microcontroller and environmental parameters is sent to the RL controller); Page 874: “The vehicle should move forward and backward to avoid possible obstacles in the way. For that reason, during the training phase two RL controller are built: one for forward motion and the other for backward motion. Thus, employing both controllers the vehicle is able to avoid obstacles and turn around (see experimental results in Fig. 4) in a natural manner.” teaches that the RL controllers are used to control navigation of the autonomous vehicle (control target))

	wherein the generating unit and the communication unit are each implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer-based implementation with an MPC555 microcontroller)

Martinez-Marin does not appear to explicitly teach: 
receive a second environmental parameter relating to a request of a second environmental model in accordance with progress of the machine learning,
wherein the generating unit is further configured to generate second response information in the second environmental model generated based on the second environmental parameter, and 

However, Luo teaches: 
receive a second environmental parameter relating to a request of a second environmental model in accordance with progress of the machine learning, (Page 117-118, Section 2.2: “The related environmental information are selected from the terrain field, atmosphere field as well as civilian field. With these data, we can modeling the environmental model, such as the road trajectory and the rules of highway or the climate changing. About the intelligent vehicle system, various component models of each intelligent vehicle should be modeled involving the corresponding environmental factors, like vehicle longitudinal/lateral dynamic model, tire/road model, sensor model, etc.” teaches generating an environmental model to virtually model the natural environment and environmental factors; Page 118, Section 3.1: “Some components of models or controllers are defined as the receivers of VSD so as to making them being a function of environmental conditions (climate and terrain). With regard to the transmitters of VSD, they are set in the virtual world to provide real-time NE information or other vehicles’ motion data. In terms of Fig. 2, the transmitters and receivers defined here could be seen as the part “Environmental model” and the part “Component model” respectively.” and Page 118, Section 3.2: “The automatic vehicle should be intelligent enough to react to various changes like the changes of NE conditions or traffic conditions, etc. Concerning the NE conditions, we have considered two kinds of transmitter: firstly the climate VSD transmitter, which contain weather/road condition, temperature, air density and wind speed; secondly the terrain VSD transmitter, which contain slope, road type (e.g. 
straight or curved road) and road curvature.” teaches that the environmental model transmits virtual sensor data about the weather and road condition, temperature, air density, wind speed, slope, road type, and road curvature to receivers in the intelligent vehicle system (receive environmental parameters); Page 117, Fig. 1 teaches that multiple environmental models can be generated)



wherein the generating unit is further configured to generate second response information in the second environmental model generated based on the second environmental parameter, and (Page 118, Section 2.2: “And the effects from natural environment can act on the component models and further extend this influence to the vehicles’ behaviors, such as lane following, lane changing, overtaking and so on.” and Page 118, Section 3.2: “In IVVR platform, with the above VSD transmitters, users can observe not only real-time vehicle dynamic reaction under different NE conditions but also some artificial intelligent vehicle behaviors which make the simulation more reasonable. For example, if NE situation changes, the virtual vehicles would detect these changes, and the transmitters would send out the corresponding signals, then, certain receivers would receive the signals and chose to decelerate or accelerate the velocity to assure the comfortableness of driver or to keep a safe distance from the preceding vehicle” teaches that the vehicle’s behavior is changed (generate response information) depending on the environmental model and the data generated by the VSD transmitters (environmental parameters))
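For illustration only, the transmitter and receiver arrangement described in the cited passages of Luo may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Luo: the names `ClimateVSD` and `receive`, the condition labels, the wind threshold, and the scaling factors are all hypothetical and were chosen only to show a receiver adjusting vehicle speed in response to transmitted virtual sensor data (VSD).

```python
from dataclasses import dataclass


@dataclass
class ClimateVSD:
    """Hypothetical virtual sensor data packet sent by a climate transmitter."""
    road_condition: str  # illustrative labels such as "dry" or "wet"
    wind_speed: float    # metres per second


def receive(vsd: ClimateVSD, current_speed: float) -> float:
    """Hypothetical receiver: return a speed adjusted for the reported conditions."""
    # Illustrative factors and threshold only; none of these values come from Luo.
    factor = 1.0
    if vsd.road_condition == "wet":
        factor *= 0.8   # decelerate on a wet road
    if vsd.wind_speed > 15.0:
        factor *= 0.9   # decelerate further in strong wind
    return current_speed * factor


dry_speed = receive(ClimateVSD("dry", 0.0), 20.0)
wet_speed = receive(ClimateVSD("wet", 0.0), 20.0)
```

In this sketch, the speed is unchanged under the "dry" packet and reduced under the "wet" packet, which corresponds to the deceleration behavior described in the quoted passage.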

Martinez-Marin and Luo are analogous art because they are directed to autonomous vehicle systems. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the environmental model of Martinez-Marin to incorporate the multiple environmental models and virtual sensor data of Luo, with a motivation to improve autonomous vehicle driving and reduce collisions in adverse weather (Luo, Page 116).

Claim 2,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 1, 
Martinez-Marin further teaches:
wherein the communication unit transmits a reward parameter relating to the machine learning to the learning unit. (Table 1 and Page 874: “The RL algorithm deals with several data structures for organizing the available information and storing the partial results of the learning process. These structures are the following: Q(s, a): is the Q-value table where the accumulated reward for the (s, a)-pair is saved. From this table, the optimal policy is obtained according to 5” teaches that the reinforcement learning algorithm has a reward parameter (reward parameter relating to the machine learning); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses RL controllers (learning unit) for reinforcement learning that are processed by the microcontroller, therefore the reward parameter is sent to the RL controllers)
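For illustration only, the manner in which a policy may be read from a Q-value table of accumulated rewards can be sketched as follows. This is a minimal sketch and not Martinez-Marin's implementation: equation (5) of the reference is not reproduced here, the function name `greedy_policy` and the state and action labels are hypothetical, and the sketch assumes that the action with the highest accumulated reward is preferred in each state.

```python
from typing import Dict, List, Tuple

# Q-value table: maps a (state, action) pair to its accumulated reward.
QTable = Dict[Tuple[str, str], float]


def greedy_policy(q: QTable, states: List[str], actions: List[str]) -> Dict[str, str]:
    """For each state, pick the action with the largest Q-value (assumed criterion)."""
    return {
        s: max(actions, key=lambda a: q.get((s, a), 0.0))  # missing pairs default to 0.0
        for s in states
    }


# Hypothetical table for a wall-following task with two coarse distance states.
q_table: QTable = {
    ("near", "steer_left"): 1.0,
    ("near", "steer_right"): 3.0,
    ("far", "steer_left"): 2.0,
    ("far", "steer_right"): 0.5,
}
policy = greedy_policy(q_table, ["near", "far"], ["steer_left", "steer_right"])
```

With the hypothetical table above, the resulting policy selects `steer_right` in the `near` state and `steer_left` in the `far` state.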



Claim 3,
The combination of Martinez-Marin and Luo teaches:  
The information processing apparatus according to claim 1, 
Martinez-Marin further teaches:
wherein each environmental parameter includes at least one of an external parameter which does not depend on a state of the control target and an internal parameter which depends on the state of the control target. (Table 1 and Page 872: “The state space formulation [3] of the vehicle model we will use is the following… where vT is the translational velocity and ς is the steering angle of the vehicle. The distance between the reference point (x, y) and the middle point of the driving wheels is 0.32 m. The orientation of the car is denoted by θ. The two control variables of a car are the velocity vT of the driving wheels and the steering angle ς.” teaches that the environmental parameters include the steering angle and velocity (internal parameters that depend on the state of the vehicle (control target)) and sampling times (external parameter which does not depend on the state of the vehicle))


Claim 4,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 3, 
Martinez-Marin further teaches:
wherein the external parameter includes at least one of geographical information, time information, a weather condition, outdoor information, indoor information, information relating to a traffic object, or road surface information. (Table 1 teaches that the sampling time (external parameter) is a parameter relating to time information)

Claim 5,
The combination of Martinez-Marin and Luo teaches:
The information processing apparatus according to claim 3, 
Martinez-Marin further teaches:
wherein the control target is a vehicle, and wherein the internal parameter includes at least one of vehicle body information, loaded object information, or passenger information. (Table 1 and Page 872: “The state space formulation [3] of the vehicle model we will use is the following… where vT is the translational velocity and ς is the steering angle of the vehicle. The distance between the reference point (x, y) and the middle point of the driving  wheels is 0.32 m. The orientation of the car is denoted by θ. The two control variables of a car are the velocity vT of the driving wheels and the steering angle ς.” teaches that the translational velocity and steering angle are internal parameters related to vehicle body information)

Claim 6,
Martinez-Marin teaches:
An information processing apparatus comprising: 
a communication unit configured to receive first response information relating to a control target in a first environmental model generated based on a first environmental parameter, and the first environmental parameter; (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller (information processing apparatus, generating unit is part of the microcontroller) to control (generate response information) an autonomous vehicle (control target), the autonomous vehicle includes sensors and RL controllers processed by the microcontroller, therefore the sensor data and reinforcement learning data is received by the processor; Page 874: “The new algorithm has been implemented as a model-based reinforcement learning method, where the back-ups are made by simulation following the shortest path search backward in time, starting from the goal state. In direct reinforcement learning, the back-ups are only made by experimentation, which is suitable when the back-up time for experimentation compared with simulation is not very high. In general, model-based reinforcement learning find better trajectories and manages changes in the environment (e. g. obstacles) and the goals more efficiently than direct reinforcement learning.” teaches using a model-based reinforcement learning method to find better trajectories for navigating the autonomous vehicle in the environment (environmental model); Table 1 and Page 873: “Although the state space of the system is three-dimensional, some vehicle behaviours, such as wall following, can be specified in a two-dimensional state space. In this case, the problem is simplified by fixing the value of vT . 
Then, the task is reduced to control the orientation of the vehicle (ς) in a two-dimensional state space (d, β), where d is the distance between the vehicle and the wall, and β is the relative orientation of the vehicle with respect to the wall.” teaches that the reinforcement learning model for navigating the vehicle in the environment is based on parameters (environmental parameters))

and a learning unit configured to perform machine learning relating to control of the control target using the received first response information and the received first environmental parameter, (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses reinforcement learning controllers (learning unit that performs machine learning) and that the microcontroller processes data from all sensors and the RL controllers (response information from the microcontroller and environmental parameters is sent to the RL controller); Page 874: “The vehicle should move forward and backward to avoid possible obstacles in the way. For that reason, during the training phase two RL controller are built: one for forward motion and the other for backward motion. Thus, employing both controllers the vehicle is able to avoid obstacles and turn around (see experimental results in Fig. 4) in a natural manner.” teaches that the RL controllers are used to control navigation of the autonomous vehicle (control target))
wherein the communication unit, the learning unit, and the generating unit are each implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer-based implementation with an MPC555 microcontroller)

Martinez-Marin does not appear to explicitly teach: 
wherein the communication unit is further configured to transmit a second environmental parameter relating to a request of a second environmental model in accordance with progress of the machine learning to a generating unit which is configured to generate the first response information and second response information, and 

However, Luo teaches: 

wherein the communication unit is further configured to transmit a second environmental parameter relating to a request of a second environmental model in accordance with progress of the machine learning to a generating unit which is configured to generate the first response information and second response information, and (Pages 117-118, Section 2.2: “The related environmental information are selected from the terrain field, atmosphere field as well as civilian field. With these data, we can modeling the environmental model, such as the road trajectory and the rules of highway or the climate changing. About the intelligent vehicle system, various component models of each intelligent vehicle should be modeled involving the corresponding environmental factors, like vehicle longitudinal/lateral dynamic model, tire/road model, sensor model, etc.” teaches generating an environmental model to virtually model the natural environment and environmental factors; Page 118, Section 3.1: “Some components of models or controllers are defined as the receivers of VSD so as to making them being a function of environmental conditions (climate and terrain). With regard to the transmitters of VSD, they are set in the virtual world to provide real-time NE information or other vehicles’ motion data. In terms of Fig. 2, the transmitters and receivers defined here could be seen as the part “Environmental model” and the part “Component model” respectively.” and Page 118, Section 3.2: “The automatic vehicle should be intelligent enough to react to various changes like the changes of NE conditions or traffic conditions, etc. Concerning the NE conditions, we have considered two kinds of transmitter: firstly the climate VSD transmitter, which contain weather/road condition, temperature, air density and wind speed; secondly the terrain VSD transmitter, which contain slope, road type (e.g. straight or curved road) and road curvature.” teaches that the environmental model transmits virtual sensor data about the weather, road condition, temperature, air density, wind speed, slope, road type, and road curvature to receivers in the intelligent vehicle system (receive environmental parameters); Page 118, Section 2.2: “And the effects from natural environment can act on the component models and further extend this influence to the vehicles’ behaviors, such as lane following, lane changing, overtaking and so on.” and Page 118, Section 3.2: “In IVVR platform, with the above VSD transmitters, users can observe not only real-time vehicle dynamic reaction under different NE conditions but also some artificial intelligent vehicle behaviors which make the simulation more reasonable. For example, if NE situation changes, the virtual vehicles would detect these changes, and the transmitters would send out the corresponding signals, then, certain receivers would receive the signals and chose to decelerate or accelerate the velocity to assure the comfortableness of driver or to keep a safe distance from the preceding vehicle” teaches that the vehicle’s behavior is changed (generate response information) depending on the environmental model and data generated from the VSD transmitters (environmental parameters))

Martinez-Marin and Luo are analogous art because they are directed to autonomous vehicle systems. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the environmental model of Martinez-Marin to incorporate the multiple environmental models and virtual sensor data of Luo with a motivation to improve autonomous vehicle driving and reduce collisions in adverse weather (Luo, Page 116).

Claim 7,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 6, 
Luo further teaches:
wherein the communication unit transmits the second environmental parameter in accordance with a result of the machine learning to the generating unit which generates the first response information and the second response information. (Pages 117-118, Section 2.2: “The related environmental information are selected from the terrain field, atmosphere field as well as civilian field. With these data, we can modeling the environmental model, such as the road trajectory and the rules of highway or the climate changing. About the intelligent vehicle system, various component models of each intelligent vehicle should be modeled involving the corresponding environmental factors, like vehicle longitudinal/lateral dynamic model, tire/road model, sensor model, etc.” teaches generating an environmental model to virtually model the natural environment and environmental factors; Page 118, Section 3.1: “Some components of models or controllers are defined as the receivers of VSD so as to making them being a function of environmental conditions (climate and terrain). With regard to the transmitters of VSD, they are set in the virtual world to provide real-time NE information or other vehicles’ motion data. In terms of Fig. 2, the transmitters and receivers defined here could be seen as the part “Environmental model” and the part “Component model” respectively.” and Page 118, Section 3.2: “The automatic vehicle should be intelligent enough to react to various changes like the changes of NE conditions or traffic conditions, etc. Concerning the NE conditions, we have considered two kinds of transmitter: firstly the climate VSD transmitter, which contain weather/road condition, temperature, air density and wind speed; secondly the terrain VSD transmitter, which contain slope, road type (e.g. straight or curved road) and road curvature.” teaches that the environmental model transmits virtual sensor data about the weather, road condition, temperature, air density, wind speed, slope, road type, and road curvature to receivers in the intelligent vehicle system (receive environmental parameters); Page 118, Section 2.2: “And the effects from natural environment can act on the component models and further extend this influence to the vehicles’ behaviors, such as lane following, lane changing, overtaking and so on.” and Page 118, Section 3.2: “In IVVR platform, with the above VSD transmitters, users can observe not only real-time vehicle dynamic reaction under different NE conditions but also some artificial intelligent vehicle behaviors which make the simulation more reasonable. For example, if NE situation changes, the virtual vehicles would detect these changes, and the transmitters would send out the corresponding signals, then, certain receivers would receive the signals and chose to decelerate or accelerate the velocity to assure the comfortableness of driver or to keep a safe distance from the preceding vehicle” teaches that the vehicle’s behavior is changed (generate response information) depending on the environmental model and data generated from the VSD transmitters (environmental parameters))

The combination of claim 6 has already incorporated the virtual sensor data, and therefore has incorporated the details of the environmental parameters and response information required by claim 7. 

Claim 8,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 6, 
Martinez-Marin further teaches:
wherein the communication unit receives a reward parameter relating to the machine learning. (Table 1 and Page 874: “The RL algorithm deals with several data structures for organizing the available information and storing the partial results of the learning process. These structures are the following: Q(s, a): is the Q-value table where the accumulated reward for the (s, a)-pair is saved. From this table, the optimal policy is obtained according to 5” teaches that the reinforcement learning algorithm has a reward parameter (reward parameter relating to the machine learning); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses RL controllers (learning unit) for reinforcement learning that are processed by the microcontroller, therefore the reward parameter is sent to the RL controllers)

Claim 9,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 6, 
Martinez-Marin further teaches:
wherein the communication unit receives expert information relating to the machine learning. (Page 874: “The RL algorithm that implements the concepts described above is presented in Fig. 2(see [11] for notation details). The state is represented in the algorithm by a real valued vector x, which is converted to the discrete state s (integer index) by the function cell(). In our experiments, uniform discretization was used with 41 cells per variable (see Table I for full details of the RL parameters). The function Dk−adjoining() is used to determine whether the adjoining property has been satisfied. The index s is used to update the Q-table, and x is used to update the function M. Since the controller uses noisy data from an image sensor, the function M() estimates the state of the system by filtering before storing it. In our experiments an average filter was used. For the wall following behaviour the aim of the controller is to move the vehicle from any initial position inside the region of interest to a goal position (an objective distance from the wall) through a minimum-time trajectory. A trial finishes when the vehicle moves outside of the state space (sink cell) or when it enters in the goal. Then, the function reactive() moves the vehicle in the opposite direction, until some starting position inside the state space is reached” teaches that the reinforcement learning algorithm is used to create a model that selects actions for controlling the autonomous vehicle (receiving expert information related to autonomous vehicle control from the model))
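
For illustration only, the uniform discretization described in the quoted passage (a real-valued state vector x converted to an integer index s by cell(), with 41 cells per variable) may be sketched as follows. This sketch is not taken from the reference. The state bounds, the number of actions, and the use of -1 to represent the sink cell are assumptions made here for the example.

```python
from typing import Sequence

N_CELLS = 41  # cells per state variable, as stated in the quoted passage


def cell(x: Sequence[float], lows: Sequence[float], highs: Sequence[float],
         n_cells: int = N_CELLS) -> int:
    """Map a real-valued state vector x to one integer cell index.

    Returns -1 when x lies outside the state space. Using -1 for the
    "sink cell" is an assumption of this sketch, not of the reference.
    """
    index = 0
    for value, low, high in zip(x, lows, highs):
        if not (low <= value <= high):
            return -1  # outside the state space
        width = (high - low) / n_cells
        k = min(int((value - low) / width), n_cells - 1)  # clamp the upper edge
        index = index * n_cells + k  # row-major flattening
    return index


# Hypothetical bounds for a two-dimensional state (d, beta).
LOWS = (0.0, -1.0)
HIGHS = (2.0, 1.0)

# Q-table with one row per discrete state; three actions are assumed.
N_ACTIONS = 3
q_table = [[0.0] * N_ACTIONS for _ in range(N_CELLS ** len(LOWS))]

s = cell((1.0, 0.0), LOWS, HIGHS)  # integer index used to address q_table
```

Under these assumed bounds, the index s addresses one row of the Q-table, and a state outside the bounds is reported as the sink cell.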

Claim 10,
The combination of Martinez-Marin and Luo teaches: 
The information processing apparatus according to claim 8, 
Martinez-Marin further teaches:
wherein the control target is a vehicle, and wherein the reward parameter includes at least one of parameters relating to a distance to a destination, ride quality, a number of times of contact, infringement on a traffic rule, or fuel consumption. (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the control target is an autonomous vehicle; Table 1 teaches that the reward parameter is determined based on whether the vehicle’s state is at a sink cell or the goal; Page 874: “A trial finishes when the vehicle moves outside of the state space (sink cell) or when it enters in the goal. Then, the function reactive() moves the vehicle in the opposite direction, until some starting position inside the state space is reached.” teaches that a vehicle is in a sink cell if it moves outside of the state space; Page 874: “In this way, the path planning can be improved using a control architecture with two levels of abstraction. On the other hand, the vehicle can reach any position and orientation using the virtual wall concept. A virtual wall is created by manipulating the vehicle sensors in such a way that although the RL controller remain unchanged, the vehicle can navigate on a free space without any physical guide.” teaches that the vehicle navigates the space based on created virtual walls; therefore, a sink cell occurs when the vehicle contacts a virtual wall (moves outside the state space) and the reward parameter is based on contact with the virtual wall (number of times of contact))


Claim(s) 11-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martinez-Marin in view of Levinson et al. (“Towards fully autonomous driving: Systems and algorithms”).

Claim 11,
Martinez-Marin teaches:
An information processing apparatus comprising: 
an environment acquiring unit configured to acquire an environmental parameter relating to an environment state, (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller (information processing apparatus); Page 874: “The new algorithm has been implemented as a model-based reinforcement learning method, where the back-ups are made by simulation following the shortest path search backward in time, starting from the goal state. In direct reinforcement learning, the back-ups are only made by experimentation, which is suitable when the back-up time for experimentation compared with simulation is not very high. In general, model-based reinforcement learning find better trajectories and manages changes in the environment (e. g. obstacles) and the goals more efficiently than direct reinforcement learning.” teaches using a model-based reinforcement learning method to find better trajectories for navigating the autonomous vehicle in the environment (environmental model); Table 1 and Page 873: “Although the state space of the system is three-dimensional, some vehicle behaviours, such as wall following, can be specified in a two-dimensional state space. In this case, the problem is simplified by fixing the value of vT . Then, the task is reduced to control the orientation of the vehicle (ς) in a two-dimensional state space (d, β), where d is the distance between the vehicle and the wall, and β is the relative orientation of the vehicle with respect to the wall.” teaches obtaining parameters (environmental parameters) for the reinforcement learning algorithm to navigate the vehicle in the environment)

a determining unit configured to perform estimation based on the environmental parameter… (Fig. 2 and Page 874: “The RL algorithm that implements the concepts described above is presented in Fig. 2(see [11] for notation details). The state is represented in the algorithm by a real valued vector x, which is converted to the discrete state s (integer index) by the function cell(). In our experiments, uniform discretization was used with 41 cells per variable (see Table I for full details of the RL parameters).” teaches performing reinforcement learning (performing estimation) with the parameters (environmental parameters); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the reinforcement learning is performed by the RL controllers (determining units))

a transmitting unit configured to transmit the environmental parameter when the determining unit determines that the environment state is the unlearned environment state, (Page 874: “The vehicle should move forward and backward to avoid possible obstacles in the way. For that reason, during the training phase two RL controller are built: one for forward motion and the other for backward motion. Thus, employing both controllers the vehicle is able to avoid obstacles and turn around (see experimental results in Fig. 4) in a natural manner. This controller is suitable to explore unknown environments creating a vehicle mode and a map which later can use a three-dimensional RL controller or a CACM controller [13].” teaches that the reinforcement learning controllers are used to guide the autonomous vehicle in unknown environments (unlearned environment state); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches using a MPC555 microcontroller to process the RL controllers, therefore the reinforcement learning parameters (environment parameters) are transmitted to the microcontroller)

wherein the environment acquiring unit, the determining unit, and the transmitting unit are each implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer based implementation with a MPC555 microcontroller)

Martinez-Marin does not appear to explicitly teach: 
…and determine whether or not the environment state is an unlearned environment state; and

However, Levinson teaches: 
…and determine whether or not the environment state is an unlearned environment state; and (Page 164: “Our approach yields substantial improvements over previous work in vehicle localization, including higher precision, the ability to learn and improve maps over time, and increased robustness to environment changes and dynamic obstacles. Specifically, we model the environment, instead of as a spatial grid of fixed infrared remittance values, as a probabilistic grid whereby every cell is represented as its own gaussian distribution over remittance values; see Figure 3. Subsequently, Bayesian inference is able to preferentially weight parts of the map most likely to be stationary and of consistent angular reflectivity, thereby reducing uncertainty and catastrophic errors. Furthermore, by using offline SLAM to align multiple passes of the same environment[5], possibly separated in time by days or even months, it is possible to build an increasingly robust understanding of the world that can be then exploited for localization.” teaches building a model of the environment using a probabilistic grid if the environment is unknown to the autonomous vehicle system (determining if the environment is unlearned))
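
For illustration only, the probabilistic grid described in the quoted passage, in which every cell is represented as its own Gaussian distribution over remittance values, may be sketched as follows. This sketch is not taken from the reference. The class name GaussianCell, the incremental (Welford) update, and the choice to report an infinite variance for a cell with fewer than two observations are assumptions made here for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianCell:
    """One map cell holding a running Gaussian estimate of remittance."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, remittance: float) -> None:
        """Fold one new remittance observation into the estimate (Welford)."""
        self.count += 1
        delta = remittance - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (remittance - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; infinite until two observations exist (assumed)."""
        return self.m2 / (self.count - 1) if self.count > 1 else math.inf


# Hypothetical observations of one cell gathered over several passes.
observed = GaussianCell()
for value in (10.0, 12.0, 14.0):
    observed.update(value)

unobserved = GaussianCell()  # a cell that has never been observed
```

In this sketch, repeated passes over the same cell refine its mean and variance, while a cell with no observations carries no usable estimate.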

Martinez-Marin and Levinson are analogous art because they are directed to autonomous vehicle systems. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Levinson’s sensors to create a ground map for localization with the autonomous vehicle of Martinez-Marin with a motivation to increase localization accuracy in dynamic environments (Levinson, Page 165).

Claim 12,
The combination of Martinez-Marin and Levinson teaches: 
The information processing apparatus according to claim 11,

Martinez-Marin further teaches: 
further comprising: a sensor information acquiring unit configured to acquire sensor information from one or more sensors, (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle includes infrared sensors, a laser scanner, and a CMOS camera (one or more sensors))
wherein the transmitting unit transmits the sensor information, and (Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the microcontroller processes the sensor data, therefore the sensor information is transmitted to the microcontroller)
wherein the sensor information acquiring unit is implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer based implementation with a MPC555 microcontroller)

Claim 13,
The combination of Martinez-Marin and Levinson teaches: 
The information processing apparatus according to claim 11,

Martinez-Marin further teaches: 
further comprising: a control information acquiring unit configured to acquire control information relating to control of a control target, (Page 872: “The state space formulation [3] of the vehicle model we will use is the following… where vT is the translational velocity and ς is the steering angle of the vehicle. The distance between the reference point (x, y) and the middle point of the driving  wheels is 0.32 m. The orientation of the car is denoted by θ. The two control variables of a car are the velocity vT of the driving wheels and the steering angle ς.” teaches acquiring parameters related to the orientation of the car, velocity, and steering angle that are used to control the autonomous vehicle (control information relating to control of control target))

wherein the transmitting unit transmits data relating to the control information, and (Table 1 teaches that parameters used in the reinforcement learning algorithm include the velocity and steering angle of the autonomous vehicle; therefore, these parameters (control information) are sent to the RL controllers that perform reinforcement learning)

wherein the control information acquiring unit is implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer based implementation with a MPC555 microcontroller)

Claim 14,
The combination of Martinez-Marin and Levinson teaches: 
The information processing apparatus according to claim 11,

Martinez-Marin further teaches: 
wherein the transmitting unit transmits a reward parameter relating to control of learning of the control target. (Table 1 and Page 874: “The RL algorithm deals with several data structures for organizing the available information and storing the partial results of the learning process. These structures are the following: Q(s, a): is the Q-value table where the accumulated reward for the (s, a)-pair is saved. From this table, the optimal policy is obtained according to 5” teaches that the reinforcement learning algorithm has a reward parameter (reward parameter relating to the machine learning); Page 872: “The vehicle is equipped with an array of infrared sensors, a laser scanner and a CMOS camera; although the image sensor has not been employed in this application. The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses RL controllers (learning unit) for reinforcement learning that are processed by the microcontroller, therefore the reward parameter is sent to the RL controllers)


Claim(s) 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martinez-Marin in view of Levinson, further in view of Ross et al. (US 20160334229 A1).
Claim 15,
The combination of Martinez-Marin and Levinson teaches: 
The information processing apparatus according to claim 11,

The combination of Martinez-Marin and Levinson does not appear to explicitly teach: 
wherein, when the determining unit determines that the environment state has not been learned, the determining unit generates notification data based on the determination, and wherein the transmitting unit is further configured to transmit the notification data.

However, Ross teaches: 
wherein, when the determining unit determines that the environment state has not been learned, the determining unit generates notification data based on the determination, and (Para [0086]: “The autonomous vehicle 101 can implement or configure the sensor analysis component 110 to generate one or more types of alerts 413 when the analysis of the sensor profile sets 95 identify (i) an unknown or unexpected object or condition in the path of the vehicle (e.g., long range camera detects a bag in road, but the image processing does not recognize the bag or distinguish the bag from rock or solid object), and/or (ii) a relatively known object or condition which may require a response for which the outcome is sufficiently uncertain (e.g., emergency vehicle in road, response to pull over on shoulder uncertain given environmental or event conditions). The alerts 413 can specify or trigger a request for assistance.” teaches generating an alert (notification) that the autonomous vehicle detects an unknown condition in the path of the vehicle (environment state has not been learned))

wherein the transmitting unit is further configured to transmit the notification data. (Para [0086]: “The autonomous vehicle 101 can implement or configure the sensor analysis component 110 to generate one or more types of alerts 413 when the analysis of the sensor profile sets 95 identify (i) an unknown or unexpected object or condition in the path of the vehicle (e.g., long range camera detects a bag in road, but the image processing does not recognize the bag or distinguish the bag from rock or solid object), and/or (ii) a relatively known object or condition which may require a response for which the outcome is sufficiently uncertain (e.g., emergency vehicle in road, response to pull over on shoulder uncertain given environmental or event conditions). The alerts 413 can specify or trigger a request for assistance.” teaches sending the alert (transmitting notification data))

Martinez-Marin, Levinson, and Ross are analogous art because they are directed to autonomous vehicle systems. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Ross’ alert system for autonomous vehicles with the autonomous vehicle of Martinez-Marin/Levinson with a motivation to allow a human operator to provide rapid and appropriate input (Ross, Para [0086]).

Claim(s) 19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martinez-Marin in view of Levinson, further in view of Zhifei et al. (“A review of inverse reinforcement learning theory and recent advances”).

Claim 19,
The combination of Martinez-Marin and Levinson teaches: 
The information processing apparatus according to claim 11,

Martinez-Marin further teaches: 
further comprising: an acquiring unit configured to acquire control information acquired from a control target, (Page 872: “The state space formulation [3] of the vehicle model we will use is the following… where vT is the translational velocity and ς is the steering angle of the vehicle. The distance between the reference point (x, y) and the middle point of the driving  wheels is 0.32 m. The orientation of the car is denoted by θ. The two control variables of a car are the velocity vT of the driving wheels and the steering angle ς.” teaches acquiring parameters related to the orientation of the car, velocity, and steering angle that are used to control the autonomous vehicle (control information relating to control of control target))
wherein the transmitting unit is further configured to transmit the control information to a learning unit which is configured to perform… based on a result of determination by the determining unit, and (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses reinforcement learning controllers (learning unit that performs machine learning) and that the microcontroller processes data from all sensors and the RL controllers (response information from the microcontroller and environmental parameters is sent to the RL controller); Page 874: “The vehicle should move forward and backward to avoid possible obstacles in the way. For that reason, during the training phase two RL controller are built: one for forward motion and the other for backward motion. Thus, employing both controllers the vehicle is able to avoid obstacles and turn around (see experimental results in Fig. 4) in a natural manner.” teaches that the RL controllers (learning unit) are used to control navigation of the autonomous vehicle (control target))
wherein the determining unit is implemented via at least one processor. (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches a computer based implementation with a MPC555 microcontroller)


Levinson further teaches: 
wherein the determining unit is further configured to determine whether or not a person who controls the control target belongs to a predetermined attribute, (Page 163: “Our research vehicle is a 2006 Volkswagen Passat wagon (Figure 1). This remains an ideal platform, providing plenty of space for equipment and people, as well as featuring an electronically actuated throttle, shifter, parking brake, and steering system. An interface box designed in collaboration with VW provides software control over these functions as well as brake pressure and turn signals, in addition to the ability to fall back to human control of the vehicle during a software or power failure, or by manual takeover.” teaches that a human can control the autonomous vehicle (control target) in case a software or power failure occurs (being in a car with a failure is a predetermined attribute of a person))
Martinez-Marin and Levinson are analogous art because they are directed to autonomous vehicle systems. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace the autonomous vehicle of Martinez-Marin with the autonomous vehicle of Levinson with a motivation to allow a safety driver to control the vehicle when it encounters unexpected events (Levinson, Page 168).

The combination of Martinez-Marin and Levinson does not appear to explicitly teach: 
configured to perform inverse reinforcement learning…

However, Zhifei teaches: 
configured to perform inverse reinforcement learning… (Page 4: “The most notable series of applications in IRL field is the autonomous helicopter aerobatic demonstrations carried out by Stanford University. They all used apprenticeship IRL theory discussed before to find the trade-off among features of the reward function through expert’s demonstrations. In [17], a set of helicopter aerobatic maneuvers were demonstrated, including flip, roll, tail-in funnel and nose-in funnel. By learning a desired trajectory from a number of sub-optimal demonstrations, the helicopter performance has been greatly increased, and some new aerobatic maneuvers were performed [18]” teaches performing inverse reinforcement learning with an autonomous helicopter (autonomous vehicle))

Martinez-Marin, Levinson, and Zhifei are analogous art because they are directed to autonomous vehicles. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace the reinforcement learning algorithm of Martinez-Marin/Levinson with the inverse reinforcement learning algorithm of Zhifei with a motivation to derive a reward function from expert demonstrations, without having to specify the reward in advance (Zhifei, Page 1).

Claim 20,
Martinez-Marin teaches: 
The information processing apparatus according to claim 6,

Martinez-Marin further teaches: 
wherein the communication unit is further configured to receive control information acquired from the control target, (Page 872: “The vehicle is autonomous, using a microcontroller MPC555 to process all sensors and the RL controllers.” teaches that the autonomous vehicle uses reinforcement learning controllers (learning unit that performs machine learning) and that the microcontroller processes data from all sensors and the RL controllers (response information from the microcontroller and environmental parameters is sent to the RL controller); Page 874: “The vehicle should move forward and backward to avoid possible obstacles in the way. For that reason, during the training phase two RL controller are built: one for forward motion and the other for backward motion. Thus, employing both controllers the vehicle is able to avoid obstacles and turn around (see experimental results in Fig. 4) in a natural manner.” teaches that the RL controllers (learning unit) are used to control navigation of the autonomous vehicle (control target))

Martinez-Marin does not appear to explicitly teach: 

a determining unit configured to determine whether or not a person who controls the control target belongs to a predetermined attribute,
wherein the learning unit performs the machine learning by performing inverse reinforcement learning using the control information relating to the person who controls the control target and who is determined to belong to the predetermined attribute, and 

However, Levinson teaches: 
a determining unit configured to determine whether or not a person who controls the control target belongs to a predetermined attribute, (Page 163: “Our research vehicle is a 2006 Volkswagen Passat wagon (Figure 1). This remains an ideal platform, providing plenty of space for equipment and people, as well as featuring an electronically actuated throttle, shifter, parking brake, and steering system. An interface box designed in collaboration with VW provides software control over these functions as well as brake pressure and turn signals, in addition to the ability to fall back to human control of the vehicle during a software or power failure, or by manual takeover.” teaches that a human can control the autonomous vehicle (control target) in case a software or power failure occurs (being in a car with a failure is a predetermined attribute of a person))
wherein the determining unit is implemented via at least one processor. (Page 163: “An interface box designed in collaboration with VW provides software control over these functions as well as brake pressure and turn signals, in addition to the ability to fall back to human control of the vehicle during a software or power failure, or by manual takeover.” teaches using software control, which suggests a computer-based implementation)

Martinez-Marin and Levinson are analogous art because they are directed to autonomous vehicle systems. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace the autonomous vehicle of Martinez-Marin with the autonomous vehicle of Levinson with a motivation to allow a safety driver to control the vehicle when it encounters unexpected events (Levinson, Page 168).

The combination of Martinez-Marin and Levinson does not appear to explicitly teach: 
wherein the learning unit performs the machine learning by performing inverse reinforcement learning using the control information relating to the person who controls the control target and who is determined to belong to the predetermined attribute, and

However, Zhifei teaches: 
wherein the learning unit performs the machine learning by performing inverse reinforcement learning using the control information relating to the person who controls the control target and who is determined to belong to the predetermined attribute, and (Page 4: “The most notable series of applications in IRL field is the autonomous helicopter aerobatic demonstrations carried out by Stanford University. They all used apprenticeship IRL theory discussed before to find the trade-off among features of the reward function through expert’s demonstrations. In [17], a set of helicopter aerobatic maneuvers were demonstrated, including flip, roll, tail-in funnel and nose-in funnel. By learning a desired trajectory from a number of sub-optimal demonstrations, the helicopter performance has been greatly increased, and some new aerobatic maneuvers were performed [18]” teaches performing inverse reinforcement learning with an autonomous helicopter (autonomous vehicle))

Martinez-Marin, Levinson, and Zhifei are analogous art because they are directed to autonomous vehicles. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace the reinforcement learning algorithm of Martinez-Marin/Levinson with the inverse reinforcement learning algorithm of Zhifei with a motivation to derive a reward function from expert demonstrations, without having to specify the reward in advance (Zhifei, Page 1).

Response to Arguments
Regarding Objection to the Specification: 
Applicant’s argument: 
“In reply to the objection to the specification for alleged informalities, Applicant respectfully requests reconsideration. 
Applicant hereby amends paragraph [0038] of the specification in the manner suggested by the Examiner on pages 2 and 3 of the Office Action. 
Accordingly, Applicant respectfully requests reconsideration and withdrawal of the objection to the specification.”

Response: 
The Specification was not objected to in the previous office action. The use of the term “Wi-Fi” in Para [0083] was noted as being a trademark. 

Regarding Claim interpretation under 35 U.S.C. 112(f): 
Applicant’s argument:
“Without acquiescing to the Examiner's interpretation of the previous claim recitations, Applicant hereby amends the claims in order to more clearly avoid any invocation of 35 U.S.C §112(f).”

Response: 
The 35 U.S.C. 112(f) claim interpretation applied to claims 1, 2, 6-9, 11-17, and 20 has been withdrawn due to amendments to these claims. However, the 112(f) interpretation applied to claim 19 has been maintained. Please see pages 3-4 of this office action for more information. 

Regarding Claim rejections under 35 U.S.C. 102: 
Applicant’s argument:
“However, there is nothing in Martinez-Marin that would fairly teach or suggest the elements presently recited in relation to receiving a second environmental parameter relating to a request of a second environmental model in accordance with progress of machine learning, along with generating second response information in the second environmental model, let alone in the particularly claimed manner. As such, it is simply not possible for Martinez-Marin to satisfy the elements recited by amended independent claim 1.”

Response: 
Luo has been relied upon to teach the following limitations of claim 1, as necessitated by amendments: 
“receive a second environmental parameter relating to a request of a second environmental model in accordance with progress of the machine learning,
wherein the generating unit is further configured to generate second response information in the second environmental model generated based on the second environmental parameter, and”

Please see pages 12-14 of this office action for more information. 

Regarding Claim rejections under 35 U.S.C. 103: 
Applicant’s argument: 
“The Examiner alleges that a combination of Martinez-Marin and Levinson teaches every element previously recited by independent claim 11. Office Action 9-13 (citing Martinez-Marin FIG. 2, pages 872-875). Martinez-Marin fails to teach or suggest the elements of independent claim 11 for reasons similar to those discussed above with respect to independent claim 1.”

Response: 
Applicant’s arguments have been considered but are not persuasive. Applicant’s arguments are conclusory and do not provide detail as to why the combination of Martinez-Marin and Levinson does not teach the limitations of claim 11. Further, independent claim 11 contains different limitations from independent claim 1; therefore, it is unclear why “Martinez-Marin fails to teach or suggest the elements of independent claim 11 for reasons similar to those discussed above with respect to independent claim 1”. As noted in the office action, the combination of Martinez-Marin and Levinson teaches the limitations of claim 11. Please see pages 25-29 of this office action for a detailed analysis of claim 11. 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144. The examiner can normally be reached Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.J.A./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125