DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 8, 11, and 15 are objected to because of the following informalities: 
As per Claim 1: The phrase “at least one of” is used in the limitation identifying the source from which feedback is received: “at least one of a user, a second device, and environmental sensors.” The phrase “at least one of” signifies that a selection from among the three options is made; however, the use of “and” suggests the opposite. The word “and” should be changed to “or.”
As per Claim 8: The phrase “at least one of” is used in the limitation identifying the source from which feedback is received: “at least one of a user, a second device, and environmental sensors.” The phrase “at least one of” signifies that a selection from among the three options is made; however, the use of “and” suggests the opposite. The word “and” should be changed to “or.”
As per Claim 11: the claim reads “wherein to instruct the remotely controlled device to perform the second series of actions, the system is further enable to.” The word “enable” should read “enabled.”
As per Claim 15: The phrase “at least one of” is used in the limitation identifying the source from which feedback is received: “at least one of a user, a second device, and environmental sensors.” The phrase “at least one of” signifies that a selection from among the three options is made; however, the use of “and” suggests the opposite. The word “and” should be changed to “or.”
Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 8, and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhou (US 20190004518 A1).
As per claim 1: Zhou teaches the following;
A method, comprising: in response to receiving a command for a remotely controlled device to perform a behavior, [See at least Paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
monitoring a first series of actions performed by the remotely controlled device that comprise the behavior; [ See at least Paragraph 0027: “training the deep neural network model with training samples obtained from a simulated environment until a condition of minimizing a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information is reached.” Also see paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions” ]
receiving feedback related to how the remotely controlled device performs the behavior, wherein the feedback is received from at least one of a user, a second device, and environmental sensors; [See at least Paragraph 0110-0113: “[0110] controlling the unmanned aerial vehicle to fly in the actual environment and obtaining training data in the actual environment; comprising: [0111] in the actual environment, regarding the sensor data and continuous random target state information of the unmanned aerial vehicle as input of the deep neural network which is already trained in the simulated environment, the deep neural network outputting corresponding control information; [0112] according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback; [0113] regarding the sensor data, the target state information and the control information as a group of training samples.” ] 
updating, according to the feedback, a machine learning model used by the remotely controlled device [See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” And Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.”] Under broadest reasonable interpretation, a second series of actions to perform a behavior is already the product of a machine learning model after its iterative learning process is completed, where it compares the difference between a calculated behavior and the measured actual behavior. Subsequently the produced second series of actions is thereby a different variation from the first series of actions.  
to produce a second, different series of actions to perform the behavior; [See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] From the above citation, the neural network undergoes an iterative (“fine-tuning”) learning process. Under broadest reasonable interpretation, “fine-tuning” is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, resulting in a modified control module that controls the vehicle to behave differently than before.
and in response to receiving a subsequent command to perform the behavior, instructing the remotely controlled device to perform the second series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
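For illustration only, the sequence of steps recited in claim 1 (command, monitored first series of actions, feedback, model update, second series of actions on a subsequent command) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims, the specification, or Zhou: the names BehaviorModel, actions_for, update, and handle_command, the scalar "intensity" stand-in for a machine learning model, and the numeric feedback value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorModel:
    """Hypothetical stand-in for a machine learning model: one scalar per behavior."""
    intensity: dict = field(default_factory=dict)
    learning_rate: float = 0.5

    def actions_for(self, behavior):
        # Produce the series of actions currently associated with the behavior.
        level = self.intensity.get(behavior, 1.0)
        return [("throttle", level), ("hold", 1.0)]

    def update(self, behavior, feedback):
        # Positive feedback raises the intensity; negative feedback lowers it.
        level = self.intensity.get(behavior, 1.0)
        self.intensity[behavior] = level + self.learning_rate * feedback


def handle_command(model, behavior, feedback_source):
    # Monitor the first series of actions performed for the commanded behavior.
    first_series = model.actions_for(behavior)
    # Receive feedback (here from a caller-supplied source, an assumption).
    feedback = feedback_source(first_series)
    # Update the model according to the feedback.
    model.update(behavior, feedback)
    return first_series


model = BehaviorModel()
# First command: the feedback source reports a negative result.
first_series = handle_command(model, "hover", lambda actions: -1.0)
# Subsequent command for the same behavior yields a different series.
second_series = model.actions_for("hover")
```

In this sketch the second series differs from the first only in the intensity of one action, which parallels the interpretation of “fine-tuning” applied above.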

As per Claim 8: Zhou teaches the following;
A system, comprising: a processor; and a memory, including instructions that when performed by the processor enable the system to: in response to receiving a command for a remotely controlled device to perform a behavior, [ See at least Paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." and Paragraph 0205: “The processing unit executes functions and/or methods in embodiments described in the present disclosure by running programs stored in the system memory.” and Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
monitor a first series of actions performed by the remotely controlled device that comprise the behavior; [ See at least Paragraph 0027: “training the deep neural network model with training samples obtained from a simulated environment until a condition of minimizing a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information is reached.” Also see paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions” ] 
receive feedback related to how the remotely controlled device performs the behavior, wherein the feedback is received from at least one of a user, a second device, and environmental sensors; [ See at least Paragraph 0110-0113: “[0110] controlling the unmanned aerial vehicle to fly in the actual environment and obtaining training data in the actual environment; comprising: [0111] in the actual environment, regarding the sensor data and continuous random target state information of the unmanned aerial vehicle as input of the deep neural network which is already trained in the simulated environment, the deep neural network outputting corresponding control information; [0112] according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback; [0113] regarding the sensor data, the target state information and the control information as a group of training samples.” ]
update, according to the feedback, a machine learning model used by the remotely controlled device [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” And Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] Under broadest reasonable interpretation, a second series of actions to perform a behavior is already the product of a machine learning model after its iterative learning process is completed, where it compares the difference between a calculated behavior and the measured actual behavior. Subsequently the produced second series of actions is thereby a different variation from the first series of actions.
to produce a second, different series of actions to perform the behavior; [ See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] Under broadest reasonable interpretation, “a second, different series of actions” is interpreted as being a set of vehicle actions different from the first set. From the above citation, the neural network undergoes an iterative (“fine tuning”) learning process, resulting in the modified control module to control the vehicle to behave differently than before.
and in response to receiving a subsequent command to perform the behavior, instruct the remotely controlled device to perform the second series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
As per claim 15: Zhou teaches the following;
A non-transitory computer-readable medium containing computer program code that, when executed by operation of one or more computer processors, performs an operation comprising: in response to receiving a command for a remotely controlled device to perform a behavior, [ See at least Paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." and Paragraph 0205: “The processing unit executes functions and/or methods in embodiments described in the present disclosure by running programs stored in the system memory.” and Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
monitoring a first series of actions performed by the remotely controlled device that comprise the behavior; [ See at least Paragraph 0027: “training the deep neural network model with training samples obtained from a simulated environment until a condition of minimizing a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information is reached.” Also see paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions” ]
receiving feedback related to how the remotely controlled device performs the behavior, wherein the feedback is received from at least one of a user, a second device, and environmental sensors; [ See at least Paragraph 0110-0113: “[0110] controlling the unmanned aerial vehicle to fly in the actual environment and obtaining training data in the actual environment; comprising: [0111] in the actual environment, regarding the sensor data and continuous random target state information of the unmanned aerial vehicle as input of the deep neural network which is already trained in the simulated environment, the deep neural network outputting corresponding control information; [0112] according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback; [0113] regarding the sensor data, the target state information and the control information as a group of training samples.” ]
updating, according to the feedback, a machine learning model used by the remotely controlled device [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” And Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.”] Under broadest reasonable interpretation, a second series of actions to perform a behavior is already the product of a machine learning model after its iterative learning process is completed, where it compares the difference between a calculated behavior and the measured actual behavior. Subsequently the produced second series of actions is thereby a different variation from the first series of actions.  
to produce a second, different series of actions to perform the behavior; [See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] Under broadest reasonable interpretation, “a second, different series of actions” is interpreted as being a set of vehicle actions different from the first set. From the above citation, the neural network undergoes an iterative (“fine tuning”) learning process, resulting in the modified control module to control the vehicle to behave differently than before. 
and in response to receiving a subsequent command to perform the behavior, instructing the remotely controlled device to perform the second series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1), in view of Bertram (US 10032111 B1).
As per Claim 2: Zhou teaches all of the limitations of claim 1. Zhou further teaches the following;
wherein receiving the feedback refines the behavior in the machine learning model. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
Zhou does not teach the device user defining the series of actions. Bertram however teaches;
wherein the command specifies a series of user-defined actions that comprise the first series of action to teach the behavior to the machine learning model,  [ See at least Paragraph 15: “In some embodiments, a system includes a machine learning engine. The machine learning engine is configured to receive training data including a plurality of first input conditions and a plurality of first response maneuvers associated with the first input conditions. The machine learning engine is configured to train a learning system using the training data to generate a second response maneuver based on a second input condition.” And Paragraph 24: “In some embodiments, the machine learning engine is configured to train the maneuver controller (or the learning system) to generate a behavior model, such as a pilot behavior model, a semantic model, or a mental model, which can be shared with or readily understood by a human operator. For example, when trained, the maneuver controller can be configured to receive commands at a semantic level of understanding from a human pilot. The semantic level of understanding may include high level or abstract commands, such as “follow the lead,” “perform a join up with lead,” or “land in that region.” Based on the commands received at the semantic level of understanding, the maneuver controller can determine an appropriate maneuver to perform to follow the command. ...” ] 
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to allow the user to input a set of first response maneuvers to the device. The modification results in autonomous vehicles that “can be improved to perform flight maneuvers in a manner consistent with how human pilots would respond to such conditions” (Bertram, Paragraph 17).
As per Claim 9: Zhou and Bertram teach all of the limitations of claim 8. Zhou further teaches the following;
wherein receiving the feedback refines the behavior in the machine learning model. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
Zhou does not teach the device user defining the series of actions. Bertram however teaches;
wherein the command specifies a series of user-defined actions that comprise the first series of action to teach the behavior to the machine learning model,  [ See at least Paragraph 15: “In some embodiments, a system includes a machine learning engine. The machine learning engine is configured to receive training data including a plurality of first input conditions and a plurality of first response maneuvers associated with the first input conditions. The machine learning engine is configured to train a learning system using the training data to generate a second response maneuver based on a second input condition.” And Paragraph 24: “In some embodiments, the machine learning engine is configured to train the maneuver controller (or the learning system) to generate a behavior model, such as a pilot behavior model, a semantic model, or a mental model, which can be shared with or readily understood by a human operator. For example, when trained, the maneuver controller can be configured to receive commands at a semantic level of understanding from a human pilot. The semantic level of understanding may include high level or abstract commands, such as “follow the lead,” “perform a join up with lead,” or “land in that region.” Based on the commands received at the semantic level of understanding, the maneuver controller can determine an appropriate maneuver to perform to follow the command. ...” ] 
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to allow the user to input a set of first response maneuvers to the device. The modification results in autonomous vehicles that “can be improved to perform flight maneuvers in a manner consistent with how human pilots would respond to such conditions” (Bertram, Paragraph 17).
As per Claim 16: Zhou and Bertram teach all of the limitations of claim 15. Zhou further teaches the following;
wherein receiving the feedback refines the behavior in the machine learning model. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
Zhou does not teach the device user defining the series of actions. Bertram however teaches;
wherein the command specifies a series of user-defined actions that comprise the first series of action to teach the behavior to the machine learning model,  [ See at least Paragraph 15: “In some embodiments, a system includes a machine learning engine. The machine learning engine is configured to receive training data including a plurality of first input conditions and a plurality of first response maneuvers associated with the first input conditions. The machine learning engine is configured to train a learning system using the training data to generate a second response maneuver based on a second input condition.” And Paragraph 24: “In some embodiments, the machine learning engine is configured to train the maneuver controller (or the learning system) to generate a behavior model, such as a pilot behavior model, a semantic model, or a mental model, which can be shared with or readily understood by a human operator. For example, when trained, the maneuver controller can be configured to receive commands at a semantic level of understanding from a human pilot. The semantic level of understanding may include high level or abstract commands, such as “follow the lead,” “perform a join up with lead,” or “land in that region.” Based on the commands received at the semantic level of understanding, the maneuver controller can determine an appropriate maneuver to perform to follow the command. ...” ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to allow the user to input a set of first response maneuvers to the device. The modification results in autonomous vehicles that “can be improved to perform flight maneuvers in a manner consistent with how human pilots would respond to such conditions” (Bertram, Paragraph 17).

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1), in view of Redding (US 20180089563 A1).
As per Claim 4: Zhou teaches all of the limitations of claim 1 including instructing the remotely controlled device to perform the second series of actions. Zhou however does not teach identifying the behavior, and randomly selecting individual actions from a pool of actions. Redding teaches the following;
wherein instructing the remotely controlled device to perform the second series of actions further comprises: identifying the behavior to the remotely controlled device; [ See at least 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions … The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on…” and Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment.” ]
and selecting, stochastically according to the machine learning model, individual actions to comprise the second series of actions from a pool of candidate actions. [ See at least Paragraph 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions and associated anticipated state changes for the vehicle for some selected time horizons (e.g., on the order of tens of seconds, or a few minutes) in an iterative fashion, and provide at least some of the sequences generated during various planning iterations to the motion selector. The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on.” And Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment. “ ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include identifying a behavior and randomly selecting individual actions from a pool of actions. Selecting a random number of actions from a pool would allow the model to deal with a relatively small subset of alternatives (as similarly explained by Redding, Paragraph 0006, line 7), thus reducing system processing time.
As per Claim 11: Zhou teaches all of the limitations of claim 8, including instructing the remotely controlled device to perform the second series of actions. Zhou, however, does not teach identifying the behavior or randomly selecting individual actions from a pool of actions. Redding teaches the following:
wherein to instruct the remotely controlled device to perform the second series of actions, the system is further enable to: identify the behavior to the remotely controlled device; [ See at least Paragraph 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions and associated anticipated state changes for the vehicle for some selected time horizons (e.g., on the order of tens of seconds, or a few minutes) in an iterative fashion, and provide at least some of the sequences generated during various planning iterations to the motion selector. The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on.” And Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment. “ ]
and select, stochastically according to the machine learning model, individual actions to comprise the second series of actions from a pool of candidate actions. [ See at least Paragraph 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions and associated anticipated state changes for the vehicle for some selected time horizons (e.g., on the order of tens of seconds, or a few minutes) in an iterative fashion, and provide at least some of the sequences generated during various planning iterations to the motion selector. The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on.” And Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment. “ ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include identifying a behavior and randomly selecting individual actions from a pool of actions. Selecting a random number of actions from a pool would allow the model to deal with a relatively small subset of alternatives (as similarly explained by Redding, Paragraph 0006, line 7), thus reducing system processing time.
As per Claim 18: Zhou teaches all of the limitations of claim 15, including instructing the remotely controlled device to perform the second series of actions. Zhou, however, does not teach identifying the behavior or randomly selecting individual actions from a pool of actions. Redding teaches the following:
wherein instructing the remotely controlled device to perform the second series of actions further comprises: identifying the behavior to the remotely controlled device; [ See at least Paragraph 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions and associated anticipated state changes for the vehicle for some selected time horizons (e.g., on the order of tens of seconds, or a few minutes) in an iterative fashion, and provide at least some of the sequences generated during various planning iterations to the motion selector. The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on.” And Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment. “ ]
and selecting, stochastically according to the machine learning model, individual actions to comprise the second series of actions from a pool of candidate actions. [ See at least Paragraph 0004: “The behavior planner may be configured to generate candidate sequences of conditional actions and associated anticipated state changes for the vehicle for some selected time horizons (e.g., on the order of tens of seconds, or a few minutes) in an iterative fashion, and provide at least some of the sequences generated during various planning iterations to the motion selector. The sequences may also be referred to as policies. An action may comprise, for example, an acceleration to a particular speed, a lane change, a deceleration to a particular speed, and so on.” And Paragraph 0038: “Only a selected subset of representative actions which meet certain criteria may be considered when generating the policies in the depicted embodiment. A prediction model may provide a probability distribution over some number of next-states, given the current state and the alternative actions being considered. Using such models, which may in some cases involve the use of machine learning techniques as discussed below, the behavior planner may generate a set of one or more policies ({action, state} sequences), evaluate them and provide at least a recommended subset of the policies to the motion selector in the depicted embodiment. “ ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include identifying a behavior and randomly selecting individual actions from a pool of actions. Selecting a random number of actions from a pool would allow the model to deal with a relatively small subset of alternatives (as similarly explained by Redding, Paragraph 0006, line 7), thus reducing system processing time.
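For illustration only, and not as a characterization of the actual implementation in Redding or in the instant application, the claimed step of stochastically selecting individual actions from a pool of candidate actions may be sketched as follows. The function name select_actions, the example pool, and the example weights are hypothetical and appear in neither reference.

```python
import random


def select_actions(pool, weights, length, rng=None):
    """Draw `length` actions from `pool`, with replacement.

    Each draw picks an action with probability proportional to its weight.
    All names here are illustrative assumptions, not taken from any reference.
    """
    if len(pool) != len(weights):
        raise ValueError("pool and weights must have the same length")
    rng = rng or random.Random()
    # random.choices performs weighted sampling with replacement.
    return rng.choices(pool, weights=weights, k=length)


# Hypothetical candidate actions, loosely modeled on the examples quoted
# from Redding Paragraph 0004 (acceleration, lane change, deceleration).
candidate_pool = ["accelerate", "change_lane", "decelerate"]
candidate_weights = [0.5, 0.2, 0.3]

second_series = select_actions(candidate_pool, candidate_weights, length=4,
                               rng=random.Random(0))
```

Under this sketch, a series of actions is assembled one draw at a time, so that two requests for the same behavior may yield different series.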

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1), in view of Redding (US 20180089563 A1), in further view of Satou (US 20190056718 A1).
As per Claim 5: Zhou and Redding teach all of the limitations of claim 4. They do not, however, teach assigning and altering weights of actions to determine selection probability. Satou, however, teaches:
wherein updating the machine learning model comprises altering weights assigned to candidate actions within the pool of candidate actions, wherein the weights determine a probability that the machine learning model selects an individual candidate action over other candidate actions from the pool of candidate actions. [See at least paragraph 67: “The algorithm according to this example is also known as Q-learning, which is a learning method in which a state s of an actor and an action a (an increase/reduction in the conveyance speed (which may include the conveyance direction) of the conveyance operation, an increase/reduction in the acceleration, modification of the attitude of the conveyance article, and so on) that can be selected by the actor in the state s are used as independent variables, and a function Q (s, a) representing the value of the action when an action a is selected in the state s is learned. The optimal solution is realized by selecting an action a that results in the highest possible value function Q in the state s. The Q-learning is started in a state where the correlation between the state s and the action a is unknown, and by repeatedly selecting various actions a in an arbitrary state s through trial and error, the value function Q is updated iteratively so as to approach the optimal solution. Here, when the environment (i.e. the state s) varies as a result of selecting an action a in the state s, a reward (i.e. a weighting applied to the action a) r corresponding to the variation is acquired, and by guiding the learning so as to select actions a with which higher rewards r are acquired, the value function Q can be brought close to the optimal solution in a comparatively short time.” ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include assigning weights to individual actions to determine a probability of selection. Picking actions with high assigned weights helps bring the system “close to the optimal solution in a comparatively short time” (Satou, Paragraph 0067).
As per Claim 12: Zhou and Redding teach all of the limitations of claim 11. They do not, however, teach assigning and altering weights of actions to determine selection probability. Satou, however, teaches:
wherein to update the machine learning model the system is further enabled to adjust weights assigned to candidate actions within the pool of candidate actions, wherein the weights determine a probability that the machine learning model selects an individual candidate action over other candidate actions from the pool of candidate actions. [See at least paragraph 67: “The algorithm according to this example is also known as Q-learning, which is a learning method in which a state s of an actor and an action a (an increase/reduction in the conveyance speed (which may include the conveyance direction) of the conveyance operation, an increase/reduction in the acceleration, modification of the attitude of the conveyance article, and so on) that can be selected by the actor in the state s are used as independent variables, and a function Q (s, a) representing the value of the action when an action a is selected in the state s is learned. The optimal solution is realized by selecting an action a that results in the highest possible value function Q in the state s. The Q-learning is started in a state where the correlation between the state s and the action a is unknown, and by repeatedly selecting various actions a in an arbitrary state s through trial and error, the value function Q is updated iteratively so as to approach the optimal solution. Here, when the environment (i.e. the state s) varies as a result of selecting an action a in the state s, a reward (i.e. a weighting applied to the action a) r corresponding to the variation is acquired, and by guiding the learning so as to select actions a with which higher rewards r are acquired, the value function Q can be brought close to the optimal solution in a comparatively short time.” ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include assigning weights to individual actions to determine a probability of selection. Picking actions with high assigned weights helps bring the system “close to the optimal solution in a comparatively short time” (Satou, Paragraph 0067).

As per Claim 19: Zhou and Redding teach all of the limitations of claim 18. They do not, however, teach assigning and altering weights of actions to determine selection probability. Satou, however, teaches:
wherein updating the machine learning model comprises altering weights assigned to candidate actions within the pool of candidate actions, wherein the weights determine a probability that the machine learning model selects an individual candidate action over other candidate actions from the pool of candidate actions. [See at least paragraph 67: “The algorithm according to this example is also known as Q-learning, which is a learning method in which a state s of an actor and an action a (an increase/reduction in the conveyance speed (which may include the conveyance direction) of the conveyance operation, an increase/reduction in the acceleration, modification of the attitude of the conveyance article, and so on) that can be selected by the actor in the state s are used as independent variables, and a function Q (s, a) representing the value of the action when an action a is selected in the state s is learned. The optimal solution is realized by selecting an action a that results in the highest possible value function Q in the state s. The Q-learning is started in a state where the correlation between the state s and the action a is unknown, and by repeatedly selecting various actions a in an arbitrary state s through trial and error, the value function Q is updated iteratively so as to approach the optimal solution. Here, when the environment (i.e. the state s) varies as a result of selecting an action a in the state s, a reward (i.e. a weighting applied to the action a) r corresponding to the variation is acquired, and by guiding the learning so as to select actions a with which higher rewards r are acquired, the value function Q can be brought close to the optimal solution in a comparatively short time.” ]
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include assigning weights to individual actions to determine a probability of selection. Picking actions with high assigned weights helps bring the system “close to the optimal solution in a comparatively short time” (Satou, Paragraph 0067).
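For illustration only, the Q-learning approach described in the cited passage of Satou may be sketched with the standard tabular update rule, in which the value Q(s, a) is moved toward the received reward plus the discounted best value of the next state. The learning rate alpha, the discount gamma, the softmax conversion of values into selection probabilities, and all function and action names below are assumptions made for this sketch; the quoted passage itself describes selecting the highest-valued action and does not specify these details.

```python
import math
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update of Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def selection_probabilities(q, state, actions, temperature=1.0):
    """Convert values into probabilities with a softmax (an assumption here)."""
    values = [q[(state, a)] for a in actions]
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return {a: e / total for a, e in zip(actions, exps)}


# Hypothetical actions and states; unseen (state, action) pairs start at 0.0.
actions = ["accelerate", "change_lane", "decelerate"]
q_table = defaultdict(float)

# Positive feedback (reward 1.0) for "accelerate" in state "s0".
q_update(q_table, "s0", "accelerate", 1.0, "s1", actions)
probs = selection_probabilities(q_table, "s0", actions)
```

In this sketch, positive feedback raises the weight of the rewarded action, which in turn raises the probability that the action is selected over the other candidates in the pool.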

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1), in view of Bertram (US 10032111 B1).
As per Claim 3: Zhou and Bertram teach all of the limitations of claim 2. Zhou further teaches the following:
the feedback refines behavior in the machine learning model by one of: adding an additional action to the first series of actions; removing an indicated action from the first series of actions; and altering an intensity of a specified action within the first series of actions. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
As similarly addressed with respect to claim 1, the above citations show that the neural network, using feedback, undergoes an iterative (“fine-tuning”) learning process, which, under the broadest reasonable interpretation, is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, resulting in a modified control module that controls the vehicle to behave differently than before.
As per Claim 10: Zhou and Bertram teach all of the limitations of claim 9. Zhou further teaches the following:
to refine the behavior in the machine learning model, the feedback instructs the system to: add an additional action to the first series of actions; remove an indicated action from the first series of actions; and alter an intensity of a specified action within the first series of actions. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
As similarly addressed with respect to claim 1, the above citations show that the neural network, using feedback, undergoes an iterative (“fine-tuning”) learning process, which, under the broadest reasonable interpretation, is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, resulting in a modified control module that controls the vehicle to behave differently than before.
As per Claim 17: Zhou and Bertram teach all of the limitations of claim 16. Zhou further teaches the following:
the feedback refines behavior in the machine learning model by one of: adding an additional action to the first series of actions; removing an indicated action from the first series of actions; and altering an intensity of a specified action within the first series of actions. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and Paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
As similarly addressed with respect to claim 1, the above citations show that the neural network, using feedback, undergoes an iterative (“fine-tuning”) learning process, which, under the broadest reasonable interpretation, is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, resulting in a modified control module that controls the vehicle to behave differently than before.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1), in view of Redding (US 20180089563 A1), in further view of Beckman (US 9442496 B1).
As per Claim 6: Zhou and Redding teach all of the limitations of claim 4. Zhou further teaches the following limitations:
further comprising: monitoring a series of actions performed by a remotely controlled device that comprise the behavior; [ See at least Paragraph 0027: “training the deep neural network model with training samples obtained from a simulated environment until a condition of minimizing a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information is reached.” Also see paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions” ]
updating, according to the feedback, a machine learning model used by the remotely controlled device. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data,” And paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle,” And paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
to produce a different subsequent series of actions to comprise the behavior, [See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] 
From the above citations, the neural network undergoes an iterative (“fine-tuning”) learning process. Under the broadest reasonable interpretation, “fine-tuning” is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, resulting in a modified control module that controls the vehicle to behave differently than before.
and in response to receiving the subsequent command to perform the behavior, instructing the remotely controlled device to perform the subsequent series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
Therefore, Zhou teaches monitoring a remotely controlled vehicle’s actions, updating the system according to feedback, producing a subsequent series of actions different from the previous series of actions, and then commanding that subsequent series of actions by the machine learning model. Zhou, however, does not teach the above limitations being used with a second machine learning model and a second device, or producing a series of actions different from another vehicle’s series of actions. Beckman, however, teaches the methods disclosed above occurring in different vehicles, each with its own machine learning model:
wherein the subsequent series of actions of the vehicle is different than another vehicle’s subsequent series of actions; [See at least col. 4 lines 33 - 56: “The machine learning system may be fully trained using a substantial corpus of observed environmental signals e(t) correlated with captured sound signals s(t) that are obtained using one or more of the aerial vehicles, and others, to develop a sound model f. After the machine learning system has been trained, and the sound model f has been developed, the machine learning system may be provided with a set of extrinsic or intrinsic information or data (e.g., environmental conditions, operational characteristics, or positions) that may be anticipated in an environment in which an aerial vehicle is expected to operate. In some embodiments, the machine learning system may reside and/or be operated on one or more computing devices or machines provided onboard one or more of the aerial vehicles. The machine learning system may receive information or data regarding the corpus of sound signals observed and the sound signals captured by the other aerial vehicles, for training purposes and, once trained, the machine learning system may receive extrinsic or intrinsic information or data that is actually observed by the aerial vehicle, e.g., in real time or in near-real time, as inputs and may generate outputs corresponding to predicted sound levels based on the information or data.” ]
This citation teaches multiple aerial vehicles, each of which develops a separate model that learns according to the respective input data being measured. Each vehicle then updates its respective machine learning system (which is disclosed as being located on each vehicle), inputting its own real-time data to generate its own set of outputs. Therefore, after implementing the machine learning methods taught by Zhou, a second vehicle would have a set of first and second actions different from those of the first vehicle, because the second vehicle’s different input data, entered into its own machine learning model, would produce different behaviors.
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include more than one remotely controlled vehicle and model so that each vehicle can operate in a different environment, with each model trained according to the data “anticipated in an environment in which an aerial vehicle is expected to operate” (Beckman, paragraph 20).
As per Claim 13: Zhou and Redding teach all of the limitations of claim 11. Zhou further teaches the following:
wherein the system is further enabled to: monitor a series of actions performed by a remotely controlled device that comprise the behavior; [ See at least Paragraph 0027: “training the deep neural network model with training samples obtained from a simulated environment until a condition of minimizing a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information is reached.” Also see paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions” ]
update, according to the feedback, a machine learning model used by the remotely controlled device. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data,” And paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle,” And paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
to produce a different subsequent series of actions to comprise the behavior, [See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
From the above citation, the neural network undergoes an iterative (“fine-tuning”) learning process. Under the broadest reasonable interpretation, “fine-tuning” is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, so that the modified control module controls the vehicle to behave differently than before.
and in response to receiving the subsequent command to perform the behavior, instruct the remotely controlled device to perform the subsequent series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
Zhou teaches monitoring a remotely controlled vehicle’s actions, updating the system according to feedback, producing a subsequent series of actions different from the previous series of actions, and then commanding the subsequent different series of actions by the machine learning model. Zhou, however, does not teach the above limitations being used with a second machine learning model and a second device, or producing a series of actions different from another vehicle’s series of actions. Beckman, however, teaches the methods disclosed above occurring in different vehicles, each with its own machine learning model.
wherein the subsequent series of actions is different than another vehicle’s subsequent series of actions; [See at least col. 4 lines 33 - 56: “The machine learning system may be fully trained using a substantial corpus of observed environmental signals e(t) correlated with captured sound signals s(t) that are obtained using one or more of the aerial vehicles, and others, to develop a sound model f. After the machine learning system has been trained, and the sound model f has been developed, the machine learning system may be provided with a set of extrinsic or intrinsic information or data (e.g., environmental conditions, operational characteristics, or positions) that may be anticipated in an environment in which an aerial vehicle is expected to operate. In some embodiments, the machine learning system may reside and/or be operated on one or more computing devices or machines provided onboard one or more of the aerial vehicles. The machine learning system may receive information or data regarding the corpus of sound signals observed and the sound signals captured by the other aerial vehicles, for training purposes and, once trained, the machine learning system may receive extrinsic or intrinsic information or data that is actually observed by the aerial vehicle, e.g., in real time or in near-real time, as inputs and may generate outputs corresponding to predicted sound levels based on the information or data.” ]
This citation teaches multiple aerial vehicles, each of which develops a separate model that learns according to its respective measured input data. Each vehicle then updates its respective machine learning system (which is disclosed as being located on each vehicle), inputting its own real-time data to generate its own set of outputs. Therefore, after implementing the machine learning methods taught by Zhou, a second vehicle would have a set of first and second actions different from those of the first vehicle, because the second vehicle’s different input data, entered into its own machine learning model, yields different behaviors.
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include more than one remotely controlled vehicle and model so that each vehicle can operate in a different environment, with each model trained according to the data “anticipated in an environment in which an aerial vehicle is expected to operate” (Beckman paragraph 20).
As per Claim 20: Zhou and Redding teach all of the limitations of claim 18. Zhou further teaches the following:
further comprising: monitoring a series of actions performed by a remotely controlled device that comprise the behavior; [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information.” And Paragraph 0130: “It needs to be appreciated that regarding the aforesaid method embodiments, for ease of description, the aforesaid method embodiments are all described as a combination of a series of actions.” ]  
updating, according to the feedback, a machine learning model used by the remotely controlled device. [ See at least Paragraph 0118: “according to a difference between the state information of the unmanned aerial vehicle under action of the control information output by the deep neural network and the target state information, judging whether the control information complies with an expectation for reaching the target state information, and providing a positive/negative feedback;” and paragraph 0119: “regarding the sensor data, the target state information and the control information as a group of training samples to update the sample data,” And paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle,” And paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ]
to produce a different subsequent series of actions to comprise the behavior, [See at least paragraph 0120: “training the neural network which is already trained in the simulated environment according to the training samples obtained from the actual environment, to obtain a neural network adapted for flight control of the unmanned aerial vehicle.” And Paragraph 0122: “Since operating the unmanned aerial vehicle in the actual environment is not completely the same as in the simulated environment, it is feasible to, through the above steps, train the neural network obtained by training in the simulated environment again, implement fine-tuning, and obtain an unmanned aerial vehicle control module adapted for flight control of the unmanned aerial vehicle.” ] 
From the above citation, the neural network undergoes an iterative (“fine-tuning”) learning process. Under the broadest reasonable interpretation, “fine-tuning” is interpreted as changing the intensity of an action by replacing an old action with a new action. Therefore, the “changed action” is interpreted as removing an old action and implementing a new action of a different intensity, so that the modified control module controls the vehicle to behave differently than before.
and in response to receiving the subsequent command to perform the behavior, instructing the remotely controlled device to perform the subsequent series of actions. [ See at least paragraph 0004: "An unmanned aerial vehicle refers to an unmanned aircraft which is manipulated via wireless remote control or program control ..." And Paragraph 0117: “Preferably, continuous random control information is provided to the unmanned aerial vehicle at a time interval. The deep neural network which is already trained in the simulated environment outputs the corresponding control information according to the sensor data and the continuous random target state information;” ]
Zhou teaches monitoring a remotely controlled vehicle’s actions, updating the system according to feedback, producing a subsequent series of actions different from the previous series of actions, and then commanding the subsequent different series of actions by the machine learning model. Zhou, however, does not teach the above limitations being used with a second machine learning model and a second device, or producing a series of actions different from another vehicle’s series of actions. Beckman, however, teaches the methods disclosed above occurring in different vehicles, each with its own machine learning model.
wherein the subsequent series of actions is different than the another subsequent series of actions; [See at least col. 4 lines 33 - 56: “The machine learning system may be fully trained using a substantial corpus of observed environmental signals e(t) correlated with captured sound signals s(t) that are obtained using one or more of the aerial vehicles, and others, to develop a sound model f. After the machine learning system has been trained, and the sound model f has been developed, the machine learning system may be provided with a set of extrinsic or intrinsic information or data (e.g., environmental conditions, operational characteristics, or positions) that may be anticipated in an environment in which an aerial vehicle is expected to operate. In some embodiments, the machine learning system may reside and/or be operated on one or more computing devices or machines provided onboard one or more of the aerial vehicles. The machine learning system may receive information or data regarding the corpus of sound signals observed and the sound signals captured by the other aerial vehicles, for training purposes and, once trained, the machine learning system may receive extrinsic or intrinsic information or data that is actually observed by the aerial vehicle, e.g., in real time or in near-real time, as inputs and may generate outputs corresponding to predicted sound levels based on the information or data.” ]
This citation teaches multiple aerial vehicles, each of which develops a separate model that learns according to its respective measured input data. Each vehicle then updates its respective machine learning system (which is disclosed as being located on each vehicle), inputting its own real-time data to generate its own set of outputs. Therefore, after implementing the machine learning methods taught by Zhou, a second vehicle would have a set of first and second actions different from those of the first vehicle, because the second vehicle’s different input data, entered into its own machine learning model, yields different behaviors.
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou to include more than one remotely controlled vehicle and model so that each vehicle can operate in a different environment, with each model trained according to the data “anticipated in an environment in which an aerial vehicle is expected to operate” (Beckman paragraph 20).

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (US 20190004518 A1) in view of Warren (US 20150324706 A1).
As per Claim 7: Zhou discloses all the limitations of claim 1. Zhou, however, does not teach the following:
wherein the feedback is received as a voice command, further comprising: identifying a speaker of the voice command; and in response to determining that the speaker is associated with the remotely controlled device, accepting the voice command as the feedback. 
Warren teaches:
an automation system in which rules or programs can be initiated or adjusted, (Paragraph 0052) “The information provided by sensor may be used to initiate a rule setting mode of a home automation system…”
via voice command (Paragraph 0052) “confirm user feedback associated with questions or other feedback provided via voice control module”
before the feedback is accepted to adjust the programming, the user’s identity must be confirmed (Paragraph 0052) “confirm identification of one or more users attempting to establish or change rules of the home automation system.”
the user must be identified and verified, thereby providing a known means of feedback while ensuring, by confirming the identity of the user, that only a verified user can adjust the system (Paragraph 0052) “confirm user feedback associated with questions or other feedback provided via voice control module, confirm identification of one or more users attempting to establish or change rules of the home automation system.”
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou so that feedback is received as a voice command, a speaker of the voice command is identified, and, in response to determining that the speaker is associated with the remotely controlled device, the voice command is accepted as the feedback. These modifications would give a user an alternative way to provide feedback while also identifying users to verify their authority to provide feedback to the system.
As per Claim 14: Zhou discloses all the limitations of claim 8. Zhou, however, does not teach the following:
wherein the feedback is received as a voice command, and the system is further enabled to: identify a speaker of the voice command; and in response to determining that the speaker is associated with the remotely controlled device, accept the voice command as the feedback. 
Warren teaches:
an automation system in which rules or programs can be initiated or adjusted, (Paragraph 0052) “The information provided by sensor may be used to initiate a rule setting mode of a home automation system…”
via voice command (Paragraph 0052) “confirm user feedback associated with questions or other feedback provided via voice control module”
before the feedback is accepted to adjust the programming, the user’s identity must be confirmed (Paragraph 0052) “confirm identification of one or more users attempting to establish or change rules of the home automation system.”
the user must be identified and verified, thereby providing a known means of feedback while ensuring, by confirming the identity of the user, that only a verified user can adjust the system (Paragraph 0052) “confirm user feedback associated with questions or other feedback provided via voice control module, confirm identification of one or more users attempting to establish or change rules of the home automation system.”
Therefore, at the time of filing of the invention, it would have been obvious to one of ordinary skill in the art to modify Zhou so that feedback is received as a voice command, a speaker of the voice command is identified, and, in response to determining that the speaker is associated with the remotely controlled device, the voice command is accepted as the feedback. These modifications would give a user an alternative way to provide feedback while also identifying users to verify their authority to provide feedback to the system.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVE ALEXANDER PARASIDIS whose telephone number is (571)272-7458.  The examiner can normally be reached on Mon. - Fri. (7:30 AM - 4:30 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Olszewski, can be reached at (571) 272-2706. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.A.P/Examiner, Art Unit 3669
/ADAM R MOTT/Primary Examiner, Art Unit 3669