DETAILED ACTION
This action is in response to the claims filed 05/22/2020 for application 16/881,557 which claims priority to PRO 62/851,858 filed 05/23/2019. Claims 1-20 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 20 is objected to because of the following informality: the claim recites “The computer implemented method of claim 1.” For purposes of compact prosecution, the examiner will interpret claim 20 as dependent on claim 14. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 8-10, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Doshi-Velez et al. ("Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations", hereinafter "Doshi-Velez") in view of Lopes et al. ("Active Learning for Reward Estimation in Inverse Reinforcement Learning", hereinafter "Lopes").

Regarding claim 1, Doshi-Velez teaches A computer implemented method comprising: 
accessing, by a system, a machine learning model for reinforcement learning using Markov decision processes (MDP) (pg. 4, § 3 The IBP-GP HiP-MDP Model), the MDP represented using a state space, an action space, a transition function, and a reward function (“A HiP-MDP is described by a tuple: {S, A,Θ, T,R, γ, PΘ}, where S and A are the sets of states s and actions a, and R(s, a) is the reward function.” [pg. 3, §2. Hidden parameter Markov Decision Processes,¶2]) wherein the transition function is parameterized by a first set of latent variables (“The dynamics T for each instance b depends on the value of the hidden parameters θb: T(s′|s, θb, a). We denote the set of all possible parameters as Θ and let PΘ(θ) be the prior over these parameters” [pg. 3, § 2 Hidden Parameter Markov Decision Processes, ¶2; T is the transition function.]), wherein the machine learning model is configured for execution by an agent in an environment (“Cartpole has a relatively simple policy; we use it to visualize how HiP-MDPs compress the dynamics across different parameter settings into a latent space. Acrobot and bicycle are challenging domains when the agent must rapidly learn a control strategy for a new set of parameters” [pg. 8, 5 Results, ¶1]), wherein each latent variable represents a hidden parameter corresponding to one or more of:
(a) a factor representing an environment in which the machine learning model is executed (“In the cartpole domain, an agent must apply forces either to the left or right of a cart to keep a pole balanced on top of it” [pg. 8, 5.1 Cartpole, ¶1]; See further: In all 5 of the MCMC runs, a total of 4 latent parameters were inferred. The output dimensions x and θ consistently only used the first (baseline) feature—that is, our IBP-GP’s predictions were in fact the same as using a single GP. This observation is consistent with the cartpole dynamics and the observed prediction errors in figure 4. The second feature was used by both ẋ and θ̇ and was positively correlated with both the pole mass m and the pole length l (figures 6, 5, and 5).” [pg. 14, A Visualizing Latent Features for Cartpole, ¶1; note: The claim under BRI only requires “one or more of”, however examiner has provided a citation for both (a) and (b)]), or 
(b) an attribute of an agent executing the machine learning based model (“Predicting angular velocities is critical to planning in acrobot; inaccurate predictions will make the agent believe it can reach the swing-up position more quickly than is physically possible” [pg. 9, § 5.2 Acrobot, ¶3]); 
training the machine learning model comprising: 
training based on variations of a first set of latent variables (“We varied the pole mass m and the pole length l. In each (m, l) setting, we ran Sarsa (using a 3rd order Fourier basis) for 5 repetitions of 30 episodes, where each episode was run for 300 steps or until the pole fell down… Next, 50 training points were selected from each run of (m, l) settings with m ∈ {.1, .15, .2, .25, .3} and l ∈ {.4, .45, .5, .55, .6}. The online inference procedure was used to estimate the weights wkb given the filter parameters zkad and basis functions fkad from the batch procedure.” [pg. 8, § 5.1 Cartpole, ¶2-3]), and 
executing the machine learning model in a new environment, wherein the execution of the machine learning model is based on a combination of latent variables from the first set of latent variables and the second set of latent variables that is distinct from combinations of latent variables used during training of the machine learning based model (“Many control applications involve repeated encounters with domains that have similar, but not identical, dynamics. An agent that swings bats may encounter several bats with different weights or lengths, while an agent that manipulates cups may encounter cups with different amounts of liquid. An agent that drives cars may encounter many different cars, each with unique handling characteristics… In all of these scenarios, it makes little sense of the agent to start afresh when it encounters a new bat, a new cup, or a new car. Exposure to a variety of related domains should correspond to faster and more reliable adaptation to a new instance of the same type of domain, via transfer learning. If an agent has already swung several bats, for example, we would hope that it could easily learn to swing a new bat. Why? Like many domains, swinging a bat has a low-dimensional representation that affects its dynamics in structured ways. The agent’s prior experience should allow it to both learn how to model related instances of a domain—such as via the bat’s length, which smoothly changes in the bat’s dynamics—and what specific model parameters (e.g., lengths) are likely.” [pg. 1, ¶1-2]).
Although Doshi-Velez discloses a set of latent variables, the reference fails to explicitly teach that the reward function is parameterized by a second set of latent variables, and
training based on variations of a second set of latent variables.
Lopes teaches the reward function is parameterized by a second set of latent variables (“For simplicity, we consider a reward function parameterized using a two-dimensional parameter vector θ” [pg. 40, §4.1 Finding the Maximum of a Quadratic Function, ¶1]), and
training based on variations of a second set of latent variables (“For our IRL problem, we consider the reward function, r(x) = −(x − 0.15)², for which the agent should learn the parameter θ from a demonstration” [pg. 41, §4.1 Finding the Maximum of a Quadratic Function, ¶2; the agent learns from different states, which implies variations]).
Doshi-Velez and Lopes are both in the same field of endeavor of using Hidden Parameter Markov Decision Processes to train an agent, and thus are analogous art. Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s teachings by parameterizing the reward function with a second set of latent variables as taught by Lopes. One would have been motivated to make this modification in order to reduce the amount of computation that the Markov decision process requires to find the optimal reward function [pg. 32, ¶2, Lopes].
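For illustration only, the combination discussed above can be pictured with the following minimal sketch. It is not taken from Doshi-Velez, Lopes, or the application. The class name LatentMDP, the linear dynamics, and the specific values of theta and psi are assumptions chosen for readability; only the quadratic reward shape follows the form quoted from Lopes. The sketch shows a transition function parameterized by one latent variable, a reward function parameterized by another, and a held-out combination of the two that was not used during training.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LatentMDP:
    """Toy MDP with separately parameterized dynamics and reward (hypothetical)."""
    theta: float  # assumed latent variable for the transition function
    psi: float    # assumed latent variable for the reward function

    def transition(self, s: float, a: float) -> float:
        # Next state depends on the hidden dynamics parameter theta.
        return s + self.theta * a

    def reward(self, s: float, a: float) -> float:
        # Quadratic reward peaked at psi, in the style of r(x) = -(x - 0.15)^2.
        return -(s - self.psi) ** 2


# Assumed grids of latent values for illustration.
thetas = [0.5, 1.0]
psis = [0.15, 0.3]
all_combos = set(product(thetas, psis))

# Train on some (theta, psi) pairs; the rest are unseen combinations.
train_combos = {(0.5, 0.15), (1.0, 0.3)}
test_combos = all_combos - train_combos

# Each held-out pair reuses latent values seen in training, in a new pairing.
new_env = LatentMDP(theta=0.5, psi=0.3)
```

Every latent value in test_combos appears somewhere in train_combos, but never in the same pairing, which is the sense of "combination distinct from combinations used during training" assumed here.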

Regarding claim 2, Doshi-Velez/Lopes teaches The computer implemented method of claim 1, where Doshi-Velez teaches further comprising: 
initializing a data set representing transitions (“The reinforcement learning (RL) setting consists of a series of interactions between an agent and an environment. From some state s, the agent chooses an action a which transitions it to a new state s′ and provides reward r. Its goal is to maximize its expected sum of rewards, E[Σ_t γ^t r_t], where γ ∈ [0, 1) is a discount factor that weighs the relative importance of near-term and long-term rewards. This series of interactions can be modeled as a Markov Decision Process (MDP), a 5-tuple {S, A, T, R, γ} where S and A are sets of states s and actions a, the transition function T(s′|s, a) gives the probability of the next state being s′ after performing action a in state s, and the reward function R(s, a) gives the reward r for performing action a in state s” [pg. 2, §1 Background, ¶1]); 
repeatedly: 
training the machine learning model using the dataset (“[image: media_image1.png]” [pg. 4, § 3 The IBP-GP HiP-MDP Model, ¶1]); and 
augmenting the dataset using new transitions (“We focus on scenarios in which the agent is given a large amount of batch observational data from several domain instances (perhaps solved as independent MDPs) and tasked with quickly performing well on new instances. Our batch inference procedure uses the observational data to fit the filter parameters zkad and basis functions fkad, which are independent of any particular instance, and compute a posterior over the weights wkb, which depend on each instance. These settings of zkad, fkad, and P(wkb) will used to infer the instance-specific weights wkb efficiently in the online setting when given a new instance;” [pg. 5, 4. 4 Inference in the IBP-GP HiP-MDP, ¶1])
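For illustration only, the recited sequence of initializing a data set of transitions and then repeatedly training and augmenting can be sketched as below. This is not the inference procedure of Doshi-Velez. The helper names collect and fit_theta, the noise-free linear dynamics, and the value 0.7 are assumptions made so the loop is easy to follow.

```python
import random


def collect(theta, n, rng):
    """Gather n transitions (s, a, s_next) from assumed dynamics s' = s + theta * a."""
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.choice([-1.0, 1.0])
        data.append((s, a, s + theta * a))
    return data


def fit_theta(data):
    """Least-squares estimate of theta from s' - s = theta * a."""
    num = sum(a * (s_next - s) for s, a, s_next in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


rng = random.Random(0)
true_theta = 0.7  # hidden parameter, assumed for this sketch

dataset = collect(true_theta, 5, rng)  # initialize the data set of transitions
estimates = []
for _ in range(3):  # repeatedly:
    estimates.append(fit_theta(dataset))     # train the model on the data set
    dataset += collect(true_theta, 5, rng)   # augment with new transitions
```

Because the assumed dynamics are noise-free, each estimate recovers the hidden parameter up to floating-point error; with noisy transitions the estimate would instead tighten as the data set grows.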

Regarding claim 3, Doshi-Velez/Lopes teaches The computer implemented method of claim 1, where Doshi-Velez teaches wherein the agent represents a robot (See, pg. 9, § 5.2 Acrobot, “Acrobot, in which the agent must swing up a double pendulum only through applying a positive, neutral, or negative torque to the joint between the two poles, is much more challenging.” Note: Acrobot represents a robot.)  and 
the environment represents an obstacle course in which the robot is moving (“The bicycle domain requires the agent to keep a bicycle traveling at a constant speed upright for as long as possible within a bounded 60×60m area.” [pg. 10, § 5.3 Bicycle, ¶1]).

Regarding claim 8, Doshi-Velez teaches A non-transitory computer readable storage medium storing instructions, the instructions when executed by a processor, cause the processor to perform steps comprising (“Control applications often feature tasks with similar, but not identical, dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric regression approach for learning its structure from data” [Abstract, Control applications imply use of processors and memory]):
accessing, by a system, a machine learning model for reinforcement learning using Markov decision processes (MDP) (pg. 4, § 3 The IBP-GP HiP-MDP Model), the MDP represented using a state space, an action space, a transition function, and a reward function (“A HiP-MDP is described by a tuple: {S, A,Θ, T,R, γ, PΘ}, where S and A are the sets of states s and actions a, and R(s, a) is the reward function.” [pg. 3, §2. Hidden parameter Markov Decision Processes,¶2]) wherein the transition function is parameterized by a first set of latent variables (“The dynamics T for each instance b depends on the value of the hidden parameters θb: T(s′|s, θb, a). We denote the set of all possible parameters as Θ and let PΘ(θ) be the prior over these parameters” [pg. 3, § 2 Hidden Parameter Markov Decision Processes, ¶2; T is the transition function.]), wherein the machine learning model is configured for execution by an agent in an environment (“Cartpole has a relatively simple policy; we use it to visualize how HiP-MDPs compress the dynamics across different parameter settings into a latent space. Acrobot and bicycle are challenging domains when the agent must rapidly learn a control strategy for a new set of parameters” [pg. 8, 5 Results, ¶1]), wherein each latent variable represents a hidden parameter corresponding to one or more of:
(a) a factor representing an environment in which the machine learning model is executed (“In the cartpole domain, an agent must apply forces either to the left or right of a cart to keep a pole balanced on top of it” [pg. 8, 5.1 Cartpole, ¶1]; See further: In all 5 of the MCMC runs, a total of 4 latent parameters were inferred. The output dimensions x and θ consistently only used the first (baseline) feature—that is, our IBP-GP’s predictions were in fact the same as using a single GP. This observation is consistent with the cartpole dynamics and the observed prediction errors in figure 4. The second feature was used by both ẋ and θ̇ and was positively correlated with both the pole mass m and the pole length l (figures 6, 5, and 5).” [pg. 14, A Visualizing Latent Features for Cartpole, ¶1; note: The claim under BRI only requires “one or more of”, however examiner has provided a citation for both (a) and (b)]), or 
(b) an attribute of an agent executing the machine learning based model (“Predicting angular velocities is critical to planning in acrobot; inaccurate predictions will make the agent believe it can reach the swing-up position more quickly than is physically possible” [pg. 9, § 5.2 Acrobot, ¶3]); 
training the machine learning model comprising: 
training based on variations of a first set of latent variables (“We varied the pole mass m and the pole length l. In each (m, l) setting, we ran Sarsa (using a 3rd order Fourier basis) for 5 repetitions of 30 episodes, where each episode was run for 300 steps or until the pole fell down… Next, 50 training points were selected from each run of (m, l) settings with m ∈ {.1, .15, .2, .25, .3} and l ∈ {.4, .45, .5, .55, .6}. The online inference procedure was used to estimate the weights wkb given the filter parameters zkad and basis functions fkad from the batch procedure.” [pg. 8, § 5.1 Cartpole, ¶2-3]), and 
executing the machine learning model in a new environment, wherein the execution of the machine learning model is based on a combination of latent variables from the first set of latent variables and the second set of latent variables that is distinct from combinations of latent variables used during training of the machine learning based model (“Many control applications involve repeated encounters with domains that have similar, but not identical, dynamics. An agent that swings bats may encounter several bats with different weights or lengths, while an agent that manipulates cups may encounter cups with different amounts of liquid. An agent that drives cars may encounter many different cars, each with unique handling characteristics… In all of these scenarios, it makes little sense of the agent to start afresh when it encounters a new bat, a new cup, or a new car. Exposure to a variety of related domains should correspond to faster and more reliable adaptation to a new instance of the same type of domain, via transfer learning. If an agent has already swung several bats, for example, we would hope that it could easily learn to swing a new bat. Why? Like many domains, swinging a bat has a low-dimensional representation that affects its dynamics in structured ways. The agent’s prior experience should allow it to both learn how to model related instances of a domain—such as via the bat’s length, which smoothly changes in the bat’s dynamics—and what specific model parameters (e.g., lengths) are likely.” [pg. 1, ¶1-2]).
Although Doshi-Velez discloses a set of latent variables, the reference fails to explicitly teach that the reward function is parameterized by a second set of latent variables, and
training based on variations of a second set of latent variables.
Lopes teaches the reward function is parameterized by a second set of latent variables (“For simplicity, we consider a reward function parameterized using a two-dimensional parameter vector θ” [pg. 40, §4.1 Finding the Maximum of a Quadratic Function, ¶1]), and
training based on variations of a second set of latent variables (“For our IRL problem, we consider the reward function, r(x) = −(x − 0.15)², for which the agent should learn the parameter θ from a demonstration” [pg. 41, §4.1 Finding the Maximum of a Quadratic Function, ¶2; the agent learns from different states, which implies variations]).
Doshi-Velez and Lopes are both in the same field of endeavor of using Hidden Parameter Markov Decision Processes to train an agent, and thus are analogous art. Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s teachings by parameterizing the reward function with a second set of latent variables as taught by Lopes. One would have been motivated to make this modification in order to reduce the amount of computation that the Markov decision process requires to find the optimal reward function [pg. 32, ¶2, Lopes].

Regarding claim 9, Doshi-Velez/Lopes teaches The non-transitory computer readable storage medium of claim 8, where Doshi-Velez teaches further comprising: 
initializing a data set representing transitions (“The reinforcement learning (RL) setting consists of a series of interactions between an agent and an environment. From some state s, the agent chooses an action a which transitions it to a new state s′ and provides reward r. Its goal is to maximize its expected sum of rewards, E[Σ_t γ^t r_t], where γ ∈ [0, 1) is a discount factor that weighs the relative importance of near-term and long-term rewards. This series of interactions can be modeled as a Markov Decision Process (MDP), a 5-tuple {S, A, T, R, γ} where S and A are sets of states s and actions a, the transition function T(s′|s, a) gives the probability of the next state being s′ after performing action a in state s, and the reward function R(s, a) gives the reward r for performing action a in state s” [pg. 2, §1 Background, ¶1]); 
repeatedly: 
training the machine learning model using the dataset (“[image: media_image1.png]” [pg. 4, § 3 The IBP-GP HiP-MDP Model, ¶1]); and 
augmenting the dataset using new transitions (“We focus on scenarios in which the agent is given a large amount of batch observational data from several domain instances (perhaps solved as independent MDPs) and tasked with quickly performing well on new instances. Our batch inference procedure uses the observational data to fit the filter parameters zkad and basis functions fkad, which are independent of any particular instance, and compute a posterior over the weights wkb, which depend on each instance. These settings of zkad, fkad, and P(wkb) will used to infer the instance-specific weights wkb efficiently in the online setting when given a new instance;” [pg. 5, 4. 4 Inference in the IBP-GP HiP-MDP, ¶1])

Regarding claim 10, Doshi-Velez/Lopes teaches The non-transitory computer readable storage medium of claim 8, where Doshi-Velez teaches wherein the agent represents a robot (See, pg. 9, § 5.2 Acrobot, “Acrobot, in which the agent must swing up a double pendulum only through applying a positive, neutral, or negative torque to the joint between the two poles, is much more challenging.” Note: Acrobot represents a robot.)  and 
the environment represents an obstacle course in which the robot is moving (“The bicycle domain requires the agent to keep a bicycle traveling at a constant speed upright for as long as possible within a bounded 60×60m area.” [pg. 10, § 5.3 Bicycle, ¶1])

Regarding claim 14, Doshi-Velez teaches A computer implemented method comprising:
accessing, by a system, a machine learning model for reinforcement learning using Markov decision processes (MDP) (pg. 4, § 3 The IBP-GP HiP-MDP Model), the MDP represented using a state space, an action space, a transition function, and a reward function (“A HiP-MDP is described by a tuple: {S, A,Θ, T,R, γ, PΘ}, where S and A are the sets of states s and actions a, and R(s, a) is the reward function.” [pg. 3, §2. Hidden parameter Markov Decision Processes, ¶2]) wherein the transition function is parameterized by a set of latent variables (“The dynamics T for each instance b depends on the value of the hidden parameters θb: T(s′|s, θb, a). We denote the set of all possible parameters as Θ and let PΘ(θ) be the prior over these parameters” [pg. 3, § 2 Hidden Parameter Markov Decision Processes, ¶2; T is the transition function.]), wherein the machine learning model is configured for execution by an agent in an environment (“Cartpole has a relatively simple policy; we use it to visualize how HiP-MDPs compress the dynamics across different parameter settings into a latent space. Acrobot and bicycle are challenging domains when the agent must rapidly learn a control strategy for a new set of parameters” [pg. 8, 5 Results, ¶1]), wherein each latent variable represents a hidden parameter corresponding to one or more of:
(a) a factor representing an environment in which the machine learning model is executed (“In the cartpole domain, an agent must apply forces either to the left or right of a cart to keep a pole balanced on top of it” [pg. 8, 5.1 Cartpole, ¶1]; See further: In all 5 of the MCMC runs, a total of 4 latent parameters were inferred. The output dimensions x and θ consistently only used the first (baseline) feature—that is, our IBP-GP’s predictions were in fact the same as using a single GP. This observation is consistent with the cartpole dynamics and the observed prediction errors in figure 4. The second feature was used by both ẋ and θ̇ and was positively correlated with both the pole mass m and the pole length l (figures 6, 5, and 5).” [pg. 14, A Visualizing Latent Features for Cartpole, ¶1; note: The claim under BRI only requires “one or more of”, however examiner has provided a citation for both (a) and (b)]), or 
(b) an attribute of an agent executing the machine learning based model (“Predicting angular velocities is critical to planning in acrobot; inaccurate predictions will make the agent believe it can reach the swing-up position more quickly than is physically possible” [pg. 9, § 5.2 Acrobot, ¶3]); 
training the machine learning model based on variations of the set of latent variables (“We varied the pole mass m and the pole length l. In each (m, l) setting, we ran Sarsa (using a 3rd order Fourier basis) for 5 repetitions of 30 episodes, where each episode was run for 300 steps or until the pole fell down… Next, 50 training points were selected from each run of (m, l) settings with m ∈ {.1, .15, .2, .25, .3} and l ∈ {.4, .45, .5, .55, .6}. The online inference procedure was used to estimate the weights wkb given the filter parameters zkad and basis functions fkad from the batch procedure.” [pg. 8, § 5.1 Cartpole, ¶2-3]), and 
executing the machine learning model in a new environment, wherein the execution of the machine learning model is based on values of latent variables from the set of latent variables that is distinct from values of latent variables used during training of the machine learning based model (“Many control applications involve repeated encounters with domains that have similar, but not identical, dynamics. An agent that swings bats may encounter several bats with different weights or lengths, while an agent that manipulates cups may encounter cups with different amounts of liquid. An agent that drives cars may encounter many different cars, each with unique handling characteristics… In all of these scenarios, it makes little sense of the agent to start afresh when it encounters a new bat, a new cup, or a new car. Exposure to a variety of related domains should correspond to faster and more reliable adaptation to a new instance of the same type of domain, via transfer learning. If an agent has already swung several bats, for example, we would hope that it could easily learn to swing a new bat. Why? Like many domains, swinging a bat has a low-dimensional representation that affects its dynamics in structured ways. The agent’s prior experience should allow it to both learn how to model related instances of a domain—such as via the bat’s length, which smoothly changes in the bat’s dynamics—and what specific model parameters (e.g., lengths) are likely.” [pg. 1, ¶1-2]).
Although Doshi-Velez discloses a set of latent variables, the reference fails to explicitly teach that the reward function is parameterized by a set of latent variables.
Lopes teaches the reward function is parameterized by a set of latent variables (“For simplicity, we consider a reward function parameterized using a two-dimensional parameter vector θ” [pg. 40, §4.1 Finding the Maximum of a Quadratic Function, ¶1]).
Doshi-Velez and Lopes are both in the same field of endeavor of using Hidden Parameter Markov Decision Processes to train an agent, and thus are analogous art. Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s teachings by parameterizing the reward function with a set of latent variables as taught by Lopes. One would have been motivated to make this modification in order to reduce the amount of computation that the Markov decision process requires to find the optimal reward function [pg. 32, ¶2, Lopes].

Regarding claim 15, Doshi-Velez/Lopes teaches The computer implemented method of claim 14, where Doshi-Velez teaches further comprising: 
initializing a data set representing transitions (“The reinforcement learning (RL) setting consists of a series of interactions between an agent and an environment. From some state s, the agent chooses an action a which transitions it to a new state s′ and provides reward r. Its goal is to maximize its expected sum of rewards, E[Σ_t γ^t r_t], where γ ∈ [0, 1) is a discount factor that weighs the relative importance of near-term and long-term rewards. This series of interactions can be modeled as a Markov Decision Process (MDP), a 5-tuple {S, A, T, R, γ} where S and A are sets of states s and actions a, the transition function T(s′|s, a) gives the probability of the next state being s′ after performing action a in state s, and the reward function R(s, a) gives the reward r for performing action a in state s” [pg. 2, §1 Background, ¶1]); 
repeatedly: 
training the machine learning model using the dataset (“[image: media_image1.png]” [pg. 4, § 3 The IBP-GP HiP-MDP Model, ¶1]); and 
augmenting the dataset using new transitions (“We focus on scenarios in which the agent is given a large amount of batch observational data from several domain instances (perhaps solved as independent MDPs) and tasked with quickly performing well on new instances. Our batch inference procedure uses the observational data to fit the filter parameters zkad and basis functions fkad, which are independent of any particular instance, and compute a posterior over the weights wkb, which depend on each instance. These settings of zkad, fkad, and P(wkb) will used to infer the instance-specific weights wkb efficiently in the online setting when given a new instance;” [pg. 5, 4. 4 Inference in the IBP-GP HiP-MDP, ¶1])

Regarding claim 16, Doshi-Velez/Lopes teaches The computer implemented method of claim 14, where Doshi-Velez teaches wherein the agent represents a robot (See, pg. 9, § 5.2 Acrobot, “Acrobot, in which the agent must swing up a double pendulum only through applying a positive, neutral, or negative torque to the joint between the two poles, is much more challenging.” Note: Acrobot represents a robot.) and 
the environment represents an obstacle course in which the robot is moving (“The bicycle domain requires the agent to keep a bicycle traveling at a constant speed upright for as long as possible within a bounded 60×60m area.” [pg. 10, § 5.3 Bicycle, ¶1]).

Claims 4-6, 11-13, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Doshi-Velez in view of Lopes and further in view of Bojarski et al. ("End to End Learning for Self-Driving Cars", hereinafter "Bojarski").

Regarding claim 4, Doshi-Velez/Lopes teaches The computer implemented method of claim 3, however fails to explicitly teach wherein the robot comprises sensors for capturing data describing environment of the robot, and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot.
Bojarski teaches wherein the robot comprises sensors for capturing data describing environment of the robot (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1; Cameras are considered to be sensors]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end-to-end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model [Abstract, Bojarski].

Regarding claim 5, Doshi-Velez/Lopes teaches The computer implemented method of claim 1 but fails to explicitly teach wherein the agent represents a self-driving vehicle and the environment represents traffic through which the self-driving vehicle is moving.
Bojarski teaches wherein the agent represents a self-driving vehicle (“We estimate what percentage of the time the network could drive the car (autonomy). The metric is determined by counting simulated human interventions (see Section 6). These interventions occur when the simulated vehicle departs from the center line by more than one meter. We assume that in real life an actual intervention would require a total of six seconds: this is the time required for a human to retake control of the vehicle, re-center it, and then restart the self-steering mode” [pg. 6, 7.1 Simulation Tests, ¶1]) and the environment represents traffic through which the self-driving vehicle is moving (“We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads.” [Abstract]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 6, Doshi-Velez/Lopes/Bojarski teaches The computer implemented method of claim 5, where Bojarski teaches wherein the self-driving vehicle has one or more sensors mounted on the self-driving vehicle (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the self-driving vehicle and predicts information describing one or more entities in the environment through which the self-driving vehicle is driving (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 11, Doshi-Velez/Lopes teaches The non-transitory computer readable storage medium of claim 10 but fails to explicitly teach wherein the robot comprises sensors for capturing data describing environment of the robot, and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot.
Bojarski teaches wherein the robot comprises sensors for capturing data describing environment of the robot (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1; Cameras are considered to be sensors]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 12, Doshi-Velez/Lopes teaches The non-transitory computer readable storage medium of claim 8 but fails to explicitly teach wherein the agent represents a self-driving vehicle and the environment represents traffic through which the self-driving vehicle is moving.
Bojarski teaches wherein the agent represents a self-driving vehicle (“We estimate what percentage of the time the network could drive the car (autonomy). The metric is determined by counting simulated human interventions (see Section 6). These interventions occur when the simulated vehicle departs from the center line by more than one meter. We assume that in real life an actual intervention would require a total of six seconds: this is the time required for a human to retake control of the vehicle, re-center it, and then restart the self-steering mode” [pg. 6, 7.1 Simulation Tests, ¶1]) and the environment represents traffic through which the self-driving vehicle is moving (“We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads.” [Abstract]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 13, Doshi-Velez/Lopes/Bojarski teaches The non-transitory computer readable storage medium of claim 12, where Bojarski teaches wherein the self-driving vehicle has one or more sensors mounted on the self-driving vehicle (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the self-driving vehicle and predicts information describing one or more entities in the environment through which the self-driving vehicle is driving (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 17, Doshi-Velez/Lopes teaches The computer implemented method of claim 16 but fails to explicitly teach wherein the robot comprises sensors for capturing data describing environment of the robot, and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot.
Bojarski teaches wherein the robot comprises sensors for capturing data describing environment of the robot (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1; Cameras are considered to be sensors]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the robot and predicts information describing one or more objects in the environment of the robot (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 18, Doshi-Velez/Lopes teaches The computer implemented method of claim 14 but fails to explicitly teach wherein the agent represents a self-driving vehicle and the environment represents traffic through which the self-driving vehicle is moving.
Bojarski teaches wherein the agent represents a self-driving vehicle (“We estimate what percentage of the time the network could drive the car (autonomy). The metric is determined by counting simulated human interventions (see Section 6). These interventions occur when the simulated vehicle departs from the center line by more than one meter. We assume that in real life an actual intervention would require a total of six seconds: this is the time required for a human to retake control of the vehicle, re-center it, and then restart the self-steering mode” [pg. 6, 7.1 Simulation Tests, ¶1]) and the environment represents traffic through which the self-driving vehicle is moving (“We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads.” [Abstract]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Regarding claim 19, Doshi-Velez/Lopes/Bojarski teaches The computer implemented method of claim 18, where Bojarski teaches wherein the self-driving vehicle has one or more sensors mounted on the self-driving vehicle (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver.” [pg. 2, § 2 Overview of the DAVE-2 System, ¶1]), and wherein the machine learning model receives as input, sensor data captured by the sensors of the self-driving vehicle and predicts information describing one or more entities in the environment through which the self-driving vehicle is driving (“Images for two specific off-center shifts can be obtained from the left and the right camera. Additional shifts between the cameras and all rotations are simulated by viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have. We therefore approximate the transformation by assuming all points below the horizon are on flat ground and all points above the horizon are infinitely far away. This works fine for flat terrain but it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a big problem for network training. The steering label for transformed images is adjusted to one that would steer the vehicle back to the desired location and orientation in two seconds… A block diagram of our training system is shown in Figure 2. Images are fed into a CNN which then computes a proposed steering command.” [pg. 3, ¶1-2]).
Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Bojarski discloses an end to end learning system for self-driving cars. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by further implementing the agent to operate a self-driving car as taught by Bojarski. Using reinforcement learning to train agents to drive cars is well-known in the art (See pg. 1 Doshi-Velez), thus one would have been motivated to make this modification in order to train an agent to drive a self-driving car by using a machine learning model. [Abstract, Bojarski]

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Doshi-Velez in view of Lopes and further in view of Vidhate et al. ("Expertise Based Cooperative Reinforcement Learning Methods (ECRLM) for Dynamic Decision Making in Retail Shop Application", hereinafter “Vidhate”).

Regarding claim 7, Doshi-Velez/Lopes teaches The computer implemented method of claim 1 but fails to explicitly teach wherein the agent represents a pricing engine for setting pricing for a business and the environment represents the business.
Vidhate teaches wherein the agent represents a pricing engine for setting pricing for a business (“State of the system become Input as (xi, ii). Actions: Assume set of possible actions i.e. action set for agent 1 is (that means Price of products in shop 1), A1 = Price p = {8 to 14} = {8.0; 9.0; 10.0; 10.5; 11.0; 11.5; 12.0; 12.5; 13.0; 13.5}. Set of possible actions i.e. action set for agent 2 is A2 = Price p = {5 to 9} = {5.0; 6.0; 7.0; 7.5; 8.0; 8.5; 9.0}. Set of possible actions i.e. action set for agent 3 is A3 = Price p = {10 to 13} = {10.0; 10.5; 11.0; 11.5; 12.0; 12.5; 13.0}. Output is the possible action taken i.e. price in this case. It is now the state-action pair system can be easily modeled using Q learning i.e. Q(s, a).” [pg. 355, § 4 Model Design, ¶2]) and the environment represents the business (“The retail store sells the household items and gain profit by that. Retailers are interested about their selling, their profit. By accepting certain steps, the portion that can reason break or decrease the revenue can be prohibited. The aim of predicting the sales business is to collect data from various shops and analyze it by machine learning algorithms.” [pg. 350, § Introduction, ¶1])
Doshi-Velez, Lopes, and Vidhate are all in the same field of endeavor of using Markov Decision Processes to train an agent, and thus are analogous art. Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Vidhate discloses knowledge agents using reinforcement learning for dynamic decision making in retail applications. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by applying their Markov decision processes to a business/retail application as taught by Vidhate. One would have been motivated to make this modification in order to use machine learning algorithms to analyze sales to increase a retailer’s sales profits. [Abstract, Vidhate]

Regarding claim 20, Doshi-Velez/Lopes teaches The computer implemented method of claim 1 (interpreted as dependent on claim 14) but fails to explicitly teach wherein the agent represents a pricing engine for setting pricing for a business and the environment represents the business.
Vidhate teaches wherein the agent represents a pricing engine for setting pricing for a business (“State of the system become Input as (xi, ii). Actions: Assume set of possible actions i.e. action set for agent 1 is (that means Price of products in shop 1), A1 = Price p = {8 to 14} = {8.0; 9.0; 10.0; 10.5; 11.0; 11.5; 12.0; 12.5; 13.0; 13.5}. Set of possible actions i.e. action set for agent 2 is A2 = Price p = {5 to 9} = {5.0; 6.0; 7.0; 7.5; 8.0; 8.5; 9.0}. Set of possible actions i.e. action set for agent 3 is A3 = Price p = {10 to 13} = {10.0; 10.5; 11.0; 11.5; 12.0; 12.5; 13.0}. Output is the possible action taken i.e. price in this case. It is now the state-action pair system can be easily modeled using Q learning i.e. Q(s, a).” [pg. 355, § 4 Model Design, ¶2]) and the environment represents the business (“The retail store sells the household items and gain profit by that. Retailers are interested about their selling, their profit. By accepting certain steps, the portion that can reason break or decrease the revenue can be prohibited. The aim of predicting the sales business is to collect data from various shops and analyze it by machine learning algorithms.” [pg. 350, § Introduction, ¶1])
Doshi-Velez, Lopes, and Vidhate are all in the same field of endeavor of using Markov Decision Processes to train an agent, and thus are analogous art. Doshi-Velez discloses an approach for discovering latent task parameterizations. Lopes discloses an active learning method for reward estimation using MDPs. Vidhate discloses knowledge agents using reinforcement learning for dynamic decision making in retail applications. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Doshi-Velez’s/Lopes’ teachings by applying their Markov decision processes to a business/retail application as taught by Vidhate. One would have been motivated to make this modification in order to use machine learning algorithms to analyze sales to increase a retailer’s sales profits. [Abstract, Vidhate]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Konidaris et al. (“Hidden Parameter Markov Decision Processes: An Emerging Paradigm for Modeling Families of Related Tasks”) discloses HiP-MDPs to improve a robot’s performance on future related tasks. (See Abstract)
Killian et al. (“Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes”) discloses learning related tasks using latent embeddings. 
Nagabandi et al. (“LEARNING TO ADAPT IN DYNAMIC, REAL-WORLD ENVIRONMENTS THROUGH META-REINFORCEMENT LEARNING”) discloses meta reinforcement learning and training the agent to adapt as it interacts with the environment.
Finn et al. (“Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”) discloses meta learning to solve new tasks using a small number of training samples.
Yao et al. (“Direct Policy Transfer via Hidden Parameter Markov Decision Processes”) discloses transfer learning with HiP-MDPs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.H.H./Examiner, Art Unit 2122
/BRIAN M SMITH/Primary Examiner, Art Unit 2122