DETAILED ACTION
This action is in reply to an application filed December 4th, 2020, and a preliminary amendment to the specification filed April 27th, 2021. Claims 1-24 are currently pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on January 26th, 2022. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to because Fig. 1B and Fig. 1C contain text that cannot be easily read because it is overlaid on other text and on parts of the figures. It is recommended that applicant amend Fig. 1B and Fig. 1C so that all intended text is fully legible and does not overlay anything else.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are requested in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The abstract of the disclosure is objected to because it is over 150 words.  Correction is required.  See MPEP § 608.01(b).
The disclosure is objected to because of the following informalities: on page 2, Para. 005, line 17, "COntrol" has an improperly capitalized first "o" and should read "Control".
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: "the model-learning program" is only referenced in the Summary section and is not recited in the Detailed Description, but the “module learning program” is recited in the Detailed Description; “the model-learning module learns the behavior of the real system with a speed-integration model”; and "the model-learning module generates and provides offline learned states to the policy learning module".
Appropriate correction is requested.

Claim Objections
Claims 1, 2, 7, and 8 are objected to because of the following informalities:
Claim 1 line 10: "the model learning program" lacks proper antecedent basis in the claim. Applicant is advised to amend the claim to read "a model learning program".
Claim 1 line 12: "the offline states" lacks proper antecedent basis in the claim. Applicant is advised to amend the claim to read "offline states".
Claims 2 and 7: "the real system" lacks proper antecedent basis in the claim. Applicant is advised to amend the claims to read "a real system" or "the system" or amend claim 1 to recite "a real system".
Claim 7: “the policy unit” lacks proper antecedent basis in the claim. Applicant is advised to amend the claim to recite “a policy unit”.
Claim 8: “the policy optimization unit” lacks proper antecedent basis in the claim. Applicant is advised to amend the claim to recite “a policy optimization unit” or to amend the claim to depend on claim 7.
Appropriate correction is requested.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
The model learning module, policy learning module, and model-learning module in claims 1-24.
The policy unit in claims 7, 10, and 17.
The policy optimization unit in claims 7 and 8.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 1, 2, 3, 13, and 21 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 is indefinite as it is unclear if "model learning module" and "model-learning module" are the same or two different things. Note that this issue, by extension, also renders indefinite all claims dependent on claim 1 that recite "model learning module" or "model-learning module". For the sake of finding prior art, the examiner has interpreted that these two are not the same; rather, the "model-learning module" is a part of the "model learning module". It is recommended that applicant make this matter clear and review all recitations of "model learning module" and "model-learning module" in the claims dependent on claim 1 to ensure that each recites the intended module.
Claim 1 is also indefinite because, in lines 14-15, the limitation "generate particle online estimates based on the particle measurements and possibly prior particle online estimates" is unclear: the term "possibly" leaves it ambiguous whether the prior particle online estimates are required, i.e., whether this is one invention or two. Applicant is advised to amend the limitation to recite either "generate particle online estimates based on the particle measurements and prior particle online estimates" or amend claim 1 to recite "generate particle online estimates based on the particle measurements" and to add a dependent claim that recites something like "[t]he controller of claim 1, wherein the policy learning module generates the particle online estimates based on the particle measurements and prior particle online estimates".
Claim 1 recites the limitation "the offline state estimator estimates and provides the offline states to the model-learning program" in lines 12-13. "The model-learning program" lacks antecedent basis in the claim unless applicant meant "the model learning program" or "model-learning module", in which case the examiner cannot determine which one was intended. For the purpose of the prior art rejection below, the examiner has interpreted this limitation as reciting "the offline state estimator estimates and provides the offline states to the model-learning module".
Claim 2 is inconsistent with the specification and is indefinite as it recites the limitation "the model-learning module learns the behavior of the real system with a speed-integration model" in lines 1-2. Based on page 32, Para. 00080 and 00081 of applicant’s specification, the speed-integration module is part of the model learning program. For the purpose of the prior art rejection below, the examiner has interpreted the claim to recite "the model learning program learns the behavior of the real system with a speed-integration model".
Claims 3, 13, and 21 recite the limitation "the model-learning module generates and provides offline learned states to the policy learning module", which lacks proper antecedent basis in the specification. Furthermore, page 21, Para. 00057 recites "In this case, the steps may include offline-modeling to generate offline-learning states... using the model learning module 1300B. The steps further perform providing the offline states to the policy learning module 1400B...". Applicant is advised to amend the claims to recite "the model learning module generates and provides offline states to the policy learning module" or some equivalent. It is also recommended that applicant amend the specification to resolve any inconsistencies in the specification without adding new matter. For the purpose of the prior art rejection below, the examiner has interpreted claims 3, 13, and 21 to recite "the model learning module generates and provides offline states to the policy learning module".
Applicant is advised to amend the claims noted above, and to review the rest of the claims that recite “model learning module” or “model-learning module”, so that they recite the intended limitations, or to amend the specification, without adding new matter, to bring the relevant passages in line with the claimed invention.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10, 13-17, and 20-24 are rejected under 35 U.S.C. 103 as being unpatentable over Wright et al. (US Pub. No. 20180012137 A1), hereinafter Wright, as evidenced by Sutton et al. (Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.), hereinafter Sutton, and Kormushev et al. (Kormushev, P., and Caldwell, D. G. Direct policy search reinforcement learning based on particle filtering. In Proceedings of the 10th European Workshop on Reinforcement Learning (2012).), hereinafter Kormushev, both having been incorporated by reference.
Regarding claim 1, Wright teaches [a] controller for controlling a system that includes a policy configured to control the system, comprising (Wright: Para. 0307; "The method may further update an automated controller for controlling the system with the updated estimate of the optimal control policy, wherein the automated controller operates according to the updated estimate of the optimal control policy to automatically alter at least one of a state of the system and the environment of the system."): an interface connected to the system, the interface being configured to acquire an action state and a measurement state via sensors measuring the system (Wright: Para. 0419 and 0409; "In some embodiments, the input/output devices may include one or more storage devices. The processor may access, read from, write to, store in, erase, modify, and/or the like a storage device in accordance with program instructions executed by the processor. A storage device may facilitate accessing, storing, retrieving, modifying, deleting, and/or the like data by the processor... A storage device may be connected to the system bus via an interface such as PCI, PCI Express, USB... and/or the like." "The sensor inputs may be provided directly, through a preprocessing system, or as a processed output of another system."); a memory to store computer-executable program modules including a model learning module and a policy learning module; a processor configured to perform steps of the program modules (Wright: Para. 0412 and 0415; "The present technology is performed using automated data processors, which may be purpose-built and optimized for the algorithm employed, or general purpose hardware." "The processor may be connected to system memory, e.g., DDR2, DDR3."), the steps include: …modeling to generate …learning states based on the action state and measurement state using the model learning program, wherein the model learning module includes [a]... state estimator and a model-learning module, wherein the… state estimator estimates and provides the… states to the model-learning program, wherein the policy learning module includes a model of the… state estimator configured to generate… estimates based on the… measurements and possibly prior… estimates (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998]."); providing the… states to the policy learning module to generate policy parameters; and updating the policy of the system to operate the system based on the policy parameters (Wright: Para. 0305; "...(a) estimating a long term value for operation at a current state of the environment over a series of predicted future environmental states; (b) using a complex return of the data set to determine a bound to improve the estimated long term value; and (c) producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The updated estimate of an optimal control policy may be used to control a controlled system.").
Wright does not explicitly teach the steps include: offline-modeling to generate offline-learning states based on the action state and measurement state using the model learning program, wherein the model learning module includes an offline state estimator and a model-learning module, wherein the offline state estimator estimates and provides the offline states to the model-learning program, wherein the policy learning module includes a model of the online state estimator configured to generate particle online estimates based on the particle measurements and possibly prior particle online estimates, however these features are well known in the art as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.") and Kormushev (Kormushev: Page 3 Section 3; "Particle filters, also known as Sequential Monte Carlo methods Doucet et al. (2000, 2001), originally come from statistics and are similar to importance sampling methods. Particle filters are able to approximate any probability density function, and can be viewed as a ‘sequential analogue’ of Markov chain Monte Carlo (MCMC) batch methods."). 
It would have been obvious to one ordinarily skilled in the art before the filing of the application to include in Wright online and offline estimations of the states of the system, as taught by Sutton, and to utilize particle filters during the processing of the data, as taught by Kormushev, for the benefit of creating an accurate yet efficient reinforcement learning model learning system.
Regarding claim 2, Wright remains as applied as in claim 1, however Wright does not explicitly teach [t]he controller of claim 1, wherein the model-learning module learns the behavior of the real system with a speed-integration model however this feature is well known in the art as evidenced by Sutton (Sutton: Chapter I Section 3.8; "Having Q* makes choosing optimal actions still easier. With Q*, the agent does not even have to do a one-step-ahead search: for any state s, it can simply find any action that maximizes Q*(s, a). The action-value function effectively caches the results of all one-step-ahead searches. It provides the optimal expected long-term return as a value that is locally and immediately available for each state–action pair. Hence, at the cost of representing a function of state–action pairs, instead of just of states, the optimal action-value function allows optimal actions to be selected without having to know anything about possible successor states and their values, that is, without having to know anything about the environment’s dynamics.").
Regarding claim 3, Wright remains as applied as in claim 1, however Wright does not explicitly teach [t]he controller of claim 1, wherein the model-learning module generates and provides offline learned states to the policy learning module however this feature is well known in the art as evidenced by Sutton (Sutton: Chapter III Sections 7 and 7.1; "The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment." "The methods that use n-step backups are still TD methods because they still change an earlier estimate based on how it differs from a later estimate. Now the later estimate is not one step later, but n steps later. Methods in which the temporal difference extends over n steps are called n-step TD methods... Because of the error reduction property, one can show formally that on-line and off-line TD prediction methods using n-step backups converge to the correct predictions under appropriate technical conditions. The n-step TD methods thus form a family of valid methods, with one-step TD methods and Monte Carlo methods as extreme members.").
Regarding claim 4, Wright remains as applied as in claim 1 and goes on to further teach [t]he controller of claim 1, wherein the policy learning module includes a policy optimization program, wherein the policy optimization program performs policy optimization based offline states from the model learning module and generates the policy parameters (Wright: Para. 0361 and 0014; " A salient feature of the n-step returns is that their variances increase with n due to the stochastic nature of the Markov Decision Process (MDP). The function approximation variance, which can be a substantial component of the overall variance, is often considered to be roughly the same across different samples. The bias of n-step returns is a more complex issue. Among various types of biases (e.g., off-policy bias, function approximation bias, sampling bias, etc.), the behavior of the off-policy bias is unique. When the target policy is an optimal policy, like in the AVI context, the off-policy bias introduced by a suboptimal trajectory is strictly negative, and its magnitude increases as more suboptimal actions are followed towards the end of the trajectory." "MDPs can be solved by linear programming or dynamic programming.").
Regarding claim 5, Wright remains as applied as in claim 4 and goes on to further teach [t]he controller of claim 4, wherein the policy learning module includes a system model, wherein the system model generates… states based on previous… states and the action state (Wright: Para. 0307; "In either case, the purpose of the updated control policy is to control a system, and typically the controlled system is a physical system, i.e., governed by laws of physics and thermodynamics. Likewise, the environment is typically a physical environment. Some or all of these laws may be modelled in analyzing the data or implementing the control policy. Another possibility is that the controlled system is a computational system governed by rules of operation, but the relevant rules may not be rules of physics or thermodynamics. The computational system in this case is real, and the purpose of the controller may be to modify its operation without replacing its core components or reprogramming it.").
Wright does not explicitly teach wherein the system model generates particle states based on previous particle states and the action state; however, this feature is well known in the art as evidenced by Kormushev (Kormushev: Page 4 Section 5; "Using the reformulation of RL from the previous section, here we propose a novel RL algorithm based on Particle Filters (RLPF). The main idea of RLPF is to use particle filtering as a method for choosing the sampling points, i.e. for calculating a parameter vector θ for each trial.").
Regarding claim 6, Wright remains as applied as in claim 5 and goes on to further teach [t]he controller of claim 5, wherein the policy learning module includes a sensor model configured to generate… measurements based on the… states (Wright: Para. 0409; "In a physical control system, various types of sensors may be employed, such as position, velocity, acceleration, angle, angular velocity, vibration, impulse, gyroscopic, compass, magnetometer, SQUID, SQIF, pressure, temperature, volume, chemical characteristics, mass, illumination, light intensity, biosensors, micro electromechanical system (MEMS) sensors etc. The sensor inputs may be provided directly, through a preprocessing system, or as a processed output of another system.").
Wright does not explicitly teach generate particle measurements based on the particle states; however, this feature is well known in the art and would have been obvious to include as evidenced by Kormushev (Kormushev: Page 4 Section 5; "Using the reformulation of RL from the previous section, here we propose a novel RL algorithm based on Particle Filters (RLPF). The main idea of RLPF is to use particle filtering as a method for choosing the sampling points, i.e. for calculating a parameter vector θ for each trial.").
Regarding claim 7, Wright remains as applied as in claim 1 and goes on to further teach [t]he controller of claim 1, wherein the policy learning module includes a policy optimization unit configured to generate the policy parameters based on the particle measurements and the particle online estimates, wherein the policy optimization unit provides the policy parameters to update the policy unit of the real system (Wright: Para. 0420; "In some implementations, the operating environment component may include an operating system subcomponent. The operating system subcomponent may provide an abstraction layer that facilitates the use of, communication among, common services for, interaction with, security of, and/or the like of various System elements, components, data stores, and/or the like. In some embodiments, the operating system subcomponent may facilitate execution of program instructions by the processor by providing process management capabilities.").
Regarding claim 8, Wright remains as applied as in claim 1 and goes on to further teach [t]he controller of claim 1, wherein the policy optimization unit includes a Dropout method and an early stopping strategy configured to improve the policy parameters generated by a policy optimization unit (Wright: Para. 0340; "One way to handle the off-policy bias of the complex returns is to attempt to avoid it by truncating the trajectories where they appear to go off-policy. This idea is borrowed from the Q(λ) [Watkins 1989] approach. In this approach the current {circumflex over (Q)} provides an approximation of the optimal policy, {circumflex over (π)}* that can be used to approximate when a trajectory takes an off-policy sub-optimal action. During the process of calculating the complex return estimates, samples in a trajectory after the first off-policy action are not considered. Assuming {circumflex over (π)}* converges to a close approximation of π* , a strong assumption, this approach should not introduce off-policy bias and can take advantage of portions of trajectories that follow the optimal policy to reduce variance and overall error.").
Regarding claim 9, Wright remains as applied as in claim 1, however Wright does not explicitly teach [t]he controller of claim 1, wherein the offline state estimator is formed from acausal filters, Kalman smoother or central difference velocity approximators however this feature is well known in the art as evidenced by Sutton and would have been obvious to include for the known benefit (Sutton: Chapter III Section 7.1; "In the previous section we presented the forward or theoretical view of the tabular TD(λ) algorithm as a way of mixing backups that parametrically shifts from a TD method to a Monte Carlo method. In this section we instead define TD(λ) mechanistically, and in the next section we show that this mechanism correctly implements the forward view. The mechanistic, or backward, view of TD(λ) is useful because it is simple conceptually and computationally. In particular, the forward view itself is not directly implementable because it is acausal, using at each step knowledge of what will happen many steps later.").

Regarding claim 10, Wright remains as applied as in claim 1 and goes on to further teach [a] vehicle control system for controlling motions of a vehicle, comprising: a controller of claim 1, wherein the controller is connected to a motion controller of the vehicle and vehicle motion sensors that measure the motions of the vehicle, wherein the control system generates policy parameters based on measurement data of the motion, wherein the control system provides the policy parameters to a motion controller of the vehicle to update a policy unit of the motion controller (Wright: Para. 0408; "Thus have been described improvements to reinforcement learning technology, which may be applied to, by way of example, and without limitation, robot control (such as bipedal or quadrupedal walking or running, navigation, grasping, and other control skills); vehicle control (autonomous vehicle control, steering control...").
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the applicant’s claimed invention to modify the system from Wright to apply it to a vehicle for motion control of the vehicle as part of vehicle control for the benefit of improving autonomous vehicle driving control via reinforcement learning.
Regarding claim 13, Wright remains as applied as in claim 10 and goes on to further teach [t]he vehicle control system of claim 10, wherein the model-learning module is configured to generate and provide… learned states to the policy learning module, wherein the policy learning module generates policy parameters (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998].").
Wright does not explicitly teach the model-learning module is configured to generate and provide offline learned states to the policy learning module however this feature is well known in the art and would have been obvious to include as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.").
Regarding claim 14, Wright remains as applied as in claim 13, however Wright does not explicitly teach [t]he vehicle control system of claim 13, wherein the policy learning module includes an online state estimator, wherein the online state estimator performs policy optimization based the offline learned states of the model learning module however this feature is well known in the art and would have been obvious to include as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.").
Regarding claim 15, Wright remains as applied as in claim 13 and goes on to further teach [t]he vehicle control system of claim 13, wherein the policy learning module includes a system model and a sensor model (Wright: Para. 0409 and 0420; "In a physical control system, various types of sensors may be employed, such as position, velocity, acceleration, angle, angular velocity, vibration, impulse, gyroscopic, compass, magnetometer, SQUID, SQIF, pressure, temperature, volume, chemical characteristics, mass, illumination, light intensity, biosensors, micro electromechanical system (MEMS) sensors etc. The sensor inputs may be provided directly, through a preprocessing system, or as a processed output of another system." "In some implementations, the operating environment component may include an operating system subcomponent. The operating system subcomponent may provide an abstraction layer that facilitates the use of, communication among, common services for, interaction with, security of, and/or the like of various System elements, components, data stores, and/or the like. In some embodiments, the operating system subcomponent may facilitate execution of program instructions by the processor by providing process management capabilities.").
Wright does not explicitly teach wherein the system model generates particle measurements based on the particle states however this feature is well known in the art and would have been obvious to include as evidenced by Kormushev (Kormushev: Pages 4-5 Section 5; "Using the reformulation of RL from the previous section, here we propose a novel RL algorithm based on Particle Filters (RLPF). The main idea of RLPF is to use particle filtering as a method for choosing the sampling points, i.e. for calculating a parameter vector θ for each trial.  We define a policy particle pi to be the tuple pi = (hθi, τi, Ri, wi), where the particle pi represents the outcome of a single trial τi performed by executing an RL policy π(θi), where θi is a vector of policy parameter values modulating the behaviour of the RL policy π. The policy particle also stores the value of the reward function evaluated for this trial Ri = R(τi(π(θi))). The variable τi contains task-specific information recorded during the trial depending on the nature of the task. The information in τi is used by the reward function to perform its evaluation.").
Regarding claim 16, Wright remains as applied as in claim 15 and goes on to further teach [t]he vehicle control system of claim 15, wherein the policy learning module includes [a] state estimator model configured to generate… estimates based on the… measurements and a prior… measurement (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998].").
Wright does not explicitly teach an online state estimator model configured to generate particle online estimates based on the particle measurements and a prior particle online measurement however these features are well known in the art as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.") and Kormushev (Kormushev: Page 3 Section 3 and Pages 5-6 Section 5; "Particle filters, also known as Sequential Monte Carlo methods Doucet et al. (2000, 2001), originally come from statistics and are similar to importance sampling methods. Particle filters are able to approximate any probability density function, and can be viewed as a ‘sequential analogue’ of Markov chain Monte Carlo (MCMC) batch methods." "The pseudo-code for RLPF is given in Algorithm 1. A detailed description of the algorithm follows... In lines 15-18, a particle is selected based on the inverse density function mechanism, described earlier in this section. In lines 19-20, a new particle is selected by adding exponentially decayed noise to the previously selected particle.").

Regarding claim 17, Wright remains as applied as in claim 1 and goes on to further teach [a] robotic control system for controlling motions of a robot, comprising: a controller of claim 1, wherein the controller is connected to an actuator controller of the robot and sensors that is configured to measure states of the robot, wherein the control system generates policy parameters based on measurement data of the sensors, wherein the control system provides the policy parameters to the actuator controller of the robot to update a policy unit of the actuator controller (Wright: Para. 0408; "Thus have been described improvements to reinforcement learning technology, which may be applied to, by way of example, and without limitation, robot control (such as bipedal or quadrupedal walking or running, navigation, grasping, and other control skills); vehicle control (autonomous vehicle control, steering control...").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the applicant’s claimed invention to modify the system from Wright to apply it to a robot for motion control of the robot as part of robot control for the benefit of applying reinforcement learning to the motion controls of a robot.
Regarding claim 20, Wright remains as applied as in claim 17 however Wright does not explicitly teach [t]he robotic control system of claim 17, wherein a model learning module in the controller includes an offline state estimator and a model-learning module, wherein the offline state estimator estimates and provides offline states to the model-learning module however this feature is well known in the art and would have been obvious to include as evidenced by Sutton (Sutton: Chapter III Sections 7 and 7.1; "The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment." "The methods that use n-step backups are still TD methods because they still change an earlier estimate based on how it differs from a later estimate. Now the later estimate is not one step later, but n steps later. Methods in which the temporal difference extends over n steps are called n-step TD methods... Because of the error reduction property, one can show formally that on-line and off-line TD prediction methods using n-step backups converge to the correct predictions under appropriate technical conditions. The n-step TD methods thus form a family of valid methods, with one-step TD methods and Monte Carlo methods as extreme members.").
Regarding claim 21, Wright remains as applied as in claim 20 and goes on to further teach [t]he robotic control system of claim 20, wherein the model-learning module is configured to generate and provide… learned states to the policy learning module, wherein the policy learning module generates policy parameters (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998].").
Wright does not explicitly teach the model-learning module is configured to generate and provide offline learned states to the policy learning module however this feature is well known in the art and would have been obvious to include as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.").
Regarding claim 22, Wright remains as applied as in claim 21 and goes on to further teach [t]he robotic control system of claim 21, wherein the policy learning module includes [a] state estimator, wherein the… state estimator performs policy optimization based the… learned states of the model learning module and generates the policy parameters (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998].").
Wright does not explicitly teach the policy learning module includes an online state estimator, wherein the online state estimator performs policy optimization based the offline learned states of the model learning module however this feature is well known in the art and would have been obvious to include as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.").
Regarding claim 23, Wright remains as applied as in claim 20 and goes on to further teach [t]he robotic control system of claim 20, wherein the policy learning module includes a system model and a sensor model, wherein the system model generates… measurements based on the… states (Wright: Para. 0409 and 0420; "In a physical control system, various types of sensors may be employed, such as position, velocity, acceleration, angle, angular velocity, vibration, impulse, gyroscopic, compass, magnetometer, SQUID, SQIF, pressure, temperature, volume, chemical characteristics, mass, illumination, light intensity, biosensors, micro electromechanical system (MEMS) sensors etc. The sensor inputs may be provided directly, through a preprocessing system, or as a processed output of another system." "In some implementations, the operating environment component may include an operating system subcomponent. The operating system subcomponent may provide an abstraction layer that facilitates the use of, communication among, common services for, interaction with, security of, and/or the like of various System elements, components, data stores, and/or the like. In some embodiments, the operating system subcomponent may facilitate execution of program instructions by the processor by providing process management capabilities.").
Wright does not explicitly teach wherein the system model generates particle measurements based on the particle states; however, this feature is well known in the art and would have been obvious to include, as evidenced by Kormushev (Kormushev: Page 4 Section 5; "Using the reformulation of RL from the previous section, here we propose a novel RL algorithm based on Particle Filters (RLPF). The main idea of RLPF is to use particle filtering as a method for choosing the sampling points, i.e. for calculating a parameter vector θ for each trial.").
Regarding claim 24, Wright remains as applied as in claim 21 and goes on to further teach [t]he robotic control system of claim 21, wherein the policy learning module includes [a] state estimator model configured to generate… estimates based on the particle measurements and a prior… measurement (Wright: Para. 0325 and 0327; "To meet the challenges to using trajectory data in complex returns, Complex return Fitted Q-Iteration (CFQI) is provided, a generalization of the FQI framework which allows for any general return based estimate, enabling the seamless integration of complex returns within AVI. Two distinct methods for utilizing complex returns within the CFQI framework are provided. The first method is similar to the idea of Q(λ) [Watkins 1992] and uses truncated portions of trajectories, that are consistent with the approximation of Q* , to calculate complex return estimates without introducing off-policy bias. The second method is more a novel approach that makes use of the inherent negative bias of complex returns, due to the value iteration context, as a lower bound for value estimates." "It should be noted that this definition of the n-step returns differs from the standard on-policy definition of the n-step returns because of its use of the max operation. In principle each of the n-step returns can be used as approximations of Q* (s.sub.t, a.sub.t). Individually each estimator has its own distinct biases and variances. However, when combined, through averaging they can produce an estimator with lower variance than any one individual return [Dietterich 2000]. It is this idea that motivated the development of complex returns [Sutton 1998].").
Wright does not explicitly teach an online state estimator model configured to generate particle online estimates based on the particle measurements and a prior particle online measurement; however, these features are well known in the art and would have been obvious to include, as evidenced by Sutton (Sutton: Chapter III Section 7.1 and Chapter II Section 5; "As with the n-step TD methods, the updating can be either on-line or off-line. The approach that we have been taking so far is what we call the theoretical, or forward, view of a learning algorithm." "Monte Carlo methods require only experience—sample sequences of states, actions, and rewards from on-line or simulated interaction with an environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. Learning from simulated experience is also powerful. Although a model is required, the model need only generate sample transitions, not the complete probability distributions of all possible transitions that are required by dynamic programming (DP) methods.") and Kormushev (Kormushev: Page 3 Section 3 and Pages 5-6 Section 5; "Particle filters, also known as Sequential Monte Carlo methods Doucet et al. (2000, 2001), originally come from statistics and are similar to importance sampling methods. Particle filters are able to approximate any probability density function, and can be viewed as a ‘sequential analogue’ of Markov chain Monte Carlo (MCMC) batch methods." "The pseudo-code for RLPF is given in Algorithm 1. A detailed description of the algorithm follows... In lines 15-18, a particle is selected based on the inverse density function mechanism, described earlier in this section. In lines 19-20, a new particle is selected by adding exponentially decayed noise to the previously selected particle.").

Claims 11, 12, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wright as applied to claims 10 and 17 above, and further in view of Cella et al. (US Pub. No. 20180284758 A1), hereinafter Cella.
Regarding claim 11, Wright remains as applied as in claim 10; however, Wright is silent as to [t]he vehicle control system of claim 10, wherein the motion controller is configured to control suspensions of the vehicle.
In a similar field, Cella teaches [t]he vehicle control system of claim 10, wherein the motion controller is configured to control suspensions of the vehicle (Cella: Para. 0314; "In embodiments, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from additional large machines such as turbines, windmills, industrial vehicles, robots, and the like. These large mechanical machines include multiple components and elements providing multiple subsystems on each machine. To that end, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from individual elements such as axles, bearings, belts, buckets, gears, shafts, gear boxes, cams, carriages, camshafts, clutches, brakes, drums, dynamos, feeds, flywheels, gaskets, pumps, jaws, robotic arms, seals, sockets, sleeves, valves, wheels, actuators, motors, servomotor, and the like.") for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the applicant’s claimed invention to modify the reinforcement learning model for vehicle control from Wright with the ability to apply the model to the suspension controls, as taught by Cella, for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
Regarding claim 12, Wright remains as applied as in claim 10; however, Wright is silent as to [t]he vehicle control system of claim 10, wherein the motion controller is configured to control actuators of the vehicle.
In a similar field, Cella teaches [t]he vehicle control system of claim 10, wherein the motion controller is configured to control actuators of the vehicle (Cella: Para. 0314; "In embodiments, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from additional large machines such as turbines, windmills, industrial vehicles, robots, and the like. These large mechanical machines include multiple components and elements providing multiple subsystems on each machine. To that end, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from individual elements such as axles, bearings, belts, buckets, gears, shafts, gear boxes, cams, carriages, camshafts, clutches, brakes, drums, dynamos, feeds, flywheels, gaskets, pumps, jaws, robotic arms, seals, sockets, sleeves, valves, wheels, actuators, motors, servomotor, and the like.") for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the applicant’s claimed invention to modify the reinforcement learning model for vehicle control from Wright with the ability to apply the model to the actuator controls of the vehicle, as taught by Cella, for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.

Regarding claim 18, Wright remains as applied as in claim 17; however, Wright is silent as to [t]he robotic control system of claim 17, wherein the actuator controller is configured to control at least one actuator of the robot.
In a similar field, Cella teaches [t]he robotic control system of claim 17, wherein the actuator controller is configured to control at least one actuator of the robot (Cella: Para. 0314; "In embodiments, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from additional large machines such as turbines, windmills, industrial vehicles, robots, and the like. These large mechanical machines include multiple components and elements providing multiple subsystems on each machine. To that end, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from individual elements such as axles, bearings, belts, buckets, gears, shafts, gear boxes, cams, carriages, camshafts, clutches, brakes, drums, dynamos, feeds, flywheels, gaskets, pumps, jaws, robotic arms, seals, sockets, sleeves, valves, wheels, actuators, motors, servomotor, and the like.") for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the applicant’s claimed invention to modify the reinforcement learning model for robot control from Wright with the ability to apply the model to a controller for an actuator of a robot, as taught by Cella, for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
Regarding claim 19, Wright remains as applied as in claim 17; however, Wright is silent as to [t]he robotic control system of claim 17, wherein the actuator controller is configured to control more than one actuator of the robot.
In a similar field, Cella teaches [t]he robotic control system of claim 17, wherein the actuator controller is configured to control more than one actuator of the robot (Cella: Para. 0314; "In embodiments, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from additional large machines such as turbines, windmills, industrial vehicles, robots, and the like. These large mechanical machines include multiple components and elements providing multiple subsystems on each machine. To that end, the platform 100 may include the local data collection system 102 deployed in the environment 104 to monitor signals from individual elements such as axles, bearings, belts, buckets, gears, shafts, gear boxes, cams, carriages, camshafts, clutches, brakes, drums, dynamos, feeds, flywheels, gaskets, pumps, jaws, robotic arms, seals, sockets, sleeves, valves, wheels, actuators, motors, servomotor, and the like.") for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the applicant’s claimed invention to modify the reinforcement learning model for robot control from Wright with the ability to apply the model to a controller for multiple actuators of a robot, as taught by Cella, for the benefit of creating a reinforcement learning model that can be applied to a wide range of mechanical systems and subsystems.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Gupta et al. (US Pub. No. 20100094786 A1) discloses a reinforcement learning process for optimizing the policies of robots and utilizes many of the same algorithms that Wright utilizes.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Aaron K McCullers whose telephone number is (571)272-3523. The examiner can normally be reached Monday - Friday, roughly 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Angela Ortiz, can be reached on (571) 272-1206. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.K.M./ Examiner, Art Unit 3663
/ANGELA Y ORTIZ/ Supervisory Patent Examiner, Art Unit 3663