DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/20/2021 has been entered.
Status of the Application
	Claims 2-4 have been cancelled. Claims 1 and 5-24 are currently pending in this application.
Claim Objections
	Claim 5 has been amended in order to overcome the current objection to the claim. Therefore, the objection to claim 5 has been withdrawn.
Response to Arguments
	Presented arguments have been fully considered, but are rendered moot in view of new ground(s) of rejection necessitated by amendment(s) initiated by the applicant(s).
Claim Rejections - 35 USC § 103
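In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.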
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Sablak et al. (Hereafter, “Sablak”) [US 2012/0081552 A1] in view of Roumeliotis et al. (Hereafter, “Roumeliotis”) [US 2016/0327395 A1] in further view of Sriram et al. (Hereafter, “Sriram”) [US 2019/0294889 A1].
In regards to claim 1, Sablak discloses an object-tracking system ([0002] a video camera system for tracking a moving object) comprising: a camera configured to capture an image of a surrounding environment in accordance with a first camera configuration ([0004] Movable cameras which may pan, tilt and/or zoom may also be used to track objects. The use of a PTZ (pan, tilt, zoom) camera system will typically reduce the number of cameras required for a given surveillance site and also thereby reduce the number and cost of the video feeds and system integration hardware such as multiplexers and switchers associated therewith.), wherein the camera is moveable within a local environment and configured to adopt a second camera configuration ([0006] When a PTZ system is employed, the camera is typically repositioned by analyzing the motion of the target object and predicting a future location of the target object. The camera is then adjusted to reposition the estimated future location of the target object in the center of the FOV.); ([0008] The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera to track the target object wherein the processor adjusts the camera at a plurality of varied adjustment rates.) ([0008] the processor is programmed to detect a moving target object in the video images [0043] Tracking unit 50 performs several functions, it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22) using a detection algorithm ([0084] If a moving target object is present in the images, the centroid of the target object is determined at block 120. 
At block 122 images I1 and I2 and the data associated therewith are swapped as described above with respect to block 98. At block 124 the size of the detected target object, i.e., the Blob_Size, is compared to a threshold value and, if the target object is not large enough, or if no target object has been found in the images, the process returns to block 84. If the target object is larger than the threshold size, the process continues on to block 100 through 106 where the adjustment parameters of camera 22 are determined and then communicated to camera 22 as described above.), estimate a current position of the moveable object, estimate a current position of the camera relative to the current position of the moveable object ([0043] calculates the relative direction, speed and size of the detected target objects [0052] Tracking unit 50 does not require the two images which are used to determine the motion of the POI to be taken with the camera having the same pan, tilt and focal length settings for each image. Instead, tracking unit 50 maps or aligns one of the images with the other image and then determines the relative velocity and direction of movement of the POI.), determine the second camera configuration based at least in part on the future position of the moveable object ([0006] When a PTZ system is employed, the camera is typically repositioned by analyzing the motion of the target object and predicting a future location of the target object. The camera is then adjusted to reposition the estimated future location of the target object in the center of the FOV.), 
Sablak fails to explicitly disclose an inertial measurement unit (IMU) associated with the camera, wherein the IMU is configured to generate inertial data representing at least one of an angular velocity or linear acceleration of the camera; and a computer that is operatively coupled with the camera and the inertial measurement unit (IMU), estimate a current position of the camera relative to the current position of the moveable object of using the inertial data, and determine a position of the camera within the local environment, and generate a real-time map of the local environment in a GPS-denied environment, wherein the real-time map reflects the current position of the moveable object and the current position of the camera relative to the moveable object.
Roumeliotis discloses an object-tracking system ([Abstract] Vision-aided inertial navigation system (VINS)) comprising: a camera configured to capture an image of a surrounding environment in accordance with a first camera configuration, wherein the camera is moveable within a local environment and configured to adopt a second camera configuration ([0022] FIG. 2 illustrates an example implementation of VINS 10 in further detail. Image source 12 of VINS 10 images an environment in which VINS 10 operates so as to produce image data 14. That is, image source 12 generates image data 14 that captures a number of features visible in the environment. Image source 12 may be, for example, one or more cameras that capture 2D or 3D images, a laser scanner or other optical device that produces a stream of 1D image data, a depth sensor that produces image data indicative of ranges for features within the environment, a stereo vision system having multiple cameras to produce 3D information, a Doppler radar and the like. In this way, image data 14 provides exteroceptive information as to the external environment in which VINS 10 operates. Moreover, image source 12 may capture and produce image data 14 at time intervals in accordance one or more clocks associated with the image source. In other words, image source 12 may produce image data 14 at each of a first set of time instances along a trajectory within the three-dimensional (3D) environment, wherein the image data captures features 15 within the 3D environment at each of the first time instances.); an inertial measurement unit (IMU) associated with the camera, wherein the IMU is configured to generate inertial data representing at least one of an angular velocity or linear acceleration of the camera ([0020] In addition, IMUs of VINS 10 produces IMU data indicative of a dynamic motion of VINS 10. [0023] IMU 16 produces IMU data 18 indicative of a dynamic motion of VINS 10. 
IMU 16 may, for example, detect a current rate of acceleration using one or more accelerometers as VINS 10 is translated, and detect the rate rotational velocity (i.e., the rate of change in rotational attributes like pitch, roll and yaw) using one or more gyroscopes as VINS 10 is rotated. IMU 14 produces IMU data 18 to specify the detected motion. In this way, IMU data 18 provides proprioceptive information as to the VINS 10 own perception of its movement and orientation within the environment.); and a computer that is operatively coupled with the camera and the inertial measurement unit (IMU) ([Fig. 2] Processing unit 20 is coupled to Image Source 12 and IMU 16), wherein the computer is configured to: process the image from the camera ([0020] While traversing environment 2, the image sources of VINS 10 produce image data at discrete time instances along the trajectory within the three-dimensional (3D) environment, where the image data captures features 15 within the 3D environment at each of the time instances.) ([0026] Feature extraction and tracking module 12 extracts features 15 from image data 14 acquired by image source 12 and stores information describing the features in feature database 25. Feature extraction and tracking module 12 may, for example, perform corner and edge detection to identify features and track features 15 across images using, for example, the Kanade-Lucas-Tomasi (KLT) techniques described in Bruce D. Lucas and Takeo Kanade, An iterative image registration technique with an application to stereo vision, In Proc. of the International Joint Conference on Artificial Intelligence, pages 674-679, Vancouver, British Columbia, Aug.  24-28 1981, the entire content of which in incorporated herein by reference. [0027] Outlier rejection module 13 provides robust outlier rejection of measurements from image source 12 and IMU 16.  
For example, outlier rejection module may apply a Mahalanobis distance tests to the feature measurements to identify and reject outliers. As one example, outlier rejection module 13 may apply a 2-Point Random sample consensus (RANSAC) technique described in Laurent Kneip, Margarita Chli, and Roland Siegwart, Robust Real-Time Visual Odometry with a Single Camera and an Imu, In Proc.  of the British Machine Vision Conference, pages 16.1-16.11, Dundee, Scotland, Aug.  29-Sep. 2 2011, the entire content of which in incorporated herein by reference.), estimate a current position of the moveable object, estimate a current position of the camera relative to the current position of the moveable object of using the inertial data ([0020] As described in detail herein, VINS 10 includes a hardware-based computing platform that implements an estimator that fuses the image data and the IMU data to perform localization of VINS 10 within environment 10. That is, based on the image data and the IMU data, VINS 10 determines, at discrete points along the trajectory of VINS as the VINS traverses environment 2, poses (position and orientation) of VINS 10 as well as positions of features 15. [0021] As described herein, in one example implementation, VINS 10 implements an inverse, sliding-window filter (ISWF) for processing inertial and visual measurements. That is, estimator 22 applies the ISWF to process image data 14 and IMU data 18 to estimate the 3D IMU pose and velocity together with the time-varying IMU biases and to produce, based on the captured image data, estimates for poses of VINS 10 along the trajectory and an overall map of visual features 15.), predict a future position of the moveable object ([0028] As described herein, estimator 22 implements filter 23 that iteratively updates predicted state estimates over a bounded-size sliding window of state estimates for poses of VINS 10 and positions of features 15 in real-time as new image data 14 and IMU data 18 is obtained. 
That is, by implementing the filtering approach, estimator 22 of VINS 10 marginalizes out past state estimates and measurements through the sliding window as VINS 10 traverses environment 2 for simultaneous localization and mapping (SLAM).), determine the second camera configuration based at least in part on the future position of the moveable object ([0038] As further described herein, in one example estimator 22 recursively updates the state estimates with state vector 17 by: classifying, for each of the poses estimated for VINS 10 along the trajectory, each of the features 15 observed at the respective pose into either a first set of the features or a second set of the features, maintaining a state vector of having predicted states for a position and orientation of the VINS and for positions with the environment for the first set of features for a sliding widow of two or more of the most recent poses along the trajectory without maintaining predicted state estimates for positions of the second set of features within state vector 17, and applying a sliding window filter that utilizes an inverse to compute constraints between consecutive ones of the poses within the sliding window based on the second set of features and compute, in accordance with the constraints, updates for the predicted state estimates within the state vector for the sliding window.), determine a position of the camera within the local environment ([0020] That is, estimator 22 applies the ISWF to process image data 14 and IMU data 18 to estimate the 3D IMU pose and velocity together with the time-varying IMU biases and to produce, based on the captured image data, estimates for poses of VINS 10 along the trajectory and an overall map of visual features 15.), and generate a real-time map of the local environment in a GPS-denied environment ([0019] Environment 2 may, for example, represent an environment where conventional GPS-signals are unavailable for navigation, such as on the moon or a 
different planet or even underwater.), wherein the real-time map reflects the current position of the moveable object and the current position of the camera relative to the moveable object ([0019] VINS 10 may perform simultaneous localization and mapping (SLAM) by constructing a map of environment 2 while simultaneously determining the position and orientation of VINS 10 as the VINS traverses the environment [0019] Features 15, also referred to as landmarks, represent objects visible within environment 2, such as rocks, trees, signs, walls, stairs, chairs, tables, and the like. Features 15 may be moving or stationary objects within environment 2. [0020] Utilizing these techniques, VINS 10 may navigate environment 2 and, in some cases, may construct a map of the environment including the positions of features 15.).
Sriram discloses an object-tracking system ([Abstract] multi-sensor object tracking) comprising: a camera configured to capture an image of a surrounding environment in accordance with a first camera configuration ([0042] track objects within particular regions in the fields of view of multiple image sensors to form trajectories within those regions), ([0059] For example, and without limitation, the sensor(s) may comprise any combination of an image sensor(s), …, inertial measurement unit (IMU) sensor(s) (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), …, stereo camera(s), wide-view camera(s) (e.g., fisheye cameras), infrared camera(s), surround camera(s) (e.g., 360 degree cameras), long-range and/or mid-range camera(s),… , and/or other sensor type.); and a computer that is operatively coupled with the camera and the inertial measurement unit (IMU), wherein the computer is configured to: process the image from the camera graphical processing unit (GPU) ([0072] Some or all of the image recognition techniques may be executed as processing tasks by one or more graphical processing units (GPUs) in the local one or more computing devices. According to some examples, some or all of the image recognition techniques may be executed by one or more GPUs in computing devices remotely positioned from the area, such as in a distributed (e.g., cloud) computing environment. 
In still further examples, some or all of the processing may be performed by a combination of GPUs from local and remote computing environments.), extraction to process the image, and wherein the GPU is configured to provide an object detector with a ([0072] The perception system 102 may apply image recognition techniques (e.g., using the object detector 114, the occupancy determiner 116, the object attribute determiner 118, and/or the intra-feed object tracker 120) to extract metadata from images captured by image sensors (and/or other sensors) in or around an area, such as the area 200 (e.g., a monitored structure).), detect a moveable object within the image using a detection algorithm ([0049] The intra-feed object tracker 120 may be configured to track motion of objects within a feed of the sensor data 162--such as a single-camera feed--and may employ the object detections from the object detector 114 and the object attributes (e.g., to generate one or more object trajectories for a feed). The global location determiner 122 may be used to determine global locations of objects, such as of objects that may be detected using the object detector 114. [0065] In any of these examples, the object detector 114 may analyze the image data to detect and/or identify an object in the area 200, such as within an image of the area 200 and/or a field of view(s) of a sensor in the area 200 (e.g., using object perception). The object detector 114 may analyze the image data to extract and/or determine a presence and/or location(s) of one or more objects in an image(s) represented by the image data and/or in the environment. This may include the object detector 114 determining a bounding box of an object and/or location coordinates of the object in an image (e.g., four coordinate pairs of corners of a bounding box) and/or one or more confidence values associated with a detection. 
The object detector 114 may employ, for example, one or more machine learning models to determine one or more object attributes of an object. For example, and without limitation, the machine learning model(s) may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naive Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models.), estimate a current position of the moveable object ([0085] The intra-feed object tracker 120 may be configured to track motion of objects within a feed (and/or sub-feed) of the sensor data 162--such as within a single surface--and may employ the object detections from the object detector 114 and optionally object attributes from the object attribute determiner 118 (e.g., to generate one or more object trajectories for a feed). For example, the intra-feed object tracker 120 may determine location coordinates and/or trajectories of each object within a feed and/or surface (the location coordinates and/or trajectories may correspond to locations of bounding boxes over time).), ([0042] Approaches described herein may track objects within particular regions in the fields of view (e.g., aisles of parking structures) of multiple image sensors to form trajectories within those regions. 
The trajectories from different regions of different fields of view may be merged to form a single trajectory for a particular object, thereby leveraging tracking information from multiple image sensors that can compensate for any deficiencies of the individual trajectories.), ([0006] Disclosed approaches may allow for efficient, real-time monitoring and detection of vehicles, persons, and/or other objects in a wide variety of environments or areas. [0202] The present disclosure further provides approaches for identifying the objects observed by the perception system 102 in multiple time-periods. At times no sensor of an object tracking system may detect an object in an area due to coverage holes. For example, a particular region may not be covered by any of the cameras or other sensors, vehicles may be in a tunnel which may frustrate GPS-based trackers, etc. Disclosed approaches allow for the smart area monitoring system 100 to handle transient object disappearances.) and the current position of the camera relative to the moveable object ([0084] The location calibrator 128 may calibrate location data for the cameras using any suitable approach (e.g., checker-board based calibration). For every object detected (e.g., a vehicle), a camera may use the location data to emit the global coordinates and global time information, or this information may otherwise be determined from the image data from the camera (e.g., on a perception server). In some examples, an object's coordinates may be computed using a transformation matrix to map the camera or image coordinates to the global coordinates.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the IMU in the camera’s housing (VINS) and the use of SLAM for constructing a map of the environment surrounding the camera’s housing (VINS) as taught by Roumeliotis. The motivation behind this modification would have been to determine the position and orientation of the VINS (including camera) and the moving and stationary features in the environment using the IMU which can accurately track dynamic motions over short time durations in GPS-denied environments [See Roumeliotis].

In regards to claim 22, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein the computer is configured to perform, in conjunction with the camera and IMU, simultaneous localization and mapping (SLAM).  
Roumeliotis discloses wherein the computer is configured to perform, in conjunction with the camera and IMU, simultaneous localization and mapping (SLAM) ([0019] VINS 10 may perform simultaneous localization and mapping (SLAM) by constructing a map of environment 2 while simultaneously determining the position and orientation of VINS 10 as the VINS traverses the environment).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the IMU in the camera’s housing (VINS) and the use of SLAM for constructing a map of the environment surrounding the camera’s housing (VINS) as taught by Roumeliotis. The motivation behind this modification would have been to determine the position and orientation of the VINS (including camera) and the moving and stationary features in the environment using the IMU which can accurately track dynamic motions over short time durations in GPS-denied environments [See Roumeliotis].

In regards to claim 23, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein the computer is trained to track the moveable object through machine learning by artificial neural networks.
Sriram discloses wherein the computer is trained to track the moveable object through machine learning by artificial neural networks ([0049] The intra-feed object tracker 120 may be configured to track motion of objects within a feed of the sensor data 162--such as a single-camera feed--and may employ the object detections from the object detector 114 and the object attributes (e.g., to generate one or more object trajectories for a feed). The global location determiner 122 may be used to determine global locations of objects, such as of objects that may be detected using the object detector 114. [0065] In any of these examples, the object detector 114 may analyze the image data to detect and/or identify an object in the area 200, such as within an image of the area 200 and/or a field of view(s) of a sensor in the area 200 (e.g., using object perception). The object detector 114 may analyze the image data to extract and/or determine a presence and/or location(s) of one or more objects in an image(s) represented by the image data and/or in the environment. This may include the object detector 114 determining a bounding box of an object and/or location coordinates of the object in an image (e.g., four coordinate pairs of corners of a bounding box) and/or one or more confidence values associated with a detection. The object detector 114 may employ, for example, one or more machine learning models to determine one or more object attributes of an object. 
For example, and without limitation, the machine learning model(s) may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naive Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak and Roumeliotis with the teachings of object detection and image recognition using a graphical processing unit as taught by Sriram in order to improve the object detection and tracking in areas where GPS tracking is unreliable, for example in parking garages [See Sriram].

In regards to claim 24, the limitations of claim 1 have been addressed. Sablak discloses wherein ([0003] There are numerous known video surveillance systems which may be used to track a moving object such as a person or vehicle.).
Roumeliotis discloses wherein the camera is coupled to or integrated with a first vehicle ([0019] VINS 10 may be, for example, a robot, mobile sensing platform, a mobile phone, a laptop, a tablet computer, a vehicle, and the like.) and the moveable object ([0019] Features 15, also referred to as landmarks, represent objects visible within environment 2, such as rocks, trees, signs, walls, stairs, chairs, tables, and the like. Features 15 may be moving or stationary objects within environment 2.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak’s fixed camera tracking a moving object such as a vehicle with the use of a vehicle with a camera (VINS) tracking the position and orientation of moving objects as taught by Roumeliotis. The motivation behind this modification would have been to determine the position and orientation of the VINS (including camera) and the moving features in the environment using the IMU which can accurately track dynamic motions over short time durations in GPS-denied environments [See Roumeliotis].

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Roumeliotis in further view of Sriram in even further view of Fahn et al. (Hereafter, “Fahn”) [US 2009/0128618 A1].
In regards to claim 5, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein the camera is coupled to or integrated with a wearable that is associated with the user.
Fahn discloses wherein the camera is coupled to or integrated with a wearable that is associated with the user ([0004] A handheld image capture system has an imager which is controlled to perform operations to obtain an image.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the camera being handheld by the user as taught by Fahn in order to improve object detection through the image capture.

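In regards to claim 6, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein at least one of the current position of the moveable object, the current position of the user, or the future position of the moveable object is determined using a Kalman filter.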
Fahn discloses wherein at least one of the current position of the moveable object, the current position of the user, or the future position of the moveable object is determined using a Kalman filter ([0050] A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Roumeliotis in further view of Sriram in even further view of Fisher et al. (Hereafter, “Fisher”) [US 2012/0233000 A1].
In regards to claim 7, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein the computer is operatively coupled with a global positioning system (GPS), wherein the computer is configured to determine the current position of the user relative to the moveable object using the GPS system in a non-GPS-denied environment.
Roumeliotis discloses ([0019] In some implementations, the techniques described herein may be used within environments having GPS or similar signals and may provide supplemental localization and mapping information.).
Fisher discloses wherein the computer is operatively coupled with a global positioning system (GPS), wherein the computer is configured to determine the current position of the user relative to the moveable object in a GPS-denied environment ([0173] In this manner, remote devices users A and B can capture image(s) of target object or event C at the same time using the GPS and compass data from the remote device application(s) to triangulate on target C. The location, direction, speed, and orientation of a moving target could then be calculated by triangulation from the GPS and compass information captured over a period of time at suitable intervals and tagged to each respective image. Each of the remote device users would follow or track the movement of the target object and either store the time, GPS and directional information as metadata associated with the captured images, or communicate the information to the server to be save and used in the calculations at a later time. The GPS and directional information may be collected every second, or more preferably multiple times per second, where the suitable interval is determined based on the object's speed or the amount of detail required to be captured. The GPS and Compass location of Target C is therefore known due to triangulation. This is also true for moving targets if the cell phone users follow the target (i.e. we beam coordinates for triangulation back to the server every second).).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the techniques of determination of the position and orientation of the VINS and the features in a GPS-enabled environment.

Claims 8-15 are rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Fahn in further view of KNOBLAUCH et al. (Hereafter, “Knoblauch”) [US 2015/0243069 A1] in even further view of Roumeliotis in even further view of Sriram.
In regards to claim 8, Sablak discloses a positioning system ([0002] a video camera system for tracking a moving object), comprising: a camera, wherein the camera is oriented in accordance with a current pan, tilt, and/or zoom (PTZ) configuration ([0004] Movable cameras which may pan, tilt and/or zoom may also be used to track objects. The use of a PTZ (pan, tilt, zoom) camera system will typically reduce the number of cameras required for a given surveillance site and also thereby reduce the number and cost of the video feeds and system integration hardware such as multiplexers and switchers associated therewith.), and wherein the camera is configured to capture an image while oriented in accordance with the current PTZ configuration ([0008] The invention comprises, in one form thereof, a video tracking system which includes a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. Also included is at least one processor which is operably coupled to the camera.); a processor configured to process the image using a computer vision technique ([0008] The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera to track the target object wherein the processor adjusts the camera at a plurality of varied adjustment rates.) ([0006] When a PTZ system is employed, the camera is typically repositioned by analyzing the motion of the target object and predicting a future location of the target object. 
The camera is then adjusted to reposition the estimated future location of the target object in the center of the FOV.); a detector configured to detect a moveable object within the image ([0008] the processor is programmed to detect a moving target object in the video images [0043] Tracking unit 50 performs several functions, it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22), wherein the moveable object is detected using a bounding box ([0084] If a moving target object is present in the images, the centroid of the target object is determined at block 120. At block 122 images I1 and I2 and the data associated therewith are swapped as described above with respect to block 98. At block 124 the size of the detected target object, i.e., the Blob_Size, is compared to a threshold value and, if the target object is not large enough, or if no target object has been found in the images, the process returns to block 84. 
If the target object is larger than the threshold size, the process continues on to block 100 through 106 where the adjustment parameters of camera 22 are determined and then communicated to camera 22 as described above.), ([0089] In alternative embodiments, the tracking unit may give up control of camera 22 during human operator and/or camera initiated movement of camera and continue to analyze the images acquired by camera 22 to detect target objects.), and wherein the detector is configured to deactivate a detection algorithm if it is no longer compatible with the type of object being detected ([0090] Once tracking unit 50 has detected a target object, it will continuously track the target object until it can no longer locate the target object, for example, the target object may leave the area which is viewable by camera 22 or may be temporarily obscured by other objects in the FOV. When unit 50 first loses the target object it will enter into a reacquisition subroutine. If the target object is reacquired, tracking unit will continue tracking the target object, if the target has not been found before the completion of the reacquisition subroutine, tracking unit 50 will change its status to Looking for Target and control of the camera position will be returned to either the camera controller or the human operator.); and a state estimator configured to store a current estimated position of a user and calculate a new estimated position of the user based on the type of object, an estimated location of the moveable object ([0043] calculates the relative direction, speed and size of the detected target objects), and a stored map ([0052] Tracking unit 50 does not require the two images which are used to determine the motion of the POI to be taken with the camera having the same pan, tilt and focal length settings for each image. 
Instead, tracking unit 50 maps or aligns one of the images with the other image and then determines the relative velocity and direction of movement of the POI.).
Fahn discloses a positioning system ([Title] System and Method for Object Selection in a Handheld Image Capture Device), comprising: a camera, wherein the camera is oriented in accordance with a current pan, tilt, and/or zoom (PTZ) configuration ([0034] FIG. 1 shows an imager with a multiple-axis actuating mechanism (hereinafter "MAAM imager"). The MAAM imager shown in FIG. 1 is a single-imager handheld camera with a multiple-axis actuating mechanism (herein after a "single-imager MAAM camera"). The single-imager MAAM camera 100 includes a camera body 120 and an actuated imager 110.), and wherein the camera is configured to capture an image while oriented in accordance with the current PTZ configuration ([0050] Initially, the bicyclist 401, being located inside the wide field of view α 350, is selected as the object of interest to be centered.); a processor configured to process the image using a computer vision technique ([0053-0054] auto-rotate process, auto-center process, auto-zoom process) ([0050] Meanwhile, the actuated imager 310, based on object location information, initially moves the lens using one or both of the auto-pan DOF and the auto-tilt DOF so as to bring the image of the bicyclist to the center of its image field 420, which is defined by the standard field of view β 340. Subsequently, the actuated imager 310 continues to move the lens to physically track the moving object, based on the object location information, so that the bicyclist at the later position and time 402 remains centered within the image field 420 of the actuated imager.); a detector configured to detect a moveable object within the image, ([0050] This selection of an object of interest is performed by an object selection module which will be described in detail in reference to FIG. 9 below. The bicyclist 401 appears in the upper right portion of an image field 410 of the static imager 330 (FIG. 3B) which is defined by the wide field of view α 350. 
Subsequently, the stationary imager 330 continues to track the bicyclist to a later position 402 at a later time using a software algorithm. Such software-based tracking may be achieved by optic flow or other vision algorithms (e.g., O'Sullivan, Igoe, Physical Computing: Sensing and Controlling the Physical World with Computers, Chapter 9, Thomson Course Tech., 2004). The SwisTrack tool (see, e.g., SwisTrack: A Tracking Tool for Multi-Unit Robotic and Biological Systems, by Correll, Nikolaus; Sempo, Gregory; Lopez de Meneses, Yuri; Halloy, Jose; Deneubourg, Jean-Louis; Martinoli, Alcherio, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (2006), p. 2185-2191, 2006) can be used for trajectory tracking of multiple moving objects, with its core image manipulation functions provided by Intel Corporation's Open Source Computer Vision Library ("OpenCV Library"), for example. A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters. Based on the image provided by the stationary imager 330, the object selection module calculates the object location information regarding the center coordinate of the bicyclist in its image field 410.), and wherein the detector is configured to deactivate a detection algorithm if it is no longer compatible with the type of object being detected; and a state estimator configured to ([0045] The single-imager MAAM camera, such as that shown in FIGS. 2A and 2B may be used for an auto-centering purpose, e.g., centering an object of interest in the center of the captured image field. In other embodiments, the object of interest could be centered in a particular zone or placed at the intersection of particular zones of the captured image field. 
Assuming an object of interest is selected, the selected object may be centered automatically by a combination of the panning and the tilting motions of the actuated imager. The auto-center feature will be described in detail in reference to FIG. 4 below.).
Knoblauch discloses a state estimator configured to store a current estimated position of a user and calculate a new estimated position of the user based on the type of object, an estimated location of the moveable object, and a stored map of an environment, wherein the stored map includes the estimated location of the moveable object relative to the current estimated position ([0091] Referring now to FIG. 6, FIG. 6 shows an example method 600 for view independent color equalized 3D scene texturing. The method 600 of FIG. 6 may be performed by any suitable computing device or within any suitable computing environment, such as those discussed below with respect to FIGS. 7-9. As part of various examples, suitable devices or systems may be used. Certain examples may use aspects of SLAM (Simultaneous Location and Mapping) or PTAM (Parallel Tracking and Mapping) systems as means of identifying a camera pose or a position of the camera relative to a captured object or scene. Various alternative examples may also use these mapping systems to create a geometric model of a system, or for collecting various data that may be used for creation of a 3D model.). 
Roumeliotis discloses a positioning system ([Abstract] Vision-aided inertial navigation system (VINS)), comprising: a camera, wherein the camera is oriented ([0022] FIG. 2 illustrates an example implementation of VINS 10 in further detail. Image source 12 of VINS 10 images an environment in which VINS 10 operates so as to produce image data 14. That is, image source 12 generates image data 14 that captures a number of features visible in the environment. Image source 12 may be, for example, one or more cameras that capture 2D or 3D images, a laser scanner or other optical device that produces a stream of 1D image data, a depth sensor that produces image data indicative of ranges for features within the environment, a stereo vision system having multiple cameras to produce 3D information, a Doppler radar and the like. In this way, image data 14 provides exteroceptive information as to the external environment in which VINS 10 operates. Moreover, image source 12 may capture and produce image data 14 at time intervals in accordance one or more clocks associated with the image source. In other words, image source 12 may produce image data 14 at each of a first set of time instances along a trajectory within the three-dimensional (3D) environment, wherein the image data captures features 15 within the 3D environment at each of the first time instances.); a state estimator configured to store a current estimated , wherein the state estimator is trained to calculate the new estimated position ([0020] As described in detail herein, VINS 10 includes a hardware-based computing platform that implements an estimator that fuses the image data and the IMU data to perform localization of VINS 10 within environment 10. That is, based on the image data and the IMU data, VINS 10 determines, at discrete points along the trajectory of VINS as the VINS traverses environment 2, poses (position and orientation) of VINS 10 as well as positions of features 15. 
[0021] As described herein, in one example implementation, VINS 10 implements an inverse, sliding-window filter (ISWF) for processing inertial and visual measurements. That is, estimator 22 applies the ISWF to process image data 14 and IMU data 18 to estimate the 3D IMU pose and velocity together with the time-varying IMU biases and to produce, based on the captured image data, estimates for poses of VINS 10 along the trajectory and an overall map of visual features 15. [0028] As described herein, estimator 22 implements filter 23 that iteratively updates predicted state estimates over a bounded-size sliding window of state estimates for poses of VINS 10 and positions of features 15 in real-time as new image data 14 and IMU data 18 is obtained. That is, by implementing the filtering approach, estimator 22 of VINS 10 marginalizes out past state estimates and measurements through the sliding window as VINS 10 traverses environment 2 for simultaneous localization and mapping (SLAM). [0038] As further described herein, in one example estimator 22 recursively updates the state estimates with state vector 17 by: classifying, for each of the poses estimated for VINS 10 along the trajectory, each of the features 15 observed at the respective pose into either a first set of the features or a second set of the features, maintaining a state vector of having predicted states for a position and orientation of the VINS and for positions with the environment for the first set of features for a sliding widow of two or more of the most recent poses along the trajectory without maintaining predicted state estimates for positions of the second set of features within state vector 17, and applying a sliding window filter that utilizes an inverse to compute constraints between consecutive ones of the poses within the sliding window based on the second set of features and compute, in accordance with the constraints, updates for the predicted state estimates within the state vector for 
the sliding window.).
Sriram discloses a positioning system ([Title] Smart Area Monitoring with Artificial Intelligence), comprising: a camera, ([0042] track objects within particular regions in the fields of view of multiple image sensors to form trajectories within those regions); a processor configured to process the image using a computer vision technique via a graphical processing unit (GPU) ([0072] Some or all of the image recognition techniques may be executed as processing tasks by one or more graphical processing units (GPUs) in the local one or more computing devices. According to some examples, some or all of the image recognition techniques may be executed by one or more GPUs in computing devices remotely positioned from the area, such as in a distributed (e.g., cloud) computing environment. In still further examples, some or all of the processing may be performed by a combination of GPUs from local and remote computing environments.), wherein the GPU is configured to use feature extraction to process the image, and wherein the GPU is configured to provide an object detector with a preprocessed image ([0072] The perception system 102 may apply image recognition techniques (e.g., using the object detector 114, the occupancy determiner 116, the object attribute determiner 118, and/or the intra-feed object tracker 120) to extract metadata from images captured by image sensors (and/or other sensors) in or around an area, such as the area 200 (e.g., a monitored structure).); ([0065] In any of these examples, the object detector 114 may analyze the image data to detect and/or identify an object in the area 200, such as within an image of the area 200 and/or a field of view(s) of a sensor in the area 200 (e.g., using object perception). The object detector 114 may analyze the image data to extract and/or determine a presence and/or location(s) of one or more objects in an image(s) represented by the image data and/or in the environment. 
This may include the object detector 114 determining a bounding box of an object and/or location coordinates of the object in an image (e.g., four coordinate pairs of corners of a bounding box) and/or one or more confidence values associated with a detection. The object detector 114 may employ, for example, one or more machine learning models to determine one or more object attributes of an object. 
For example, and without limitation, the machine learning model(s) may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naive Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models. [0131] The object detector 114 may detect one or more objects from a fisheye image one or more neural networks, such as using deep learning methods. To do so, the object detector 114 may use a trained a CNN to determine localization of objects in an image, thereby reducing or eliminating the need for a dewarper or dewarping process.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of a library of algorithms for the object detection as taught by Fahn in order to improve the object detection and tracking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak and Fahn with the identification of the position of the camera relative to the captured object of the scene as taught by Knoblauch in order to map the position of the camera into the scene/environment for reconstruction of 3D scenes [See Knoblauch, 0004]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the 
In regards to claim 9, the limitations of claim 8 have been addressed. Sablak fails to explicitly disclose wherein the camera is coupled to or integrated with a wearable that is associated with the user.
Fahn discloses wherein the camera is coupled to or integrated with a wearable that is associated with the user ([0004] A handheld image capture system has an imager which is controlled to perform operations to obtain an image.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the camera being handheld by the user as taught by Fahn in order to improve object detection through the image capture.

In regards to claim 10, the limitations of claim 8 have been addressed. Sablak discloses wherein the controller develops a new PTZ configuration ([0006] When a PTZ system is employed, the camera is typically repositioned by analyzing the motion of the target object and predicting a future location of the target object. The camera is then adjusted to reposition the estimated future location of the target object in the center of the FOV.) at least partly based on at least one of: the type of object being detected, the new estimated position of the user, or information shared by an external device.
Fahn discloses wherein the controller develops a new PTZ configuration at least partly based on at least one of: the type of object being detected, the new estimated position of the user, or information shared by an external device ([0050] Initially, the bicyclist 401, being located inside the wide field of view α 350, is selected as the object of interest to be centered. This selection of an object of interest is performed by an object selection module which will be described in detail in reference to FIG. 9 below. The bicyclist 401 appears in the upper right portion of an image field 410 of the static imager 330 (FIG. 3B) which is defined by the wide field of view α 350. Subsequently, the stationary imager 330 continues to track the bicyclist to a later position 402 at a later time using a software algorithm. Such software-based tracking may be achieved by optic flow or other vision algorithms (e.g., O'Sullivan, Igoe, Physical Computing: Sensing and Controlling the Physical World with Computers, Chapter 9, Thomson Course Tech., 2004). The SwisTrack tool (see, e.g., SwisTrack: A Tracking Tool for Multi-Unit Robotic and Biological Systems, by Correll, Nikolaus; Sempo, Gregory; Lopez de Meneses, Yuri; Halloy, Jose; Deneubourg, Jean-Louis; Martinoli, Alcherio, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (2006), p. 2185-2191, 2006) can be used for trajectory tracking of multiple moving objects, with its core image manipulation functions provided by Intel Corporation's Open Source Computer Vision Library ("OpenCV Library"), for example. A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters. 
Based on the image provided by the stationary imager 330, the object selection module calculates the object location information regarding the center coordinate of the bicyclist in its image field 410. Meanwhile, the actuated imager 310, based on object location information, initially moves the lens using one or both of the auto-pan DOF and the auto-tilt DOF so as to bring the image of the bicyclist to the center of its image field 420, which is defined by the standard field of view β 340. Subsequently, the actuated imager 310 continues to move the lens to physically track the moving object, based on the object location information, so that the bicyclist at the later position and time 402 remains centered within the image field 420 of the actuated imager. In case the object of interest 401 remains stationary, the actuated imager 310 will initially move the lens so as to center the object of interest in its image field 420 based on the object location. However, after the initial centering is complete, no further tracking by the actuated imager will be necessary unless the object or the handheld camera moves with respect to the background.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

In regards to claim 11, the limitations of claim 8 have been addressed. Sablak discloses wherein the camera is an omnidirectional camera ([0004] Movable cameras which may pan, tilt and/or zoom may also be used to track objects.).

In regards to claim 12, the limitations of claim 8 have been addressed. Sablak discloses further comprising a second camera configured to capture an image ([0039] Illustrated system 20 is a single camera application, however, the present invention may be used within a larger surveillance system having additional cameras which may be either stationary or moveable cameras or some combination thereof to provide coverage of a larger or more complex surveillance area.).

In regards to claim 13, the limitations of claim 8 have been addressed. Sablak fails to explicitly disclose further comprising an inertial measurement unit (IMU).
Fahn discloses further comprising an inertial measurement unit (IMU) ([0065] In some embodiments, the camera body movement detection unit 931 is based on an inertial sensor such as a MEMS-based accelerometer available from Analog Devices (Norwood, Mass.), for example.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

In regards to claim 14, the limitations of claim 8 have been addressed. Sablak discloses wherein the state estimator uses odometry, at least in part, to calculate a new estimated position of the user ([0043] Tracking unit 50 performs several functions, it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22; it calculates the relative direction, speed and size of the detected target objects; it sends direction and speed commands to camera 22; it performs all serial communications associated with the above functions; and it controls the operation of the status indicators 70, 72 and relay 74.).
Fahn discloses wherein the state estimator uses odometry, at least in part, to calculate a new estimated position of the user ([0065] In some embodiments, the camera body movement detection unit 931 is based on an inertial sensor such as a MEMS-based accelerometer available from Analog Devices (Norwood, Mass.), for example.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

In regards to claim 15, the limitations of claim 8 have been addressed. Sablak fails to explicitly disclose wherein the state estimator uses a Kalman filter.
Fahn discloses wherein the state estimator uses a Kalman filter ([0050] A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Fahn in further view of Knoblauch in even further view of Roumeliotis in even further view of Sriram in even further view of Wonneberger [US 2006/0078162 A1].
In regards to claim 16, the limitations of claim 8 have been addressed. Sablak fails to explicitly disclose further comprising an interface configured to receive user input, wherein the input is used to help determine the type of object being detected.
Wonneberger discloses further comprising an interface configured to receive user input ([0026] The Object Determiner 103 may receive the sequence of video images from the Moveable Camera 101 and may create a border 203 substantially around the outside of an object 210 of interest within the current video image (see FIG. 2). The border 203 may be, e.g., but not limited to, selected by the User 109 and/or may be automatically determined by adjusting the border of a previous video image using an object motion model, as will be described below in detail. [0031] The object border 203 may also be the outer border of object area 204. The object border 203 may define the area substantially surrounding the object 210. The object border 203 may be, e.g., defined by User 109, who draws the border using a mouse or a joy-stick, or may be automatically determined as a result of adjusting the object border 203 of the previous video image.), wherein the input is used to help determine the type of object being detected ([0035] FIG. 3 illustrates an exemplary embodiment of adjusting an object border according to the present invention. As discussed above, the object border 203 may be selected by the User 109, or may be determined by the Object Determiner 103.  When the User 109 is not selecting the object border, the Object Determiner 103 may determine the object border 203. Initially, the Object Determiner 103 may identify the moving object 210 by any conventional segmentation algorithm, and then may place the border 203 around the object 210 such that the object 210 may be substantially within the border 203. After the initial border is identified, in subsequent video images, the Object Determiner 103 may receive the object motion model from the Object Motion Estimator 105 to adjust the object border 203.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Wonneberger in order to stabilize the video obtained by a camera.

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Roumeliotis in further view of Fahn in even further view of Ju et al. (Hereafter, “Ju”) [US 2019/0156138 A1].
In regards to claim 17, Sablak discloses a method for visually localizing an individual ([0002] a video camera system for tracking a moving object), the method comprising the steps of: capturing an image containing an object via a camera using a first pan, tilt, and/or zoom ([0004] Movable cameras which may pan, tilt and/or zoom may also be used to track objects. The use of a PTZ (pan, tilt, zoom) camera system will typically reduce the number of cameras required for a given surveillance site and also thereby reduce the number and cost of the video feeds and system integration hardware such as multiplexers and switchers associated therewith. [0008] The invention comprises, in one form thereof, a video tracking system which includes a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. Also included is at least one processor which is operably coupled to the camera.); processing the image to determine an appropriate detection algorithm ([0008] The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera to track the target object wherein the processor adjusts the camera at a plurality of varied adjustment rates.) ([0008] the processor is programmed to detect a moving target object in the video images [0043] Tracking unit 50 performs several functions, it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22), wherein the detection algorithm circumscribes the object with a bounding box ([0084] If a moving target object is present in the images, the centroid of the target object is determined at block 120. 
At block 122 images I1 and I2 and the data associated therewith are swapped as described above with respect to block 98. At block 124 the size of the detected target object, i.e., the Blob_Size, is compared to a threshold value and, if the target object is not large enough, or if no target object has been found in the images, the process returns to block 84. If the target object is larger than the threshold size, the process continues on to block 100 through 106 where the adjustment parameters of camera 22 are determined and then communicated to camera 22 as described above.); determining whether the object is moving or stationary ([0008] processor is programmed to detect a moving target object in the video images); ([0043] calculates the relative direction, speed and size of the detected target objects), ([0052] Tracking unit 50 does not require the two images which are used to determine the motion of the POI to be taken with the camera having the same pan, tilt and focal length settings for each image. Instead, tracking unit 50 maps or aligns one of the images with the other image and then determines the relative velocity and direction of movement of the POI.); determining a second PTZ configuration; and orientating the camera in accordance with the second PTZ configuration ([0006] The camera may then remain stationary as the target object moves away from the center of the FOV and a new estimated future target location is computed. The camera will then be repositioned to once again recenter the target object. Such discrete camera movements are continually repeated to track the target object.); 
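For illustration only, the blob-size gating and recentering behavior quoted from Sablak at [0084] and [0006] can be sketched as follows. This is a minimal sketch under stated assumptions and not Sablak's implementation: the names Blob, recenter_offset, and min_blob_size are hypothetical, and the conversion of a pixel offset into actual pan and tilt commands is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Blob:
    """Hypothetical container for a detected moving region."""
    centroid: Tuple[float, float]  # (x, y) in pixels
    size: int                      # blob area in pixels


def recenter_offset(
    blob: Optional[Blob],
    frame_size: Tuple[int, int],
    min_blob_size: int,
) -> Optional[Tuple[float, float]]:
    """Return the pixel offset of the blob centroid from the frame center.

    Returns None when no blob was found or the blob is not larger than the
    threshold, mirroring the "return to block 84" branch in the quoted text.
    """
    if blob is None or blob.size <= min_blob_size:
        return None
    width, height = frame_size
    cx, cy = blob.centroid
    # A positive x offset means the target sits right of center; a positive
    # y offset means it sits below center (image coordinates).
    return (cx - width / 2, cy - height / 2)
```

A caller would map the returned offset to a second PTZ configuration; that mapping depends on the camera's optics and is not specified in the quoted passages.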
Roumeliotis discloses a method for visually localizing an individual ([Abstract] Vision-aided inertial navigation system (VINS)), the method comprising the steps of: capturing an image containing an object via a camera ([0022] FIG. 2 illustrates an example implementation of VINS 10 in further detail. Image source 12 of VINS 10 images an environment in which VINS 10 operates so as to produce image data 14. That is, image source 12 generates image data 14 that captures a number of features visible in the environment. Image source 12 may be, for example, one or more cameras that capture 2D or 3D images, a laser scanner or other optical device that produces a stream of 1D image data, a depth sensor that produces image data indicative of ranges for features within the environment, a stereo vision system having multiple cameras to produce 3D information, a Doppler radar and the like. In this way, image data 14 provides exteroceptive information as to the external environment in which VINS 10 operates. Moreover, image source 12 may capture and produce image data 14 at time intervals in accordance one or more clocks associated with the image source. In other words, image source 12 may produce image data 14 at each of a first set of time instances along a trajectory within the three-dimensional (3D) environment, wherein the image data captures features 15 within the 3D environment at each of the first time instances.); processing the image to determine an appropriate detection algorithm based on a characteristic of the object ([0026] Feature extraction and tracking module 12 extracts features 15 from image data 14 acquired by image source 12 and stores information describing the features in feature database 25. Feature extraction and tracking module 12 may, for example, perform corner and edge detection to identify features and track features 15 across images using, for example, the Kanade-Lucas-Tomasi (KLT) techniques described in Bruce D. 
Lucas and Takeo Kanade, An iterative image registration technique with an application to stereo vision, In Proc. of the International Joint Conference on Artificial Intelligence, pages 674-679, Vancouver, British Columbia, Aug.  24-28 1981, the entire content of which is incorporated herein by reference. [0027] Outlier rejection module 13 provides robust outlier rejection of measurements from image source 12 and IMU 16.  For example, outlier rejection module may apply a Mahalanobis distance tests to the feature measurements to identify and reject outliers. As one example, outlier rejection module 13 may apply a 2-Point Random sample consensus (RANSAC) technique described in Laurent Kneip, Margarita Chli, and Roland Siegwart, Robust Real-Time Visual Odometry with a Single Camera and an Imu, In Proc.  of the British Machine Vision Conference, pages 16.1-16.11, Dundee, Scotland, Aug.  29-Sep. 2 2011, the entire content of which is incorporated herein by reference.); estimating a position of the object in relation to one of a user or other objects, wherein the position is estimated using a ([0020] As described in detail herein, VINS 10 includes a hardware-based computing platform that implements an estimator that fuses the image data and the IMU data to perform localization of VINS 10 within environment 10. That is, based on the image data and the IMU data, VINS 10 determines, at discrete points along the trajectory of VINS as the VINS traverses environment 2, poses (position and orientation) of VINS 10 as well as positions of features 15. [0021] As described herein, in one example implementation, VINS 10 implements an inverse, sliding-window filter (ISWF) for processing inertial and visual measurements. 
That is, estimator 22 applies the ISWF to process image data 14 and IMU data 18 to estimate the 3D IMU pose and velocity together with the time-varying IMU biases and to produce, based on the captured image data, estimates for poses of VINS 10 along the trajectory and an overall map of visual features 15.), and storing the position of the object in a map memory, wherein the map memory contains a map of an environment surrounding the object; determining a second ([0020] That is, estimator 22 applies the ISWF to process image data 14 and IMU data 18 to estimate the 3D IMU pose and velocity together with the time-varying IMU biases and to produce, based on the captured image data, estimates for poses of VINS 10 along the trajectory and an overall map of visual features 15.); and generating a real-time map of the local environment in a GPS-denied environment ([0019] Environment 2 may, for example, represent an environment where conventional GPS-signals are unavailable for navigation, such as on the moon or a different planet or even underwater.), wherein the real-time map reflects a current position of the camera and the position of the object ([0019] VINS 10 may perform simultaneous localization and mapping (SLAM) by constructing a map of environment 2 while simultaneously determining the position and orientation of VINS 10 as the VINS traverses the environment [0019] Features 15, also referred to as landmarks, represent objects visible within environment 2, such as rocks, trees, signs, walls, stairs, chairs, tables, and the like. Features 15 may be moving or stationary objects within environment 2. [0020] Utilizing these techniques, VINS 10 may navigate environment 2 and, in some cases, may construct a map of the environment including the positions of features 15.).
Fahn discloses a method for visually localizing an individual ([Title] System and Method for Object Selection in a Handheld Image Capture Device), the method comprising the steps of: capturing an image containing an object via a camera using a first pan, tilt, and/or zoom (PTZ) configuration ([0034] FIG. 1 shows an imager with a multiple-axis actuating mechanism (hereinafter "MAAM imager"). The MAAM imager shown in FIG. 1 is a single-imager handheld camera with a multiple-axis actuating mechanism (herein after a "single-imager MAAM camera"). The single-imager MAAM camera 100 includes a camera body 120 and an actuated imager 110. [0050] Initially, the bicyclist 401, being located inside the wide field of view a 350, is selected as the object of interest to be centered.); processing the image to determine an appropriate detection algorithm ([0053-0054] auto-rotate process, auto-center process, auto-zoom process) ([0050] This selection of an object of interest is performed by an object selection module which will be described in detail in reference to FIG. 9 below. The bicyclist 401 appears in the upper right portion of an image field 410 of the static imager 330 (FIG. 3B) which is defined by the wide field of view a 350. Subsequently, the stationary imager 330 continues to track the bicyclist to a later position 402 at a later time using a software algorithm. Such software-based tracking may be achieved by optic flow or other vision algorithms (e.g., O'Sullivan, Igoe, Physical Computing: Sensing and Controlling the Physical World with Computers, Chapter 9, Thomson Course Tech., 2004). The SwisTrack tool (see, e.g., SwisTrack: A Tracking Tool for Multi-Unit Robotic and Biological Systems, by Correll, Nikolaus; Sempo, Gregory; Lopez de Meneses, Yuri; Halloy, Jose; Deneubourg, Jean-Louis; Martinoli, Alcherio, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (2006), p. 
2185-2191, 2006) can be used for trajectory tracking of multiple moving objects, with its core image manipulation functions provided by Intel Corporation's Open Source Computer Vision Library ("OpenCV Library"), for example. A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters. Based on the image provided by the stationary imager 330, the object selection module calculates the object location information regarding the center coordinate of the bicyclist in its image field 410.); determining whether the object is moving or stationary ([0052] An object of interest 501, such as a bicyclist in the illustration, may be moving or stationary.); in response to determining the object is stationary ([0050] In case the object of interest 401 remains stationary, the actuated imager 310 will initially move the lens so as to center the object of interest in its image field 420 based on the object location. [0052] For the purpose of illustration of the auto-zoom capability, the object of interest is assumed to be stationary. This is because even if the object is moving in an absolute sense with respect to the background, the object remains stationary in a relative sense within an image field 550 and 560 of the actuated imager 310 due to the auto-center process as discussed above in reference to FIG. 4): estimating a position of the object in relation to one of a user or other ([0045] The single-imager MAAM camera, such as that shown in FIGS. 2A and 2B may be used for an auto-centering purpose, e.g., centering an object of interest in the center of the captured image field. In other embodiments, the object of interest could be centered in a particular zone or placed at the intersection of particular zones of the captured image field. 
Assuming an object of interest is selected, the selected object may be centered automatically by a combination of the panning and the tilting motions of the actuated imager. The auto-center feature will be described in detail in reference to FIG. 4 below.), wherein the position is estimated using a Kalman filter ([0050] A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters.) and inertial measurements from an inertial measurement unit (IMU) ([0065] In some embodiments, the camera body movement detection unit 931 is based on an inertial sensor such as a MEMS-based accelerometer available from Analog Devices (Norwood, Mass.), for example.), and ([0050] Meanwhile, the actuated imager 310, based on object location information, initially moves the lens using one or both of the auto-pan DOF and the auto-tilt DOF so as to bring the image of the bicyclist to the center of its image field 420, which is defined by the standard field of view β 340. Subsequently, the actuated imager 310 continues to move the lens to physically track the moving object, based on the object location information, so that the bicyclist at the later position and time 402 remains centered within the image field 420 of the actuated imager.).
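For illustration only, the kind of Kalman-filter position estimate referred to in the limitation above can be sketched as a one-dimensional constant-velocity filter. This is a minimal sketch under stated assumptions, not the method of Fahn, Roumeliotis, or the applicant: the function name kalman_step, the noise parameters q and r, and the constant-velocity motion model are all hypothetical choices, and no IMU measurement is modeled.

```python
def kalman_step(x, P, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    x  : (position, velocity) state estimate
    P  : 2x2 state covariance as nested tuples
    z  : measured position for this step
    q  : assumed process-noise variance added to each diagonal term
    r  : assumed measurement-noise variance
    """
    pos, vel = x
    (p00, p01), (p10, p11) = P

    # Predict: x' = F x with F = [[1, dt], [0, 1]].
    pos_pred = pos + dt * vel
    vel_pred = vel
    # Predict covariance: P' = F P F^T + q * I.
    a00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q

    # Update with a position-only measurement, H = [1, 0].
    innovation = z - pos_pred
    s = a00 + r            # innovation variance
    k0 = a00 / s           # Kalman gain for position
    k1 = a10 / s           # Kalman gain for velocity
    pos_new = pos_pred + k0 * innovation
    vel_new = vel_pred + k1 * innovation

    # Covariance update: P = (I - K H) P'.
    n00 = (1.0 - k0) * a00
    n01 = (1.0 - k0) * a01
    n10 = a10 - k1 * a00
    n11 = a11 - k1 * a01
    return (pos_new, vel_new), ((n00, n01), (n10, n11))
```

Calling kalman_step once per frame with the measured centroid coordinate yields a smoothed position and a velocity estimate; a fused visual-inertial estimator such as the one Roumeliotis describes would use a larger state and additional measurement models.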
Ju discloses processing the image to determine an appropriate detection algorithm based on a characteristic of the object; selecting the appropriate detection algorithm from a library of detection algorithms and a surrounding environment ([0004] However, in different monitor environments and scenarios, properties, forms, and moving tendencies of the tracked object as well as types of the monitor environment are all different. Therefore, an algorithm designer generally designs a suitable algorithm process according to the monitor environment and scenario to detect and track objects accurately and efficiently. Most of the object tracking algorithms are adopted to detect and track people or object, such as vehicles, in an opening space.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the use of the IMU in the camera’s housing (VINS) and the use of SLAM for constructing a map of the environment surrounding the camera’s housing (VINS) as taught by Roumeliotis. The motivation behind this modification would have been to determine the position and orientation of the VINS (including camera) and the moving and stationary features in the environment using the IMU which can accurately track dynamic motions over short time durations in GPS-denied environments [See Roumeliotis]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the ability to determine if the object is stationary or moving and center the object in the image when the object is stationary as taught by Fahn in order to improve object detection through the image capture. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak and Fahn with use of the 

In regards to claim 18, the limitations of claim 17 have been addressed. Sablak discloses wherein computer vision is used in at least one of the steps of: processing the image ([0008] The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera to track the target object wherein the processor adjusts the camera at a plurality of varied adjustment rates.), ([0008] the processor is programmed to detect a moving target object in the video images [0043] Tracking unit 50 performs several functions, it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22), and determining whether the object is moving or stationary ([0008] processor is programmed to detect a moving target object in the video images).
Fahn discloses wherein computer vision is used in at least one of the steps of: processing the image ([0053-0054] auto-rotate process, auto-center process, auto-zoom process), selecting the appropriate detection algorithm ([0050] This selection of an object of interest is performed by an object selection module which will be described in detail in reference to FIG. 9 below. The bicyclist 401 appears in the upper right portion of an image field 410 of the static imager 330 (FIG. 3B) which is defined by the wide field of view a 350. Subsequently, the stationary imager 330 continues to track the bicyclist to a later position 402 at a later time using a software algorithm. Such software-based tracking may be achieved by optic flow or other vision algorithms (e.g., O'Sullivan, Igoe, Physical Computing: Sensing and Controlling the Physical World with Computers, Chapter 9, Thomson Course Tech., 2004). The SwisTrack tool (see, e.g., SwisTrack: A Tracking Tool for Multi-Unit Robotic and Biological Systems, by Correll, Nikolaus; Sempo, Gregory; Lopez de Meneses, Yuri; Halloy, Jose; Deneubourg, Jean-Louis; Martinoli, Alcherio, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (2006), p. 2185-2191, 2006) can be used for trajectory tracking of multiple moving objects, with its core image manipulation functions provided by Intel Corporation's Open Source Computer Vision Library ("OpenCV Library"), for example. A visual tracking or video tracking system can also be used, which includes algorithms such as, but not limited to: blob tracking, kernel-based tracking, contour tracking, Kalman filters, and particle filters. 
Based on the image provided by the stationary imager 330, the object selection module calculates the object location information regarding the center coordinate of the bicyclist in its image field 410.), detecting an object within the image ([0050] In case the object of interest 401 remains stationary, the actuated imager 310 will initially move the lens so as to center the object of interest in its image field 420 based on the object location. [0052] For the purpose of illustration of the auto-zoom capability, the object of interest is assumed to be stationary. This is because even if the object is moving in an absolute sense with respect to the background, the object remains stationary in a relative sense within an image field 550 and 560 of the actuated imager 310 due to the auto-center process as discussed above in reference to FIG. 4), and determining whether the object is moving or stationary ([0052] An object of interest 501, such as a bicyclist in the illustration, may be moving or stationary.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the teachings of Fahn in order to improve object detection through the image capture.

In regards to claim 19, the limitations of claim 17 have been addressed. Sablak discloses wherein the camera comprises a plurality of cameras that have omnidirectional coverage between them ([0039] Illustrated system 20 is a single camera application, however, the present invention may be used within a larger surveillance system having additional cameras which may be either stationary or moveable cameras or some combination thereof to provide coverage of a larger or more complex surveillance area.).

In regards to claim 20, the limitations of claim 17 have been addressed. Sablak discloses further comprising the step of sharing at least one of estimated position and/or map information with an external device ([0013] The tracking system may also include a display device and an input device operably coupled to said system wherein an operator may view the video images on the display device and input commands or data into the system through the input device. The display device and input device may be positioned remotely from said camera. [0092] The present invention can be used in many environments where it is desirable to have video surveillance capabilities. For example, system 20 may be used to monitor manufacturing and warehouse facilities and track individuals who enter restricted areas. Head end unit 32 with display 38 and input devices 34 and 36 may be positioned at a location remote from the area being surveyed by camera 22 such as a guard room at another location in the building. Although system 20 includes a method for automatically detecting a target object, the manual selection of a target object by a human operator, such as by the operation of joystick 36, could also be employed with the present invention. After manual selection of the target object, system 20 would track the target object as described above for target objects identified automatically.).

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Sablak in view of Roumeliotis in further view of Sriram in even further view of Goh [US 2010/0165114 A1].
In regards to claim 21, the limitations of claim 1 have been addressed. Sablak fails to explicitly disclose wherein the detection algorithm is selected from a library of detection algorithms based on a characteristic of the moveable object determined using the image data.
Goh discloses wherein the detection algorithm is selected from a library of detection algorithms based on a characteristic of the moveable object determined using the image data ([0051] Accordingly, the digital photographing apparatus 100 may use various types of object recognizing algorithms. An object recognizing algorithm is selected according to a type of an object to be detected. For example, when a face of a person is to be detected, a face recognizing algorithm is used.).
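For illustration only, selecting a detection algorithm from a library according to the type of object to be detected, as quoted from Goh at [0051], can be sketched as a lookup table. This is a minimal sketch under stated assumptions and not Goh's implementation: the detector functions, the DETECTOR_LIBRARY table, and the fallback to a generic detector are hypothetical, and the detectors are placeholders that perform no real image analysis.

```python
from typing import Callable, Dict, List

# A detector takes raw image bytes and returns labels for what it found.
Detector = Callable[[bytes], List[str]]


def detect_faces(image: bytes) -> List[str]:
    """Placeholder standing in for a face recognizing algorithm."""
    return ["face"]


def detect_vehicles(image: bytes) -> List[str]:
    """Placeholder standing in for a vehicle detection algorithm."""
    return ["vehicle"]


def detect_generic(image: bytes) -> List[str]:
    """Placeholder fallback used when no specialized detector is registered."""
    return ["object"]


# Hypothetical library keyed by the type of object to be detected.
DETECTOR_LIBRARY: Dict[str, Detector] = {
    "person": detect_faces,
    "vehicle": detect_vehicles,
}


def select_detector(object_type: str) -> Detector:
    """Pick the detector registered for object_type, else the generic one."""
    return DETECTOR_LIBRARY.get(object_type, detect_generic)
```

In this sketch the object type is supplied by the caller; how a characteristic of the object is derived from the image data is left open, since the quoted passage does not specify it.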
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sablak with the selection of the object 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kaitlin A Retallick whose telephone number is (571)270-3841.  The examiner can normally be reached on Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chris Kelley, can be reached at (571) 272-7331.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KAITLIN A RETALLICK/             Primary Examiner, Art Unit 2482