DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/21/2020 is being considered by the examiner.

EXAMINER’S AMENDMENT
Authorization for this examiner’s amendment was given in an interview with Karthik Murthy on 7/26/2022.
The application has been amended as follows: 
1. (Currently Amended) A detection and tracking system using a camera unit on a robot for safety monitoring for use in and around water-related environments comprising: 
a processing unit enabling real-time detection and tracking of objects; 
wherein the robot is able to propel itself and move throughout a body of water, both on a surface and underwater; 
wherein the camera unit functions both on the surface and underwater; 
wherein the robot optimizes a cleaning cycle of the body of water utilizing deep learning techniques; 
wherein the robot has localization sensors and software that allow the robot to be aware of a position of the robot in a pool; 
wherein the camera unit is able to send its video feed live over an internet; wherein processing is performed in a cloud; wherein the robot sends and receives data from the cloud; 
wherein the processing utilizes deep learning algorithms, including artificial neural networks, that perform video analytics using a method comprising the following steps: 
a. areas of interest around the body of water are defined upon initial set up of the system; in a man-made body of water, area surrounding the pool would be defined as Area 1 and the pool itself would be defined as Area 2; in a context of an ocean, such interest areas are a beach area, the ocean and any other pre-defined areas; 
b. in each frame the system extracts features and uses the deep learning algorithms to identify if an image consists of a person and/or object defined by the system; this analysis is performed in real-time with no time delay; 
c. the system recognizes and distinguishes between different types of objects, as well as stationary objects; 
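d. an identification and classification of each object is then cross referenced with a specific location of the object as recognized by the system and the areas of interest as outlined in point a above.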
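For illustration only, the cross-referencing in step d can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: areas of interest are assumed to be polygons in image coordinates, each detection is assumed to carry a class label and a single (x, y) location, and the ray-casting point-in-polygon test and the `Detection` and `cross_reference` names are hypothetical choices made for this example.

```python
from dataclasses import dataclass

Point = tuple[float, float]


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Detection:
    label: str       # hypothetical: class assigned by the detector, e.g. "person"
    location: Point  # hypothetical: position in the same coordinates as the areas


def cross_reference(detections: list[Detection], areas: dict[str, list[Point]]):
    """Pair each detection's label with the first area (in dict order) containing it."""
    results = []
    for det in detections:
        area = next((name for name, poly in areas.items()
                     if point_in_polygon(det.location, poly)), None)
        results.append((det.label, area))
    return results


if __name__ == "__main__":
    pool = [(2, 2), (8, 2), (8, 6), (2, 6)]          # Area 2: the pool itself
    surround = [(0, 0), (10, 0), (10, 8), (0, 8)]    # Area 1: area around the pool
    areas = {"Area 2": pool, "Area 1": surround}     # most specific area first
    print(cross_reference([Detection("person", (5, 4))], areas))
```

Because the pool (Area 2) lies inside its surrounding area (Area 1), the sketch checks areas in insertion order and lists the pool first, so a swimmer is reported in Area 2 rather than Area 1.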

3. (Currently Amended) The system of claim 1 further comprising: 
self-learning capabilities that provide flexibility and a user specific operation; 
wherein the self-learning capabilities can be used to detect and sound an alarm in presence of an intruder or under age user while avoiding false alarms when an authorized person is using the body of water; 
the self-learning capabilities also provide the robot with an ability to navigate through the body of water without impacting any persons or the objects; 
the self-learning capabilities also provide the robot with an ability to closely track the persons in the body of water at an increasing level of accuracy.

4. (Currently Amended) The system of claim 1, further comprising: 
wherein boundaries between areas of interest are defined; 
one boundary is an area in the vicinity of the pool; 
another boundary is for the pool area itself; 
additional areas of interest may be defined by a user; 
the camera unit acquires an image of the unit that includes individuals in a swimming pool; 
the camera unit produces an image which includes a full view of head and shoulders of each individual; 
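this image of the object is adaptively clipped to include just an immediate area of a face of the individual to yield a clip which is a same size as a reference image; 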
this clip image is then transferred to an automated face locator which performs a function of registering position and orientation of the face in the image of the 
a location of the face is determined in two phases: 
first, the clip image is found by defining a bounding box, resulting in a bounding box based image;
 [[S]]second, a location of eyes of the individual is determined;
 [[O]]once the location of the eyes is determined, the face is rotated about an axis located at a midpoint (gaze point) between the eyes to achieve a precise vertical alignment of the eyes; 
this results in the automated face locator having a relatively precise alignment of a test image with the reference image; 
then the automated face locator will be used to locate the face in the test image; 
the clip image defined by the bounding box will not include hair.
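The rotation about the gaze point described above can be sketched as follows, assuming eye positions are already available as (x, y) pixel coordinates. "Vertical alignment of the eyes" is interpreted here as bringing both eyes to the same vertical (y) position, so the inter-eye line becomes horizontal; `level_eyes` is a hypothetical helper that rotates point coordinates rather than resampling image pixels.

```python
import math

Point = tuple[float, float]


def level_eyes(left_eye: Point, right_eye: Point, points: list[Point]) -> list[Point]:
    """Rotate points about the gaze point (midpoint between the eyes) so that
    the line joining the eyes becomes horizontal."""
    gx = (left_eye[0] + right_eye[0]) / 2
    gy = (left_eye[1] + right_eye[1]) / 2
    # Angle of the inter-eye line relative to the horizontal axis.
    angle = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    c, s = math.cos(-angle), math.sin(-angle)  # rotate by -angle to undo the tilt
    rotated = []
    for x, y in points:
        dx, dy = x - gx, y - gy
        rotated.append((gx + c * dx - s * dy, gy + s * dx + c * dy))
    return rotated


if __name__ == "__main__":
    left, right = (0.0, 0.0), (2.0, 2.0)
    print(level_eyes(left, right, [left, right]))  # both eyes end up at y = 1.0
```

Rotating about the midpoint leaves the gaze point fixed and preserves the distance between the eyes, so only the tilt is removed.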

6. (Currently Amended) The system of claim 1, further comprising: 
an automated face verifier receives a clipped and registered reference image and a test image and makes a determination of whether the persons depicted in the reference image and [[a]]the test image are the same or are different; 
this determination is made using a neural network which has been previously trained on numerous faces to make this determination; 
once trained, the automated face verifier is able to make the determination without having actually been exposed to a face of the individual whose face is being verified.

7. (Currently Amended) The system of claim 6, further comprising: 
[[a]]the test image and [[a]]the reference image are acquired; 
these images are then both processed by a clip processor which defines a bounding box containing predetermined portions of each face;
 the reference image may be stored in various ways[[;]]: 
i) either an entire image of [[the]]a previous facial image may be recorded;
ii) or only a previously derived clip may be stored; 
iii) or a clip that is compressed in a compression method for storage may be stored which is then decompressed from storage for use;
iv) or some other parameterization of the clip may be stored and accessed later to reduce an amount of storage capacity required;
v) or the reference image may be stored in a database;
then the reference and test images are clipped, this occurs in two stages: 
first, a coarse location of a silhouette of a face is found; 
next, a first neural network is used to find a precise bounding box; 
 a region of this bounding box is defined vertically to be from just below a chin to just above a natural hair line, or implied natural hair line if the person is bald; 
a horizontal region of the face in a clipping region is defined to be between a beginning of [[the]] ears at [[the]]a back of [[the]]a cheek on both sides of the face;
 if one ear is not visible because the face is turned at an angle, the clipping region is defined to be an edge of the cheek or nose, whichever is more extreme; 
this process is performed by the clip processor; 
next, a second neural network is used to locate eyes; 
 a resulting image of the eyes is then rotated about a gaze point; 
the above steps are repeated for both the reference and the test images; 
the test image and reference image are then registered using position of the eyes as reference points.
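One way to register the test image to the reference image using the eye positions as reference points is to solve for the 2-D similarity transform (uniform scale, rotation, translation) that carries the two test-image eye positions onto the two reference-image eye positions; two point pairs determine such a transform exactly. The sketch below assumes that interpretation, since the claim does not name the transform, and `eye_registration` is a hypothetical name.

```python
Point = tuple[float, float]


def eye_registration(test_eyes: list[Point], ref_eyes: list[Point]):
    """Return a function mapping test-image coordinates to reference-image coordinates.

    Points are treated as complex numbers z, and the similarity transform is
    z_ref = a * z_test + b, where a encodes scale and rotation and b translation.
    """
    t1, t2 = (complex(*p) for p in test_eyes)
    r1, r2 = (complex(*p) for p in ref_eyes)
    if t1 == t2:
        raise ValueError("the two eye positions must be distinct")
    a = (r2 - r1) / (t2 - t1)  # scale and rotation from the inter-eye vectors
    b = r1 - a * t1            # translation that pins the first eye in place

    def transform(point: Point) -> Point:
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


if __name__ == "__main__":
    to_ref = eye_registration([(10, 20), (30, 20)], [(40, 50), (80, 50)])
    print(to_ref((20, 30)))
```

Mapping every test-image coordinate through `transform` places both eyes exactly on their reference positions, which is what makes the subsequent pixel-by-pixel comparison meaningful.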

8. (Currently Amended) The system of claim 7, further comprising: 
the registered images are normalized; 
this includes normalizing each feature value by a mean of all the feature values; 
components of [[the]] input image vectors represent a measure of a feature at a certain location, and these components comprise continuous valued numbers; 
next, a third neural network is used to perform a verification of [[the]]a match or mismatch between two faces;
 first, weights are assigned; 
location of the weights and features are registered; 
once the weight assignments are made, appropriate weights in [[a ]]the third neural network are selected;
assigned reference weights comprise a first weight vector and assigned test weights comprise a second weight vector; 
[[a ]]the third neural network then determines a normalized dot product of the first weight vector and the second weight vector;
 this is a dot product of vectors on[[ an]] a unit circle in N dimensioned space, wherein each weight vector is first normalized relative to its length; 
a result is a number which is [[the]]an output of the third neural network; 
this output is then compared to threshold outputs;
above the threshold outputs indicate 
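The comparison in claim 8 (each weight vector normalized to unit length, then a dot product of the two, compared against a threshold) can be sketched as below. The threshold value of 0.9 is a placeholder assumption, and the sketch assumes, for illustration, that an output above the threshold is treated as a match.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


def normalized_dot(u: list[float], v: list[float]) -> float:
    """Dot product of two vectors after each is normalized to unit length."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same dimension")
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def verify(reference_weights: list[float], test_weights: list[float],
           threshold: float = 0.9) -> tuple[float, bool]:
    """Return (score, is_match); the default threshold is a hypothetical placeholder."""
    score = normalized_dot(reference_weights, test_weights)
    return score, score > threshold


if __name__ == "__main__":
    print(verify([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors score ~1.0
```

Because both vectors are normalized first, the score depends only on the angle between them and ranges from -1 to 1, which makes a single fixed threshold meaningful across faces.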

9. (Currently Amended) The system of claim 1, further comprising: 
an acquired image may comprise either a test or [[the]]a reference image;
 the reference image includes a face of the person as well as additional portions such as neck and shoulders and also will include background clutter;
 an image subtraction process is performed to subtract the background; 
an image of the background without the face is acquired; 
the image of the face and background is then subtracted from the background;
a result is a facial image without the background; 
then non-adaptive edge detection image processing techniques are used to determine a very coarse location of a silhouette of the face;
 next the reference image is scaled down by a factor of 20; 
this results in a hierarchy of resolutions; 
if the test and reference images are first scaled down to have coarsely scaled inputs then convolutions will yield a measure of more coarse features;
 conversely, if higher resolution inputs are used with a same size and type kernel convolution then the convolutions will yield finer resolution features;
 thus, a scaling process results in a plurality of features at different sizes; 
the next step is to perform a convolution on the scaled reference image;
 the convolutions used have zero-sum kernel coefficients; 
a plurality of distributions of coefficients are used in order to achieve a plurality of different feature types, including a center surround, vertical and/or horizontal bars;
 this results in different feature types at each different scale; 
then repeated for a plurality of scales and convolution kernels this results in a feature space set composed of a number of scales, a number of features, based on a number of kernels; 
this feature space then becomes an input to a neural network;
 this comprises a conventional single layer linear proportional neural network which has been trained to produce as output coordinates of the four corners of the desired bounding box when given a facial outline image as input.
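The feature-extraction steps of claim 9 (down-scaling to build a hierarchy of resolutions, convolving with zero-sum kernels of several types, and feeding the resulting feature space to a single-layer linear network that outputs four bounding-box corners) can be sketched in plain Python. The specific 3x3 kernel coefficients, the block-averaging downscaler, the small scale factors, and all function names are assumptions for this sketch; the claim specifies only that the kernels are zero-sum and include center-surround, vertical-bar, and horizontal-bar types, and it recites a scale factor of 20 for the reference image. Network weights would come from training and are not shown.

```python
# Hypothetical 3x3 zero-sum kernels: each kernel's coefficients sum to 0,
# so a uniform (featureless) region produces a zero response.
KERNELS = {
    "center_surround": [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
    "vertical_bar": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "horizontal_bar": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
}


def downscale(img, factor):
    """Shrink a 2-D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]


def convolve_valid(img, kernel):
    """'Valid' 2-D filtering with no padding. These kernels are symmetric under a
    180-degree rotation, so cross-correlation and convolution coincide."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(img[0]) - kw + 1)]
            for r in range(len(img) - kh + 1)]


def feature_space(img, factors=(1, 2)):
    """Apply every kernel at every scale; keys are (scale factor, kernel name)."""
    features = {}
    for f in factors:
        scaled = img if f == 1 else downscale(img, f)
        for name, kernel in KERNELS.items():
            features[(f, name)] = convolve_valid(scaled, kernel)
    return features


def linear_bounding_box(features, weights, bias):
    """Single-layer linear network: flattened features -> 8 outputs, read as
    (x, y) for each of the four bounding-box corners."""
    x = [v for key in sorted(features) for row in features[key] for v in row]
    if any(len(w) != len(x) for w in weights):
        raise ValueError("each weight row must match the feature length")
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, bias)]
```

Coarser scales shrink the image while the kernel stays 3x3, so the same kernel responds to larger structures, which is the coarse-versus-fine trade-off the claim describes.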

10. (Currently Amended) The system of claim 9, further comprising: 
the processing unit locating a given feature on the face and registering the corresponding coarse features in the reference and test images before performing the comparison process; 
wherein the coarse feature used is eyes; 
wherein an adaptive neural network is used to find a location of each of the eyes; 
first, data outside the bounding box resulting from the feature space is eliminated;
this feature space is input into a neural network, which has been trained to generate an x coordinate of a single point, referred to as a "mean gaze"; 
the mean gaze is defined as the mean gaze position along a horizontal axis between the eyes referred to earlier;
an x position of [[the]]a left and right eye are added together and divided by two to derive the mean gaze position;
the neural network is trained with known faces in various orientations to generate as output the position; 
once the mean gaze is determined, a determination is made of which of five bands along the horizontal axis the mean gaze falls into; 
a number of categories of where the mean gaze occurs are created; 
wherever a computed mean gaze is located on the x coordinate will determine which band it falls into; 
this will determine which of five neural networks will be used to find the location of the eyes; 
next, [[the]]a feature set is input to the neural network determined earlier; 
the neural network determined earlier has been trained to determine x and y coordinates of eyes having the mean gaze in a selected band, wherein the x coordinate is different from the x coordinate mentioned earlier.
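The mean-gaze computation and band selection in claim 10 can be sketched as follows. The claim does not say how the five bands are laid out; this sketch assumes equal-width bands across the image width. `mean_gaze` reproduces the training-target definition (left and right eye x positions added and divided by two), whereas at run time the claim has a network predict this value directly, and the per-band eye locators are hypothetical stand-in callables for the five trained networks.

```python
from typing import Callable, Sequence


def mean_gaze(left_eye_x: float, right_eye_x: float) -> float:
    """Mean gaze position: the left and right eye x positions, added and halved."""
    return (left_eye_x + right_eye_x) / 2


def gaze_band(gaze_x: float, image_width: float, n_bands: int = 5) -> int:
    """Index (0..n_bands-1) of the equal-width horizontal band containing gaze_x."""
    if not 0 <= gaze_x <= image_width:
        raise ValueError("gaze position lies outside the image")
    return min(int(gaze_x / image_width * n_bands), n_bands - 1)


def locate_eyes(features, gaze_x: float, image_width: float,
                band_networks: Sequence[Callable]):
    """Send the feature set to the eye-locating network chosen by the gaze band."""
    network = band_networks[gaze_band(gaze_x, image_width, len(band_networks))]
    return network(features)
```

Splitting the problem by band lets each network specialize in faces turned by a similar amount, rather than one network covering every orientation.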

11. (Currently Amended) The system of claim 10, further comprising: 
wherein the robot acts as a game platform such that the camera unit on the robot facilitates augmented reality games; 
wherein a user of those games is not in the pool, but instead is using a mobile device outside the pool; 
wherein the augmented reality game on the mobile device constantly receives and sends data to the cloud, and the robot constantly sends and receives data to the cloud, such that the augmented reality game involves creatures and locations throughout the pool that are not physically present, but are visible to the user on the mobile device; 
wherein the data the robot sends to the cloud is based on the localization sensors and software of the robot, and this data is used as part of the augmented reality game to display the creatures and other aspects of the augmented reality game.

12. (Currently Amended) The system of claim 10, further comprising: 
wherein the robot utilizes deep learning, a form of machine learning, in order to navigate more skillfully and efficiently, and in order to keep better track of persons in the body of water; 
wherein the robot utilizes the deep learning by using a cascade of multiple layers of nonlinear processing units for feature extraction and transformation; 
wherein each successive layer uses output from a previous layer as input; 
wherein learning is unsupervised and in a form of pattern analysis; 
wherein the robot analyses local geographic features of the body of water, and uses pattern analysis mentioned earlier to learn different and optimal directions to move in; 
wherein the robot tracks the persons and avoids any persons in the body of water; 
wherein the robot attempts to learn to predict movement of the persons and the objects in the body of water so as to avoid them more skillfully, and track them more skillfully.
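The "cascade of multiple layers of nonlinear processing units" in claim 12, where each successive layer uses the previous layer's output as input, corresponds to a feed-forward pass like the one below. This covers only the forward computation; the tanh nonlinearity, layer sizes, and weights are assumptions, and the unsupervised learning the claim recites is not shown.

```python
import math
from typing import Sequence


def apply_layer(inputs: Sequence[float], weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> list[float]:
    """One layer of nonlinear units: tanh(W x + b), one output per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def cascade(inputs: Sequence[float], layers) -> list[float]:
    """Feed the output of each (weights, biases) layer into the next one."""
    out = list(inputs)
    for weights, biases in layers:
        out = apply_layer(out, weights, biases)
    return out
```

Each layer re-describes the previous layer's output, so later layers can represent more abstract features of the input than any single layer could.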

13. (Currently Amended) A detection and tracking system using a camera mounted inside a pool overlooking a bottom of the pool, and a robot that uses a video from the camera for navigation and for safety monitoring for use in and around water-related environments comprising: 
a processing unit enabling real-time detection and tracking of objects; 
wherein the robot is able to propel itself and move throughout a body of water, both on a surface and underwater; 
wherein the robot optimizes a cleaning cycle of the body of water utilizing deep learning techniques; 
wherein the robot has localization sensors and software that allow the robot to be aware of a position of the robot in the pool; 
wherein the camera is able to send its video feed live over an internet; 
wherein the processing is performed in a cloud; 
wherein the robot sends and receives data from the cloud; 
wherein processing utilizes deep learning algorithms, including artificial neural networks, that perform video analytics using a method comprising the following steps: 
a. areas of interest around the body of water are defined upon initial set up of the system; in a man-made body of water, area surrounding the pool would be defined as Area 1 and the pool itself would be defined as Area 2; in a context of [[the]]an ocean, such interest areas are a beach area, the ocean and any other pre-defined areas;
 b. in each frame the system extracts features and uses the deep learning algorithms to identify if an image consists of a person and/or object defined by the system; this analysis is performed in real-time with no time delay; 
c. the system recognizes and distinguishes between different types of objects, as well as stationary objects;
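d. identification and classification of each object is then cross referenced with a specific location of the object as recognized by the system and the areas of interest as outlined in point a above.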

14. (Currently Amended) The system of claim 13, further comprising: 
wherein the robot acts as a game platform such that the camera overlooking the bottom of the pool and the robot together facilitate augmented reality games; 
wherein a user of those games is not in the pool, but instead is using a mobile device outside the pool; 
wherein the augmented reality game on the mobile device constantly receives and sends data to the cloud, the robot constantly sends and receives data to the cloud, and the camera mounted inside the pool overlooking the bottom of the pool constantly sends and receives data to the cloud, such that the augmented reality game involves creatures and locations throughout the pool that are not physically present, but are visible to the user on the mobile device; 
wherein the data the robot sends to the cloud is based on the localization sensors and software of the robot, and this data is used as part of the augmented reality game to display the creatures and other aspects of the augmented reality game.

15. (Currently Amended) A detection and tracking method using a camera unit on a robot for safety monitoring for use in and around water-related environments comprising: 
a processing unit enabling real-time detection and tracking of objects; 
wherein the robot is able to propel itself and move throughout a body of water, both on a surface and underwater; 
wherein the camera unit functions both on the surface and underwater; 
wherein the robot optimizes a cleaning cycle of the body of water utilizing deep learning techniques; 
wherein the robot has localization sensors and software that allow the robot to be aware of a position of the robot in a pool; 
wherein the camera is able to send its video feed live over an internet; 
wherein processing is performed in a cloud; 
wherein the robot sends and receives data from the cloud; 
wherein the processing utilizes deep learning algorithms, including artificial neural networks, that perform video analytics using a method comprising the following steps:
 a. areas of interest around the body of water are defined upon initial set up of the system; in a man-made body of water, area surrounding the pool would be defined as Area 1 and the pool itself would be defined as Area 2; in a context of [[the]]an ocean, such interest areas are a beach area, the ocean and any other pre-defined areas;
 b. in each frame the system extracts features and uses the deep learning algorithms to identify if an image consists of a person and/or object defined by the system; this analysis is performed in real-time with no time delay; 
c. the system recognizes and distinguishes between different types of objects, as well as stationary objects; 
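d. identification and classification of each object is then cross referenced with a specific location of the object as recognized by the system and the areas of interest as outlined in point a above.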

17. (Currently Amended) The method of claim 15 further comprising: 
self-learning capabilities that provide flexibility and a user specific operation;
 the self-learning capabilities can be used to detect and sound an alarm in [[the]] presence of an intruder or under age user while avoiding false alarms when an authorized person is using the body of water;
 the self-learning capabilities also provide the robot with an ability to navigate through the body of water without impacting any persons or the objects; 
the self-learning capabilities also provide the robot with an ability to closely track the persons in the body of water at an increasing[[ly]] level of accuracy.

18. (Currently Amended) The method of claim 15, further comprising: 
wherein boundaries between areas of interest are defined; 
one boundary is an area in [[the]]a vicinity of the pool; 
another boundary is for the pool area itself; 
additional areas of interest may be defined by a user; 
the camera unit acquires an image of the unit that includes individuals in a swimming pool; 
the camera unit produces an image which includes an entire head and shoulders of each individual; 
the image of the object is adaptively clipped to include just an immediate area of a face of the individual to yield a clip which is a same size as a reference image; 
this clip image is then transferred to an automated face locator which performs a function of registering position and orientation of the face in the image;
 location of the face is determined in two phases:
 first, the clip image is found by defining a bounding box, resulting in a bounding box based image; 
[[S]]second, location of eyes of the individual is determined; 
[[O]]once the location of the eyes is determined, the face is rotated about an axis located at a midpoint (gaze point) between the eyes to achieve a precise vertical alignment of the eyes; 
this results in the automated face locator having a relatively precise alignment of a test image with the reference image; 
then the automated face locator will be used to locate the face in the test image;
 the clip image defined by the bounding box will not include hair.

19. (Currently Amended) The method of claim 15, further comprising: 
an automated face verifier receives a clipped and registered reference image and a test image and makes a determination of whether the persons depicted in the reference image and test image are the same or are different; 
this determination is made using a neural network which has been previously trained on numerous faces to make this determination;
 once trained, the automated face verifier is able to make the determination without having actually been exposed to a face of an individual whose face is being verified; 
wherein the test image and [[a]]the reference image are acquired;
 these images are then both processed by a clip processor which defines a bounding box containing predetermined portions of each face;
 the reference image may be stored in various ways[[;]]:
 vi) either an entire image of the previous facial image may be recorded; 
vii) or only a previously derived clip may be stored; 
viii) or a clip that is compressed in a compression method for storage may be stored which is then decompressed from storage for use;
 ix) or some other parameterization of the clip may be stored and accessed later to reduce an amount of storage capacity required; 
x) or the reference image may be stored in a database; 
then the reference and test images are clipped, this occurs in two stages: 
first, a coarse location of a silhouette of a face is found; 
next, a first neural network is used to find a precise bounding box; 
a region of this bounding box is defined vertically to be from just below a chin to just above a natural hair line, or implied natural hair line if the person is bald; 
a horizontal region of the face in this clipping region is defined to be between a beginning of [[the]] ears at [[the]]a back of [[the]]a cheek on both sides of the face;
 if one ear is not visible because the face is turned at an angle, a clipping region is defined to be an edge of the cheek or nose, whichever is more extreme; 
this process is performed by the clip processor; 
next, a second neural network is used to locate eyes; 
a resulting image of the eyes is then rotated about a gaze point; 
the above steps are repeated for both the reference and the test images; 
the test image and reference image are then registered using a position of the eyes as reference points.

20. (Currently Amended) The method of claim 19, further comprising: 
wherein the robot acts as a game platform such that the camera unit overlooking a bottom of the pool and the robot together facilitate augmented reality games; 
wherein a user of those games is not in the pool, but instead is using a mobile device outside the pool; 
wherein the augmented reality game on the mobile device constantly receives and sends data to the cloud, the robot constantly sends and receives data to the cloud, and the camera unit mounted inside the pool overlooking the bottom of the pool constantly sends and receives data to the cloud, such that the augmented reality game involves creatures and locations throughout the pool that are not physically present, but are visible to the user on the mobile device; 
wherein the data the robot sends to the cloud is based on the localization sensors and software of the robot, and this data is used as part of the augmented reality game to display the creatures and other aspects of the augmented reality game.

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: The closest prior art of record, Bennet et al. (US 10,942,990 B2), Pichon (US 2018/0266134 A1), and Goldenberg et al. (US 2018/0073265 A1), fails to anticipate or render obvious at least the following limitations in independent claims 1, 13, and 15 (i.e., as amended by the examiner’s amendment), such as: a processing unit enabling real-time detection and tracking of objects; wherein the robot is able to propel itself and move throughout a body of water, both on a surface and underwater; wherein the camera unit functions both on the surface and underwater; wherein the robot optimizes a cleaning cycle of the body of water utilizing deep learning techniques; wherein the robot has localization sensors and software that allow the robot to be aware of a position of the robot in a pool; wherein the camera unit is able to send its video feed live over an internet; wherein processing is performed in a cloud; wherein the robot sends and receives data from the cloud; wherein the processing utilizes deep learning algorithms, including artificial neural networks, that perform video analytics using a method comprising the following steps: a. areas of interest around the body of water are defined upon initial set up of the system; in a man-made body of water, area surrounding the pool would be defined as Area 1 and the pool itself would be defined as Area 2; in a context of an ocean, such interest areas are a beach area, the ocean and any other pre-defined areas; b. in each frame the system extracts features and uses the deep learning algorithms to identify if an image consists of a person and/or object defined by the system; this analysis is performed in real-time with no time delay; c. the system recognizes and distinguishes between different types of objects, as well as stationary objects; d. an identification and classification of each object is then cross referenced with a specific location of the object as recognized by the system and the areas of interest as outlined in point a above.
Claims 2-12, 14, and 16-20 depend on independent claims 1, 13, and 15, respectively, and are therefore also allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniella M. DiGuglielmo whose telephone number is (571)272-2682. The examiner can normally be reached Monday - Friday 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung can be reached on 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Daniella M. DiGuglielmo/Examiner, Art Unit 2664
/PING Y HSIEH/Primary Examiner, Art Unit 2664