DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11069036. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-20 of 11069036 disclose every limitation of claims 1-20 in the instant application.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 15-20 are not directed to one of the four patent-eligible subject matter categories: process, machine, manufacture, or composition of matter. The subject matter of the claim must be directed to one of the four subject matter categories. If it is not, the claim is not eligible for patent protection and should be rejected under 35 U.S.C. 101, for at least this reason. For example, the claims involve a computer program per se. See Gottschalk v. Benson, 409 U.S. at 72, 175 USPQ at 676-77.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1-2, 4, 8-9, 11, 15-16, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eswara (PGPUB: 20200320665) in view of Lee (PGPUB: 20200275017).

Regarding claims 1, 8, and 15. Eswara teaches a system, comprising: 
a processor that executes computer-executable components stored in a computer-readable memory (see Fig. 2), the computer-executable components comprising: 
a detection component that localizes a face depicted in a frame of a video stream (see Fig. 1, paragraph 33, pixels representing one or more moving objects (e.g., one or more moving people) may be detected based on the difference between a current input video frame); 
an anonymization component that anonymizes pixels in the frame that correspond to the face (see Fig. 1, paragraph 57, the moving object 76 may be obscured by a fuzz ball in areas where the pixels may have a color which falls within one or more color ranges associated with human skin (e.g., face, hands, neck), as indicated by 90).
However, Eswara does not expressly teach a tracking component that tracks the face in a subsequent frame of the video stream based on a structural similarity index between the frame and the subsequent frame satisfying a threshold.
Lee teaches since the processor 120 successfully detects the face from an image frame in the video stream, the processor 120 can execute the face tracking program to continuously determine the position of the face in continuous image frames, thereby continuously tracking the movement of the face in the target area (see Fig. 2B, paragraph 40); when the processor 120 determines that the similarity between the face appearing in the video stream and a face in the at least one face information exceeds a similarity threshold (such as 80%), the processor 120 determines that the face appearing in the video stream matches the face in the at least one face information. Therefore, the processor 120 can determine that the face in the at least one face information appears in the target area (see Fig. 1, paragraph 36). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Eswara with the teachings of Lee, in which the processor 120 can execute the face tracking program to continuously determine the position of the face in continuous image frames, thereby continuously tracking the movement of the face in the target area, and in which, when the processor 120 determines that the similarity between the face appearing in the video stream and a face in the at least one face information exceeds a similarity threshold (such as 80%), the processor 120 determines that the face appearing in the video stream matches the face in the at least one face information, so as to provide a tracking component that tracks the face in a subsequent frame of the video stream based on a structural similarity index between the frame and the subsequent frame satisfying a threshold. Therefore, a teaching, suggestion, or motivation in the prior art would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention.
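For illustration only, the limitation "a structural similarity index between the frame and the subsequent frame satisfying a threshold" can be sketched as follows. The sketch is not taken from Eswara, Lee, or the instant application. It assumes grayscale frames given as flat lists of pixel values, computes the index over a single global window rather than the usual sliding windows, and uses a hypothetical default threshold of 0.8 that merely echoes the "such as 80%" figure quoted above. The names global_ssim and satisfies_threshold are hypothetical.

```python
from statistics import fmean


def global_ssim(frame_a, frame_b, data_range=255.0):
    """Structural similarity index computed over one global window.

    Assumption: each frame is a flat list of grayscale pixel values.
    Practical implementations usually average the index over local windows.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    # Stabilizing constants with the conventional K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = fmean(frame_a), fmean(frame_b)
    var_a = fmean((a - mu_a) ** 2 for a in frame_a)
    var_b = fmean((b - mu_b) ** 2 for b in frame_b)
    cov = fmean((a - mu_a) * (b - mu_b) for a, b in zip(frame_a, frame_b))
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def satisfies_threshold(frame_a, frame_b, threshold=0.8):
    """Hypothetical gate: True when the two frames are similar enough."""
    return global_ssim(frame_a, frame_b) >= threshold
```

Under these assumptions, two identical frames give an index of approximately 1, and a frame compared with its inverted copy gives a negative index that fails the threshold.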

Regarding claims 2, 9, and 16. The combination teaches the system of claim 1, wherein localizing the face via the detection component consumes more computational resources than tracking the face via the tracking component, and wherein, if the structural similarity index satisfies the threshold, the tracking component tracks the face in the subsequent frame, the detection component refrains from localizing the face in the subsequent frame (see Lee, Fig. 1, paragraph 36, when the processor 120 determines that the similarity between the face appearing in the video stream and a face in the at least one face information exceeds a similarity threshold (such as 80%), the processor 120 determines that the face appearing in the video stream matches the face in the at least one face information. Therefore, the processor 120 can determine that the face in the at least one face information appears in the target area), and 
the anonymization component anonymizes pixels in the subsequent frame corresponding to the face (see Eswara, Fig. 6A).
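As a minimal sketch of the gating recited in claims 2, 9, and 16, the function below runs a tracker and skips the more expensive detector when a similarity score satisfies the threshold. It is offered for illustration only and does not reproduce any cited reference. The callables similarity, detect, and track, the box representation, and the 0.8 default are all hypothetical. The fallback branch, in which the detector localizes the face when the threshold is not satisfied, follows the language of claims 3, 10, and 17.

```python
def process_subsequent_frame(prev_frame, next_frame, prev_box,
                             similarity, detect, track, threshold=0.8):
    """Choose between tracking and re-detection for the subsequent frame.

    All parameters are hypothetical stand-ins:
      similarity(prev_frame, next_frame) -> float score
      detect(frame) -> face box              (assumed to be the costly step)
      track(prev_frame, next_frame, box) -> face box   (assumed to be cheap)
    Returns the face box for the subsequent frame and the path taken.
    """
    score = similarity(prev_frame, next_frame)
    if score >= threshold:
        # Threshold satisfied: track the face and refrain from detecting.
        return track(prev_frame, next_frame, prev_box), "tracked"
    # Threshold not satisfied: refrain from tracking and localize again.
    return detect(next_frame), "detected"
```

The returned box would then be passed to whatever anonymization step is in use, so that the pixels corresponding to the face in the subsequent frame are obscured on either path.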

Regarding claims 4, 11, and 18. The combination teaches the system of claim 1, wherein the threshold is 80% (see Lee, Fig. 1, paragraph 36, when the processor 120 determines that the similarity between the face appearing in the video stream and a face in the at least one face information exceeds a similarity threshold (such as 80%), the processor 120 determines that the face appearing in the video stream matches the face in the at least one face information. Therefore, the processor 120 can determine that the face in the at least one face information appears in the target area).


Claim(s) 3, 10, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eswara (PGPUB: 20200320665) in view of Lee (PGPUB: 20200275017), and further in view of Choi (PGPUB: 20120093368).

Regarding claims 3, 10, and 17. The combination teaches the system of claim 1, wherein localizing the face via the detection component consumes more computational resources than tracking the face via the tracking component, and the detection component localizes the face in the subsequent frame (see Lee, Fig. 2, paragraph 65, when the processor 120 determines that the distance between the human figure and the image capture device 130 is less than or equal to the predetermined distance, the processor 120 can execute the face tracking program again to continue tracking the face in the continuous image frames, which is the same as the operation of step S205), and 
the anonymization component anonymizes pixels in the subsequent frame corresponding to the face (see Eswara, Fig. 6A).
However, the combination does not expressly teach wherein, if the structural similarity index fails to satisfy the threshold, the tracking component refrains from tracking the face in the subsequent frame.
Choi teaches that the subject tracking apparatus 100 in accordance with an example embodiment of the present invention may confirm that the second block is a block indicating the region of the face A in the second frame as shown at a process of 421. To the contrary, if a degree of similarity between the first block in the first frame and the second block in the second frame as the result of block matching is less than the predetermined threshold, it may be judged that the result of block matching is not reliable (see Fig. 4, paragraph 38).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination with the teaching of Choi that, if a degree of similarity between the first block in the first frame and the second block in the second frame as the result of block matching is less than the predetermined threshold, the result of block matching may be judged not reliable, so as to provide wherein, if the structural similarity index fails to satisfy the threshold, the tracking component refrains from tracking the face in the subsequent frame. Therefore, combining elements of the prior art according to known methods and techniques, such as judging a block-matching result unreliable when the similarity is less than the predetermined threshold, would yield predictable results.


Claim(s) 5, 12, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eswara (PGPUB: 20200320665) in view of Lee (PGPUB: 20200275017), and further in view of Dehghan (PGPUB: 20200082549).

Regarding claims 5, 12, and 19. The combination does not expressly teach the system of claim 1, wherein the detection component localizes the face by:
executing a first machine learning algorithm on the frame, thereby generating a bounding box around a person depicted in the frame; 
executing a second machine learning algorithm on the bounding box, thereby generating a heatmap depicting key points of an anatomy of the person; and identifying one or more facial key points in the heatmap.
Dehghan teaches that a detection neural network 422 may be run on a subset of frames of input video 402. These detection frames may be stored in a frame memory buffer 426, and detection unit 422 may detect the location of objects, such as faces, in the detection frames. A location of a detected object may be indicated, for example, by a bounding box within the frame of video, or by an indication of the shape of an object and the location of the shape within the frame of video. Cropping unit 428 may crop the detection frame stored in the frame memory 426 based on the locations of objects determined by detection neural network 422. Cropped object images may be provided to object tracking unit 440 and tracking neural network 442 and object analysis unit 462. Tracking neural network 442 may track changes in a detected object's location based on a current frame and the object image from a previous detection frame to determine a new location and a tracking score (see Fig. 4, paragraph 31); object detection 820 may produce a location map including location measures for the detected object(s) in various regions within the detected frame, and object tracking 840 may produce a location map including location measures for the tracked object in the various regions within a tracked frame (see Fig. 8, paragraph 44); a direct output of the detection neural network and the tracking neural network may include a location map. A location map may be a “heat map” of an object location, where each entry in the map corresponds to a frame region, such as a pixel or group of pixels, and each entry indicates the likelihood that a portion of the object is contained within that entry's corresponding region. Map compare 842 may then compare the similarity of these location heat maps, for example by calculating a Kullback-Leibler (KL) divergence score (see Fig. 4 and 8, paragraph 45).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination with the teachings of Dehghan that these detection frames may be stored in a frame memory buffer 426, and detection unit 422 may detect the location of objects, such as faces, in the detection frames, where a location of a detected object may be indicated, for example, by a bounding box within the frame of video, or by an indication of the shape of an object and the location of the shape within the frame of video, so as to provide executing a first machine learning algorithm on the frame, thereby generating a bounding box around a person depicted in the frame; and that a location map may be a “heat map” of an object location, where each entry in the map corresponds to a frame region, such as a pixel or group of pixels, and each entry indicates the likelihood that a portion of the object is contained within that entry's corresponding region, and map compare 842 may then compare the similarity of these location heat maps, so as to provide executing a second machine learning algorithm on the bounding box, thereby generating a heatmap depicting key points of an anatomy of the person; and identifying one or more facial key points in the heatmap. Therefore, combining elements of the prior art according to known methods and techniques, such as neural networks, bounding boxes, and heat maps to detect and track objects, would yield predictable results.
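For illustration only, the comparison of two location heat maps by a Kullback-Leibler (KL) divergence score, as quoted from Dehghan above, can be sketched as follows. The sketch is an assumption about one straightforward way to compute such a score and is not Dehghan's implementation. It treats each heat map as a small 2-D list of non-negative likelihoods, normalizes each to sum to 1, and adds a small hypothetical constant eps to avoid taking the logarithm of zero.

```python
import math


def kl_divergence(map_p, map_q, eps=1e-12):
    """KL divergence D(P || Q) between two heat maps of equal shape.

    Assumption: map_p and map_q are 2-D lists of non-negative values.
    A small eps is added to every entry before normalizing so that
    empty regions do not produce log(0) or division by zero.
    """
    flat_p = [v + eps for row in map_p for v in row]
    flat_q = [v + eps for row in map_q for v in row]
    if len(flat_p) != len(flat_q) or not flat_p:
        raise ValueError("heat maps must be non-empty and the same shape")
    total_p, total_q = sum(flat_p), sum(flat_q)
    score = 0.0
    for p, q in zip(flat_p, flat_q):
        p_norm, q_norm = p / total_p, q / total_q
        score += p_norm * math.log(p_norm / q_norm)
    return score
```

A score near zero indicates that the two maps place the object in the same regions, while a larger score indicates that the detected and tracked locations disagree.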


Claim(s) 6, 13, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eswara (PGPUB: 20200320665) in view of Lee (PGPUB: 20200275017), in view of Dehghan (PGPUB: 20200082549), in view of KLUG (PGPUB: 20210182556), and further in view of Grauman (PGPUB: 20210174817).

Regarding claims 6, 13, and 20. The combination does not expressly teach the system of claim 5, wherein the first machine learning algorithm is a trained YOLOv3 object detection algorithm.
KLUG teaches for the detection of known objects, an object detection 131 function is utilized such as YOLOv3, which is a technique used for detecting and localizing objects in images. In example implementations, the YOLOv3 model is trained on objects that are expected to be found in the given environment, (e.g. monitor, keyboard, mouse, clocks, doors, frames, exit signs, lights for Environment Monitoring Robot (EMR), and so on). During inference, the model provides both locations (e.g., in the form of bounded boxes 132) and categories of the detected object for a given image. This information is further provided to the anomaly detection 140 function (see Fig. 1, paragraph 29). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination with the teaching of KLUG that the YOLOv3 model is trained on objects that are expected to be found in the given environment, so as to provide wherein the first machine learning algorithm is a trained YOLOv3 object detection algorithm. Therefore, combining elements of the prior art according to known methods and techniques, such as a trained YOLOv3 model, would yield predictable results.
However, the combination does not expressly teach wherein the second machine learning algorithm is a trained Simple Pose ResNet algorithm.
Grauman teaches that the system 200 includes a neural network, shown as ResNet network 204 (or any other suitable feature, architecture, neural network, etc.), according to some embodiments. The ResNet network 204 can be used to extract visual features from the detected objects in the video V after a 4th ResNet block. The visual features may have dimensions (H/32)×(W/32)×D where H, W, and D denote the frame and channel dimensions of the visual feature. In some embodiments, the visual features identified by the ResNet network 204 are visual features of the detected object 202 (see Fig. 2, paragraph 63); object detector 410 is configured to automatically detect or find objects in all frames of the video data that is input to object detector 410. Object detector 410 can use a Faster R-CNN object detector with a ResNet-101 backbone that is trained with open images or other training data. In some embodiments, other object detectors are used (see Fig. 4, paragraph 92).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination with the teachings of Grauman that object detector 410 is configured to automatically detect or find objects in all frames of the video data that is input to object detector 410, and that object detector 410 can use a Faster R-CNN object detector with a ResNet-101 backbone that is trained with open images or other training data, with other object detectors used in some embodiments, so as to provide wherein the second machine learning algorithm is a trained Simple Pose ResNet algorithm. Therefore, combining elements of the prior art according to known methods and techniques, such as a trained ResNet, would yield predictable results.


Claim(s) 7 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eswara (PGPUB: 20200320665) in view of Lee (PGPUB: 20200275017), and further in view of Nicholes (PGPUB: 20200099448).

Regarding claims 7 and 14. The combination does not expressly teach the system of claim 1, wherein the tracking component tracks the face by executing a median flow tracker on the frame and the subsequent frame.
Nicholes teaches that the controller 140 executes the video tracker to track an object (e.g., the identified beacon) in a video. The video tracker can be a median flow tracker algorithm/process. The median flow tracker algorithm takes history, size, and shape of the object into account. The median flow tracker algorithm/process can also indicate if the output track is valid as it is possible for the beacon to go out of frame. If the beacon goes out of frame, the median flow tracker reports to the controller 140 to return the camera 157 to a coarse pointing solution for link partner reacquisition. In this regard, where the blob detector algorithm/process only considers the current frame fed to it, the median flow tracker algorithm/process understands, for instance, that the beacon cannot jump across the frame in a single frame. In this way, the median flow tracker algorithm/process provides a more realistic tracking of physical objects in the frame (see Fig. 4 and 5, paragraph 62).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination with the teachings of Nicholes that the video tracker can be a median flow tracker algorithm/process, that the median flow tracker algorithm takes history, size, and shape of the object into account, and that the median flow tracker algorithm/process can also indicate if the output track is valid as it is possible for the beacon to go out of frame, so as to provide wherein the tracking component tracks the face by executing a median flow tracker on the frame and the subsequent frame. Therefore, combining elements of the prior art according to known methods and techniques, such as a median flow tracker algorithm/process, would yield predictable results.
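For illustration only, the core box update commonly associated with a median flow tracker can be sketched as follows. The sketch is not taken from Nicholes or the instant application. It assumes that point correspondences between the frame and the subsequent frame have already been obtained (for example by optical flow) and it omits the forward-backward error filtering that full implementations typically apply. The function name median_flow_update and the (x, y, width, height) box layout are hypothetical.

```python
from itertools import combinations
from math import hypot
from statistics import median


def median_flow_update(box, prev_pts, next_pts):
    """Move and rescale a box using median point motion.

    Assumptions: box is (x, y, width, height); prev_pts and next_pts are
    equal-length lists of (x, y) points matched by index.
    """
    if len(prev_pts) != len(next_pts) or len(prev_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    # Translation: median displacement, which resists outlier points.
    dx = median(n[0] - p[0] for p, n in zip(prev_pts, next_pts))
    dy = median(n[1] - p[1] for p, n in zip(prev_pts, next_pts))
    # Scale: median ratio of pairwise point distances after vs. before.
    ratios = []
    for i, j in combinations(range(len(prev_pts)), 2):
        d_prev = hypot(prev_pts[i][0] - prev_pts[j][0],
                       prev_pts[i][1] - prev_pts[j][1])
        d_next = hypot(next_pts[i][0] - next_pts[j][0],
                       next_pts[i][1] - next_pts[j][1])
        if d_prev > 0:
            ratios.append(d_next / d_prev)
    scale = median(ratios) if ratios else 1.0
    x, y, w, h = box
    new_w, new_h = w * scale, h * scale
    # Shift the box center, then rebuild the box around it at the new size.
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)
```

Because the update relies on medians, a minority of badly matched points does not move the box, which is consistent with the quoted observation that a tracked object cannot jump across the frame in a single frame.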


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN JIA whose telephone number is (571)270-5536. The examiner can normally be reached 9:00 am-7:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XIN JIA/Primary Examiner, Art Unit 2667