DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Janumpally et al (WO2019/050508A1) in view of Thota et al (US20200162641).
Regarding claim 1, Janumpally teaches a method for detecting an object across frames of a video (para. [0013], a system may analyze batches of video for performing face detection, facial recognition, and grouping of faces recognized as corresponding to the same individuals), the method comprising: 

clustering each of the detected one or more objects of the first group in each frame into one or more clustered-object groups (para. [0013], [0088], Video portions containing an individual targeted as possibly being of interest may be grouped together based on the grouping of recognized faces; Clustering and/or classification techniques may then be applied to the features); 
identifying one or more frames of the video without one of the one or more clustered-object groups (para. [0053], [0075], If there is no entry in the grouping data structure having a matching face that matches the recognized face, a new entry 
analyzing the identified one or more frames to detect a second group of one or more objects in the identified one or more frames groups (para. [0053], [0059], [0075], If there is no entry in the grouping data structure having a matching face that matches the recognized face, a new entry may be created in the grouping data structure; When the video processing program is analyzing a new video portion, the grouping program may initially attempt to associate the face with an existing identity using the facial recognition program. If the grouping program is unable to locate an existing identity, the grouping program may create a new identity, assign a new identity identifier 304 to the new identity, and create a new entry in the video portion correlation data structure 308 for the new identity 302).

Janumpally fails to teach analyzing the identified one or more frames using an optical image classification engine. However, Thota teaches analyzing one or more frames using an optical image classification engine to detect a second group of one or more objects in the one or more frames groups (para. [0036], the object detector 204 may be configured to detect the first set of objects, the second set of objects, and related object types based on object detection and classification technique). It is also
Therefore, taking the combined teachings of Janumpally and Thota as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Thota into the method of Janumpally. The motivation to combine Thota and Janumpally would be to improve the overall quality of video content (para. [0015] of Thota).


Regarding claim 2, the modified invention of Janumpally teaches a method further comprising: 
clustering one or more objects of the second group detected from each of the identified one or more frames into the one or more clustered-object groups (para. [0013], [0052]-[0053] of Janumpally, Video portions containing an individual targeted as possibly being of interest may be grouped together based on the grouping of recognized faces; If there is no entry in the grouping data structure having a matching face that matches the recognized face, a new entry may be created in the grouping data structure).



redacting objects belonging to a first clustered-object group of the one or more clustered-object groups (para. [0013], [0016], [0059] of Janumpally, the faces of other non-targeted individuals appearing in the video portions that include a targeted face may be redacted;  the faces of the non-targeted individuals may be redacted).


Regarding claim 4, the modified invention of Janumpally teaches a method further comprising: 
merging the first and second groups (para. [0017], [0047], [0074] of Janumpally, the system may single out the same faces in different video portions and group the faces according to an identity, location of video setting, time at which the video was recorded, and/or duration of an identified emotion; these video portions may be grouped to the same identity based on matching recognized faces to the same identity; The grouping data structure 300 may include a plurality of identities 302, and each identity 302 may be associated with an identity identifier (ID) 304) to form a merged list of detected objects in the video (fig. 3 and para. [0047] of Janumpally, The grouping of the faces in the multiple video portions).


Regarding claim 5, the modified invention of Janumpally teaches a method further comprising: 



Regarding claim 6, the modified invention of Janumpally teaches a method wherein redacting one or more of the detected objects comprises: 
displaying on a display device one or more objects from each of the one or more clustered-object groups (fig. 5 of Janumpally); 
receiving, from a user, a selection of one or more objects from one or more clustered-object groups (para. [0080] of Janumpally, The UI 500 includes a selection column 508 with an interactive select/deselect control (e.g., a checkbox) that the reviewer may use to select or deselect a face for redaction in the video portion); and 
redacting one or more objects based on the selection of the one or more objects (para. [0080] of Janumpally, The UI 500 includes a selection column 508 with an interactive select/deselect control (e.g., a checkbox) that the reviewer may use to select or deselect a face for redaction in the video portion).


Regarding claim 7, the modified invention of Janumpally teaches a method wherein detecting the first group of one or more objects comprises defining a boundary perimeter for each of the detected one or more objects of the first group (para. [0087] of Janumpally, For example, a virtual frame may be established around each detected
wherein clustering each of the detected one or more objects comprises clustering the one or more objects into the one or more clustered-object groups based at least on a coordinate of the boundary perimeter of each head (para. [0087]-[0089] of Janumpally, An identifier for each face may be returned for each frame with top, left, right, bottom of the bounding area recorded as coordinates for the face; Clustering and/or classification techniques may then be applied to the features; the tracking program 126 may provide the face coordinates 708 of the detected faces).
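
For illustration only, clustering on a coordinate of the boundary perimeter can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Janumpally, Thota, or the application: the names `Box`, `iou`, and `cluster_by_box`, the intersection-over-union overlap measure, and the 0.5 threshold are all hypothetical choices made here. Each box uses the left, top, right, bottom coordinates that para. [0087]-[0089] of Janumpally describe as being recorded for each face.

```python
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cluster_by_box(
    frames: List[List[Box]], threshold: float = 0.5
) -> List[List[Tuple[int, Box]]]:
    """Greedily group boxes across frames by overlap with each cluster's latest box.

    The threshold is an assumed value, not one stated in any cited reference.
    """
    clusters: List[List[Tuple[int, Box]]] = []
    for frame_idx, boxes in enumerate(frames):
        for box in boxes:
            best, best_score = None, threshold
            for cluster in clusters:
                last_idx, last_box = cluster[-1]
                score = iou(box, last_box)
                # Only extend a cluster whose latest box is from an earlier frame.
                if last_idx < frame_idx and score >= best_score:
                    best, best_score = cluster, score
            if best is None:
                clusters.append([(frame_idx, box)])  # start a new clustered-object group
            else:
                best.append((frame_idx, box))
    return clusters


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],  # frame 0: two objects
    [(1, 1, 11, 11)],                    # frame 1: only the first object
    [(51, 50, 61, 60)],                  # frame 2: only the second object
]
clusters = cluster_by_box(frames)
```

In this sketch, a frame index that is absent from a cluster (frame 2 for the first group, frame 1 for the second) is a frame "without" that clustered-object group in the sense recited in claim 1.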


Regarding claim 8, the modified invention of Janumpally teaches a method of claim 6, wherein detecting the first group of one or more objects comprises: 
generating bounding boxes for one or more objects in each frame (para. [0088] of Janumpally, a virtual frame may be established around each detected face indicating the bounding area of each detected face); and 
detecting one or more objects by classifying image data within the bounding boxes (para. [0088] of Janumpally, An identifier for each face may be returned for each frame with top, left, right, bottom of the bounding area recorded as coordinates for the face).


Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Janumpally et al (WO2019/050508A1) and Thota et al (US20200162641) in view of Kant (US9922271).
Regarding claim 9, the modified invention of Janumpally teaches a method wherein clustering each of the detected one or more objects comprises: 
extracting object features for each of the detected one or more objects (para. [0013], [0086] of Janumpally, analyze batches of video for identifying portions of video that include certain features; the face detection algorithm may perform feature extraction from a frame of the video portion and apply one or more of the machine learning models (MLMs) 138 for decision-making with respect to the extracted features to correlate the extracted features with the shape and contours of the human face); and 
clustering the one or more objects into the one or more clustered-object groups based at least on the extracted object features (para. [0052], [0074] of Janumpally, the faces may be decomposed into features that can be used during grouping for comparison with any faces already included in a grouping data structure of previously recognized identities; each identity 302 may include one or more images 306 or at least stored facial recognition image features that associate the identity 302 with one or more recognized faces detected in one or more video portions).

Janumpally fails to teach using a scale invariant feature transform. However, Kant teaches using a scale invariant feature transform (col. 11 line 63 – col. 12 line 4) to extract object features (col. 11 lines 63-67).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Janumpally et al (WO2019/050508A1) and Thota et al (US20200162641) in view of Venetianer et al (US20180322750).
Regarding claim 10, the modified invention of Janumpally teaches a method wherein the second group of one or more objects comprises one or more different subgroups of objects (para. [0059], [0074] of Janumpally, the computing device determine non-targeted faces in the video portion; each identity 302 may include one or more images 306 or at least stored facial recognition image features that associate the identity 302 with one or more recognized faces detected in one or more video portions).

Janumpally fails to teach wherein the optical image classification engine comprises an optical flow engine or a motion estimation engine. However Venetianer teaches wherein an optical image classification engine comprises an optical flow engine 
Therefore, taking the combined teachings of Janumpally and Thota with Venetianer as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Venetianer into the method of Janumpally and Thota. The motivation to combine Thota, Venetianer and Janumpally would be to reduce the amount of video surveillance data so that analysis of the video surveillance data can be conducted (para. [0031] of Venetianer).

Claims 11-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Janumpally et al (WO2019/050508A1) in view of Venetianer et al (US20180322750).
Regarding claim 11, Janumpally teaches a method for detecting an object across frames of a video, the method comprising: 
detecting one or more objects (para. [0013], [0015], a system may analyze batches of video for performing face detection, facial recognition, and grouping of faces recognized as corresponding to the same individuals; After identifying one or more targeted faces of interest), using a first image classifier (para. [0029], one or more machine learning models 138, which may be used by one or more of the functional components, such as the face detection program 121, the facial recognition program 122, the emotion determination program 124, the tracking program 126, the grouping program 128, the prioritization program 130, and/or the redaction program 132. Examples of such machine learning models 138 include predictive 
grouping the one or more objects detected over multiple frames of the video into one or more groups of distinct object (para. [0013], [0015], Video portions containing an individual targeted as possibly being of interest may be grouped together based on the grouping of recognized faces; The grouped faces along with the identified emotion(s) may subsequently be ranked or otherwise prioritized based on the identified emotion); 
identifying a first or last instance of detection of an object of a first groups of distinct object (para. [0074], The grouping data structure 300 may include a plurality of identities 302, and each identity 302 may be associated with an identity identifier (ID) 304. The video portion correlation data structure 308 includes a video source identifier 310, a start time 312 for the video portion at which the face is first detected in the camera field of view, an end time 314 at which the face leaves the camera view of view); and 
analyzing frames occurring before the first instance or frames occurring after the last instance (para. [0074], [0076], [0094], a start time 312 for the video portion at which the face is first detected in the camera field of view, an end time 314 at which the face leaves the camera view of view; The individual is tracked in that video portion and in another video portions from another video source, such as in a video portions taken 

Janumpally fails to teach using a second image classifier to detect one or more additional objects. However, Venetianer teaches using a second image classifier to detect one or more additional objects (51 and 52 in fig. 5, para. [0152], [0156], Any motion detection algorithm for detecting movement between frames at the pixel level can be used for this block).
Therefore, taking the combined teachings of Janumpally and Venetianer as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Venetianer into the method of Janumpally. The motivation to combine Venetianer and Janumpally would be to reduce the amount of video surveillance data so that analysis of the video surveillance data can be conducted (para. [0031] of Venetianer).



redacting one or more objects of the first group (para. [0013], [0051] of Janumpally, the faces of other non-targeted individuals appearing in the video portions that include a targeted face may be redacted; In either event, each face in the video portion may be tracked for use in the subsequent redaction of faces from the video portion and/or for tracking the position of the face of a targeted individual) and the one or more additional objects from the video (para. [0035], [0067] of Janumpally, additional redaction on the redacted video portion; a selection of missed faces for redaction or faces indicated for redaction to remain unredacted. For example, the reviewer may use the UI on the client computing device to select one or more areas in the video portion in which non-targeted faces were not properly redacted).


Regarding claim 13, the modified invention of Janumpally teaches a method wherein the first and second image classifiers comprise a head detection neural network (para. [0029], [0088] of Janumpally, the facial recognition program 122 may employ one or more of the machine learning models (MLMs) 138 to provide recognition information 704) and an optical image classifier (para. [0152] of Venetianer, Any motion detection algorithm for detecting movement between frames at the pixel level can be used for this block), respectively. It is noted that Venetianer teaches using two detection schemes in parallel (51 and 52 in fig. 5, para. [0152]) to address deficiencies in the other technique (para. [0154]).


Regarding claim 14, the modified invention of Janumpally teaches a method wherein the optical image classifier comprises an optical flow engine or a motion vector estimation engine (51 in fig. 5, para. [0152] of Venetianer, Any motion detection algorithm for detecting movement between frames at the pixel level can be used for this block).


Regarding claim 16, the modified invention of Janumpally teaches a method wherein identifying the first or last instance comprises identifying the first and the last instance of detection of the object of the first group (para. [0074], [0076] of Janumpally, a start time 312 for the video portion at which the face is first detected in the camera field of view, an end time 314 at which the face leaves the camera view of view; The individual is tracked in that video portion and in another video portions from another video source, such as in a video portions taken within several minutes of the first video portion through a nearby camera).


Regarding claim 17, the modified invention of Janumpally teaches a method wherein analyzing frames occurring before the first instance or frames occurring after the last instance comprises analyzing frames occurring before the first instance and frames occurring after the last instance of detection (312 and 314 in fig. 3, para. [0076] 


Regarding claim 18, the modified invention of Janumpally teaches a method wherein analyzing frames occurring before the first instance comprises analyzing frames occurring up to 10 seconds before the first instance (para. [0076], [0094] of Janumpally, The individual is tracked in that video portion and in another video portions from another video source, such as in a video portions taken within several minutes of the first video portion through a nearby camera; the emotion determination program 124 may be executed for video frames selected from every 5 seconds, 10 seconds, 20 seconds, 30 seconds, or the like, of the video portion), and wherein analyzing frames occurring after the last instance comprises analyzing frames occurring up to 10 seconds after the last instance (para. [0076], [0094] of Janumpally, The individual is tracked in that video portion and in another video portions from another video source, such as in a video portions taken within several minutes of the first video 


Regarding claim 19, the modified invention of Janumpally teaches a method wherein analyzing frames occurring before the first instance or frames occurring after the last instance comprises analyzing frames occurring before and after until a head is detected (para. [0074], [0076] of Janumpally, a start time 312 for the video portion at which the face is first detected in the camera field of view, an end time 314 at which the face leaves the camera view of view; The individual is tracked in that video portion and in another video portions from another video source, such as in a video portions taken within several minutes of the first video portion through a nearby camera).
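
For illustration only, the first-instance and last-instance limitations of claims 16-19 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of Janumpally, Venetianer, or the application: the function names, the per-frame dictionary of identity labels, and the frame-rate arithmetic are hypothetical. The 10-second default mirrors the window recited in claim 18.

```python
from typing import Dict, List, Tuple


def first_last_instance(detections: Dict[int, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each identity to the (first, last) frame index at which it was detected.

    `detections` maps a frame index to the identities found in that frame
    (a hypothetical structure chosen for this sketch).
    """
    spans: Dict[str, Tuple[int, int]] = {}
    for frame_idx in sorted(detections):
        for identity in detections[frame_idx]:
            first, _ = spans.get(identity, (frame_idx, frame_idx))
            spans[identity] = (first, frame_idx)
    return spans


def frames_to_reanalyze(
    span: Tuple[int, int],
    total_frames: int,
    fps: float,
    window_seconds: float = 10.0,
) -> Tuple[List[int], List[int]]:
    """Frame indices up to `window_seconds` before the first and after the last instance."""
    first, last = span
    window = int(round(fps * window_seconds))
    before = list(range(max(0, first - window), first))
    after = list(range(last + 1, min(total_frames, last + 1 + window)))
    return before, after


detections = {3: ["A"], 5: ["A", "B"], 9: ["A"]}
spans = first_last_instance(detections)
before, after = frames_to_reanalyze(spans["A"], total_frames=20, fps=1.0, window_seconds=2.0)
```

The frames returned in `before` and `after` are the ones a second classifier would examine for additional objects. The stop-when-a-head-is-detected condition of claim 19 is not modeled here.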


Regarding claim 20, the claim recites similar limitations as claimed in claim 11 and is rejected for the same reasons as stated above. Furthermore, Janumpally teaches a memory (para. [0024] of Janumpally) and a processor coupled to the memory (para. [0023] of Janumpally).


Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Janumpally et al (WO2019/050508A1) and Venetianer et al (US20180322750) in view of Chen et al (US20190163977).
Regarding claim 15, the modified invention of Janumpally fails to teach a method wherein the optical image classification engine comprises a dlib correlation tracker engine.
However, Chen teaches that an optical image classification engine comprises a dlib correlation tracker engine (para. [0021], to detect the faces at each frame, standard face detection libraries such as dlib and openCV may be used to extract face images from original video at each consecutive frame).
Therefore, taking the combined teachings of Janumpally and Venetianer with Chen as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Chen into the method of Janumpally and Venetianer. The motivation to combine Chen, Venetianer and Janumpally would be to obtain more training samples based on limited video data (para. [0023] of Chen).

Related Art
Hu (US20100296702) teaches person tracking across frames (para. [0030]).
Leichter et al (US20120251078) teaches facial tracking and comparing object trajectories (figs. 4 and 5).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEON VIET Q NGUYEN whose telephone number is (571)270-1185. The examiner can normally be reached Mon-Fri 11AM-7PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEON VIET Q NGUYEN/Primary Examiner, Art Unit 2663