DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-20 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-5, 7-15, and 17-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Peleg et al. (US Patent 8,311,277), hereby referred to as “Peleg”, in view of Sriram et al. (US PGPub US 2019/0294889), hereby referred to as “Sriram”. 

Consider Claims 1 and 11. 
Peleg teaches: 
1. A video type detection method, comprising: / 11. A video type detection apparatus, comprising: at least one processor; and a memory, connected with the at least one processor in communication; wherein, the memory stores instructions executable by the at least one processor, wherein the instructions are executed by the at least one processor to cause the at least one processor to: (Peleg: abstract, In a system and method for generating a synopsis video from a source video, at least three different source objects are selected according to one or more defined constraints, each source object being a connected subset of image points from at least three different frames of the source video. One or more synopsis objects are sampled from each selected source object by temporal sampling using image points derived from specified time periods. For each synopsis object a respective time for starting its display in the synopsis video is determined, and for each synopsis object and each frame a respective color transformation for displaying the synopsis object may be determined. The synopsis video is displayed by displaying selected synopsis objects at their respective time and color transformation, such that in the synopsis video at least three points that each derive from different respective times in the source video are displayed simultaneously. Column 8 lines 8-55, FIGS. 2a to 2d show background images from a surveillance camera at Stuttgart airport. FIGS. 2a and 2b show daylight images while FIGS. 2c and 2d are at night. Parked cars and parked airplanes become part of the background)
1. obtaining N key frames of a first video, wherein N is an integer greater than 1, and a type of the first video is to be detected; / 11. obtain N key frames of a first video, wherein N is an integer greater than 1, and a type of the first video is to be detected; (Peleg: column 8 lines 10-30 Obtaining Activity Tubes, lines 43-67, Background Construction, Figure 2a and 2b; We used a simplification of [22] to compute the space-time tubes representing dynamic objects. This is done by combining background subtraction together with min-cut to get a smooth segmentation of foreground objects. As in [22], image gradients that coincide with background gradients are attenuated, as they are less likely to be related to motion boundaries. The resulting "tubes" are connected components in the 3D space-time volume)
1. obtaining M energy scores corresponding to each of the N key frames by inputting each of the N key frames into M algorithm models corresponding to the first video type respectively, wherein M is an integer greater than 1; / 11. obtain M energy scores corresponding to each of the N key frames by inputting each of the N key frames into M algorithm models corresponding to the first video type respectively, wherein M is an integer greater than 1; (Peleg: column 10 lines 18-60, Energy Between Tubes, This energy will later be used by the optimization stage, creating a synopsis having maximum activity while avoiding conflicts and overlap between objects. Let B be the set of all activity tubes. Each tube b is defined over a finite time segment in the original video stream. Weights α and β are set by the user according to their relative importance for a particular query. Reducing the weights of the collision cost, for example, will result in a denser video where objects may overlap. Increasing this weight will result in sparser video where objects do not overlap and less activity is presented. An example for the different synopsis obtained by varying β is given in FIG. 10b. After extracting the activity tubes the pixel based cost can be replaced with object based cost.)
1. determining an energy score of the first video by a fusion strategy algorithm model according to NxM energy scores of the N key frames; / 11. determine an energy score of the first video by a fusion strategy algorithm model according to NxM energy scores of the N key frames; (Peleg: column 10 lines 63-67, column 11 lines 1-62, Activity Cost, The activity cost favors synopsis movies with maximum activity. It penalizes for objects that are not mapped to a valid time in the synopsis. Collision Cost, For every two "shifted" tubes and every relative time shift between them, we define the collision cost as the volume of their space-time overlap weighted by their activity measures. Changing the weight of the collision cost E_c changes the density of objects in the synopsis video as shown in FIG. 10b. Temporal Consistency Cost, The temporal consistency cost adds a bias towards preserving the chronological order of events. The preservation of chronological order is more important for tubes that have a strong interaction)
1. and comparing the energy score of the first video with an energy score threshold corresponding to a first video type, to determine whether the type of the first video is the first video type or not. / 11. and compare the energy score of the first video with an energy score threshold corresponding to a first video type, to determine whether the type of the first video is the first video type or not. (Peleg: column 12 lines 5-42, Energy Minimization Since the global energy function in Equations (7) and (15) is written as a sum of energy terms defined on single tubes or pairs of tubes, it can be minimized by various MRF-based techniques such as Belief Propagation [23] or Graph Cuts. Each state describes the subset of tubes that are included in the synopsis, and neighboring states are defined as states in which a single activity tube is removed or changes its mapping into the synopsis. As an initial state we used the state in which all tubes are shifted to the beginning of the synopsis movie. Also, in order to accelerate computation, it is possible to restrict the temporal shifts of tubes to be in jumps of 10 frames.)
Peleg does not explicitly teach a “confidence score”.
Sriram teaches: 
1. A video type detection method, comprising: / 11. A video type detection apparatus, comprising: at least one processor; and a memory, connected with the at least one processor in communication; wherein, the memory stores instructions executable by the at least one processor, wherein the instructions are executed by the at least one processor to cause the at least one processor to: (Sriram: abstract, The present disclosure provides various approaches for smart area monitoring suitable for parking garages or other areas. These approaches may include ROI-based occupancy detection to determine whether particular parking spots are occupied by leveraging image data from image sensors, such as cameras. These approaches may also include multi-sensor object tracking using multiple sensors that are distributed across an area that leverage both image data and spatial information regarding the area, to provide precise object tracking across the sensors. Further approaches relate to various architectures and configurations for smart area monitoring systems, as well as visualization and processing techniques. For example, as opposed to presenting video of an area captured by cameras, 3D renderings may be generated and played from metadata extracted from sensors around the area. [0045]-[0047], Figure 1A, With reference to FIG. 1A, FIG. 1A is an example system diagram of a smart area monitoring system 100, [0046] The smart area monitoring system 100 may include, among other things, a perception system 102, a semantic analysis system 104, and a visualization system 106. The perception system 102, the semantic analysis system 104, and the visualization system 106 may be communicatively coupled over a network(s) 110.)
1. obtaining N key frames of a first video, wherein N is an integer greater than 1, and a type of the first video is to be detected; / 11. obtain N key frames of a first video, wherein N is an integer greater than 1, and a type of the first video is to be detected; (Sriram: [0068], [0083] The global location determiner 122 may emit the global coordinates in which an object was observed in a global reference frame, which may be common across all sensors. The location calibrator 128 may calibrate each sensor to emit the object's global coordinates when an object is identified. The global reference frame may be geocoordinates (longitude, latitude and altitude) or a Euclidian space that identifies the position of the object in the wide area, as examples. In the case of a static sensor, such as magnetic loop or puck sensor, such a calibration may be in-situ; meaning that if those sensors detect an object, the object presence is the actual location of the sensor. For cameras, the calibration may be performed by mapping the camera's FoV to a global reference frame. [0084])
1. obtaining M confidence scores corresponding to each of the N key frames by inputting each of the N key frames into M algorithm models corresponding to the first video type respectively, wherein M is an integer greater than 1; / 11. obtain M confidence scores corresponding to each of the N key frames by inputting each of the N key frames into M algorithm models corresponding to the first video type respectively, wherein M is an integer greater than 1; (Sriram: [0066] Also, in any of these examples, the object attribute determiner 118 may analyze the image data to extract and/or determine one or more object attributes of an object…. For example, the object attribute determiner 118 may analyze location(s) of an object detected using the object detector 114 and/or may only analyze object detections that have a confidence score(s) exceeding a threshold value(s).)
1. determining a confidence score of the first video by a fusion strategy algorithm model according to NxM confidence scores of the N key frames; / 11. determine a confidence score of the first video by a fusion strategy algorithm model according to NxM confidence scores of the N key frames; (Sriram: [0109] In further examples, the occupancy status may include a level of confidence, or a confidence score, indicating a computed confidence in whether an ROI is occupied. The confidence score may, for example, range from 0 for a lowest confidence the ROI is occupied (and/or conversely a highest confidence the ROI is unoccupied) to 1 for a highest confidence the ROI is occupied (and/or conversely a lowest confidence the ROI is unoccupied). [0110]-[0112])
1. and comparing the confidence score of the first video with a confidence score threshold corresponding to a first video type, to determine whether the type of the first video is the first video type or not. / 11. and compare the confidence score of the first video with a confidence score threshold corresponding to a first video type, to determine whether the type of the first video is the first video type or not. (Sriram: [0115] For example, the occupancy determiner 116 may determine a length of the intersection and/or overlap and a maximum length 406A of an ROI indicator line that could intersect with the bounding box 402A (e.g., the diagonal along a direction of the ROI indicator line 404A). The occupancy determiner 116 may then compute a ratio between the maximum length 406A and the length of the intersection and/or overlap. Where the ratio exceeds a threshold value (e.g., >0.5), the occupancy determiner 116 may determine the parking spot 202A is occupied and set the occupancy status accordingly. Additionally or alternatively, the confidence score for the parking spot 202A may be based at least in part on the ratio. In various examples, a confidence score may be computed for each object detection. Also, a confidence score may be proportional to the length of the ROI indicator line 404A within the bounding box. Maximum confidence may be obtained if the bottom and top sides of the bounding box is cut by the ROI indicator line 404A)
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify Peleg’s method and system for video indexing and synopsis with the improvements of Sriram for smart area monitoring using artificial intelligence. The determination of obviousness is predicated upon the following findings: One skilled in the art would have been motivated to modify Peleg in order to improve the video indexing and synopsis method to leverage AI-based system architecture that incorporates a confidence measure to ensure features of interest are extracted.  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Peleg, while the teaching of Sriram continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of improving the overall algorithm of Peleg for video indexing and synopsis, with the computational efficiency and accuracy of Sriram for incorporating confidence measures and AI architecture. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.
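For orientation only, the data flow recited in claims 1 and 11 can be sketched as follows. This is an illustrative sketch and is not drawn from the application, Peleg, or Sriram: the function names, the stand-in models, the use of a plain mean as the fusion step, and the ">=" comparison are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]            # hypothetical stand-in for decoded image data
Model = Callable[[Frame], float]   # hypothetical: maps one frame to a score in [0, 1]


def score_matrix(frames: Sequence[Frame], models: Sequence[Model]) -> List[List[float]]:
    """Return an N x M matrix: one score per (key frame, model) pair."""
    return [[model(frame) for model in models] for frame in frames]


def fuse_mean(scores: Sequence[Sequence[float]]) -> float:
    """Assumed fusion step: average all N x M scores into one video-level score."""
    flat = [s for row in scores for s in row]
    return sum(flat) / len(flat)


def is_first_video_type(frames: Sequence[Frame], models: Sequence[Model],
                        threshold: float) -> bool:
    """Compare the fused video-level score with the type-specific threshold."""
    if len(frames) < 2 or len(models) < 2:
        raise ValueError("claims recite N > 1 key frames and M > 1 models")
    return fuse_mean(score_matrix(frames, models)) >= threshold
```

The sketch shows only that the claimed steps operate on an N x M array of per-frame, per-model scores that is reduced to a single value before the threshold comparison.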

Consider Claims 2 and 12. 
The combination of Peleg and Sriram teaches: 
2. The method according to claim 1, further comprising: determining the confidence score threshold corresponding to the first video type according to a plurality of second videos, wherein the type of the second videos is the first video type. / 12. The device according to claim 11, wherein the at least one processor is further enabled to: determine the confidence score threshold corresponding to the first video type according to a plurality of second videos, wherein the type of the second videos is the first video type. (Peleg: column 9 lines 49-58, Finally, the 3D mask is grouped into connected components, denoted as "activity tubes". FIGS. 3a to 3d show four extracted tubes shown "flattened" over the corresponding backgrounds from FIG. 2. The left tubes correspond to ground vehicles, while the right tubes correspond to airplanes on the runway at the back. FIGS. 4a and 4b show synopsis frames derived using two extracted tubes from a "Billiard" scene so as to depict in a single frame a multitude of temporally separated players. Column 12 lines 42-67, column 13 lines 1-5, In this system, a server can view the live video feed, analyze the video for interesting events, and record an object based description of the video. This description lists for each camera the interesting objects, their duration, location, and their appearance.
A two phase process is proposed for synopsis of endless video:
1) Online Phase during video capture. This phase is done in real time. Object (tube) detection and segmentation. Inserting detected objects into the object queue. Removing objects from the object queue when reaching a space limit.
2) Response Phase constructing a synopsis according to user query. This phase may take a few minutes, depending on the amount of activity in the time period of interest. This phase includes: Constructing a time lapse video of the changing background. Background changes are usually caused by day-night differences, but can also be a result of an object that starts (stops) moving.
Stitching the tubes and the background into a coherent video. This action should take into account that activities from different times can appear simultaneously, and on a background from yet another time)

Consider Claims 3 and 13. 
The combination of Peleg and Sriram teaches: 
3. The method according to claim 2, wherein the determining the confidence score threshold corresponding to the first video type according to the plurality of second videos comprises: obtaining N key frames of each of the second videos; obtaining M confidence scores corresponding to each of the N key frames of each of the second videos by inputting the N key frames of each of the second videos into M algorithm models corresponding to the first video type respectively; and inputting each of the second videos and the NxM confidence scores corresponding thereto into the fusion strategy algorithm model for training and verification, to determine the confidence score threshold corresponding to the first video type respectively. / 13. The device according to claim 12, wherein the at least one processor is further enabled to: obtain N key frames of each of the second videos; obtain M confidence scores corresponding to each of the N key frames of each of the second videos by inputting the N key frames of each of the second videos into M algorithm models corresponding to the first video type respectively; and input each of the second videos and the NxM confidence scores corresponding thereto into the fusion strategy algorithm model for training and verification to determine the confidence score threshold corresponding to the first video type respectively. (Peleg: column 9 lines 49-58, Finally, the 3D mask is grouped into connected components, denoted as "activity tubes". FIGS. 3a to 3d show four extracted tubes shown "flattened" over the corresponding backgrounds from FIG. 2. The left tubes correspond to ground vehicles, while the right tubes correspond to airplanes on the runway at the back. FIGS. 4a and 4b show synopsis frames derived using two extracted tubes from a "Billiard" scene so as to depict in a single frame a multitude of temporally separated players. 
Column 12 lines 42-67, column 13 lines 1-5, In this system, a server can view the live video feed, analyze the video for interesting events, and record an object based description of the video. This description lists for each camera the interesting objects, their duration, location, and their appearance. Sriram: [0115] For example, the occupancy determiner 116 may determine a length of the intersection and/or overlap and a maximum length 406A of an ROI indicator line that could intersect with the bounding box 402A (e.g., the diagonal along a direction of the ROI indicator line 404A). The occupancy determiner 116 may then compute a ratio between the maximum length 406A and the length of the intersection and/or overlap. Where the ratio exceeds a threshold value (e.g., >0.5), the occupancy determiner 116 may determine the parking spot 202A is occupied and set the occupancy status accordingly. Additionally or alternatively, the confidence score for the parking spot 202A may be based at least in part on the ratio. In various examples, a confidence score may be computed for each object detection. Also, a confidence score may be proportional to the length of the ROI indicator line 404A within the bounding box. Maximum confidence may be obtained if the bottom and top sides of the bounding box is cut by the ROI indicator line 404A)
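As a point of reference for the limitation of claims 3 and 13, one simple way a threshold could be derived from videos already known to be of the first video type is sketched below. This is an assumption made for illustration and is not the training and verification procedure of the application or of either reference: the function name and the recall-based selection rule are hypothetical.

```python
import math
from typing import Sequence


def threshold_for_recall(positive_scores: Sequence[float],
                         target_recall: float = 0.9) -> float:
    """Largest threshold t such that at least target_recall of the scores are >= t.

    positive_scores: fused video-level scores of videos known to be of the type.
    """
    if not positive_scores:
        raise ValueError("need at least one scored video")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    ranked = sorted(positive_scores, reverse=True)
    keep = math.ceil(target_recall * len(ranked))   # number of videos that must pass
    return ranked[keep - 1]
```

With a target recall of 1.0 the threshold is simply the lowest score observed among the known videos.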

Consider Claims 4 and 14. 
The combination of Peleg and Sriram teaches: 
4. The method according to claim 3, wherein an eXtreme Gradient Boosting (XGBOOST) classifier is used as the fusion strategy algorithm model for training and verification. / 14. The device according to claim 13, wherein an eXtreme Gradient Boosting (XGBOOST) classifier is used as the fusion strategy algorithm model for training and verification. (Sriram: [0065]-[0066] For example, and without limitation, the machine learning model(s) may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naive Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models. Peleg: column 9 lines 42-48, We do not use a lower threshold with infinite weights, since the later stages of our algorithm can robustly handle pixels that are wrongly identified as foreground. For the same reason, we construct a mask of all foreground pixels in the space-time volume, and apply a 3D morphological dilation on this mask. As a result, each object is surrounded by several pixels from the background. This fact will be used later by the stitching algorithm)

Consider Claims 5 and 15. 
The combination of Peleg and Sriram teaches: 
5. The method according to claim 1, wherein the obtaining the N key frames of the first video comprises: sampling the first video at equal intervals, and extracting the N key frames. / 15. The device according to claim 11, wherein the at least one processor is further enabled to: sample the first video at equal intervals, and extract the N key frames. (Peleg: column 12 lines 5-42, Energy Minimization Since the global energy function in Equations (7) and (15) is written as a sum of energy terms defined on single tubes or pairs of tubes, it can be minimized by various MRF-based techniques such as Belief Propagation [23] or Graph Cuts. Each state describes the subset of tubes that are included in the synopsis, and neighboring states are defined as states in which a single activity tube is removed or changes its mapping into the synopsis. As an initial state we used the state in which all tubes are shifted to the beginning of the synopsis movie. Also, in order to accelerate computation, it is possible to restrict the temporal shifts of tubes to be in jumps of 10 frames. Sriram: [0199] In various examples, each sensor may provide for object detections at a very high frequency. For example, a camera might detect objects at 30 frames per second (when object detection is integrated into the camera).)
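For illustration, sampling at equal intervals to obtain N key frames can be sketched as selecting frame indices at a constant stride. The function name and the choice to start at frame 0 are assumptions; neither the application nor the cited references is represented as using this exact rule.

```python
from typing import List


def equally_spaced_indices(total_frames: int, n: int) -> List[int]:
    """Return n frame indices separated by a constant stride."""
    if n < 2:
        raise ValueError("claims recite N > 1")
    if total_frames < n:
        raise ValueError("video has fewer frames than requested key frames")
    step = total_frames // n          # constant interval between sampled frames
    return [i * step for i in range(n)]
```

For a 100-frame video and N = 4, the stride is 25 frames and the selected indices are 0, 25, 50, and 75.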

Consider Claims 7 and 17. 
The combination of Peleg and Sriram teaches: 
7. The method according to claim 1, further comprising: assigning corresponding weights to the M algorithm models; the determining the confidence score of the first video by the fusion strategy algorithm model according to the NxM confidence scores of the N key frames comprises: determining the confidence score of the first video according to the NxM confidence scores of the N key frames and the corresponding weights of the M algorithm models. / 17. The device according to claim 11, wherein the at least one processor is further enabled to: assign corresponding weights to the M algorithm models; determine, by the fusion strategy algorithm model, the confidence score of the first video according to the NxM confidence scores of the N key frames and the corresponding weights of the M algorithm models. (Peleg: column 12 lines 5-42, Energy Minimization Since the global energy function in Equations (7) and (15) is written as a sum of energy terms defined on single tubes or pairs of tubes, it can be minimized by various MRF-based techniques such as Belief Propagation [23] or Graph Cuts. Each state describes the subset of tubes that are included in the synopsis, and neighboring states are defined as states in which a single activity tube is removed or changes its mapping into the synopsis. As an initial state we used the state in which all tubes are shifted to the beginning of the synopsis movie. Also, in order to accelerate computation, it is possible to restrict the temporal shifts of tubes to be in jumps of 10 frames. Sriram: [0115] For example, the occupancy determiner 116 may determine a length of the intersection and/or overlap and a maximum length 406A of an ROI indicator line that could intersect with the bounding box 402A (e.g., the diagonal along a direction of the ROI indicator line 404A). The occupancy determiner 116 may then compute a ratio between the maximum length 406A and the length of the intersection and/or overlap. 
Where the ratio exceeds a threshold value (e.g., >0.5), the occupancy determiner 116 may determine the parking spot 202A is occupied and set the occupancy status accordingly. Additionally or alternatively, the confidence score for the parking spot 202A may be based at least in part on the ratio. In various examples, a confidence score may be computed for each object detection. Also, a confidence score may be proportional to the length of the ROI indicator line 404A within the bounding box. Maximum confidence may be obtained if the bottom and top sides of the bounding box is cut by the ROI indicator line 404A)
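For illustration, the weighted determination recited in claims 7 and 17 can be sketched as a per-model weighted average over the N x M scores. The normalization by the sum of the weights and the averaging across key frames are assumptions made for this sketch and are not taken from the application or the cited references.

```python
from typing import Sequence


def weighted_video_score(scores: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> float:
    """Fuse an N x M score matrix into one score using one weight per model."""
    if not scores or any(len(row) != len(weights) for row in scores):
        raise ValueError("each key frame needs exactly one score per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average across the M models for each key frame.
    per_frame = [sum(w * s for w, s in zip(weights, row)) / total_weight
                 for row in scores]
    # Unweighted average across the N key frames.
    return sum(per_frame) / len(per_frame)
```

With scores [[1.0, 0.0], [0.5, 0.5]] and weights [3, 1], the per-frame values are 0.75 and 0.5, giving a video-level score of 0.625.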

Consider Claims 8 and 18. 
The combination of Peleg and Sriram teaches: 
8. The method according to claim 1, wherein the M algorithm models comprise: a classification algorithm model, a feature logo algorithm model and a feature person algorithm model. / 18. The device according to claim 11, wherein the M algorithm models comprise: a classification algorithm model, a feature logo algorithm model and a feature person algorithm model. (Sriram: [0065]-[0066] For example, and without limitation, the machine learning model(s) may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naive Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models. Peleg: column 9 lines 42-48, We do not use a lower threshold with infinite weights, since the later stages of our algorithm can robustly handle pixels that are wrongly identified as foreground. For the same reason, we construct a mask of all foreground pixels in the space-time volume, and apply a 3D morphological dilation on this mask. As a result, each object is surrounded by several pixels from the background. This fact will be used later by the stitching algorithm)

Consider Claims 9 and 19. 
The combination of Peleg and Sriram teaches: 
9. The method according to claim 8, wherein the classification algorithm model comprises a rough classification algorithm model and a fine classification algorithm model. / 19. The device according to claim 18, wherein the classification algorithm model comprises a rough classification algorithm model and a fine classification algorithm model. (Sriram: [0065]-[0066] [0199] In various examples, each sensor may provide for object detections at a very high frequency. For example, a camera might detect objects at 30 frames per second (when object detection is integrated into the camera). However, the smart area monitoring system 100 may use a lower granularity of tracking such as to reduce computational requirements and/or network bandwidth. Such down-sampling may also be used where the end use-case does not need object tracking at such a fine-time level of granularity. In some approaches, the sensing rate (e.g., of object detections) may be adjusted such that it matches application rate. This may not be feasible in some cases, such as where the sensors are serving multiple applications that may require different levels of granularity. Peleg: column 9 lines 42-48, We do not use a lower threshold with infinite weights, since the later stages of our algorithm can robustly handle pixels that are wrongly identified as foreground. For the same reason, we construct a mask of all foreground pixels in the space-time volume, and apply a 3D morphological dilation on this mask. As a result, each object is surrounded by several pixels from the background. This fact will be used later by the stitching algorithm)

Consider Claim 10. 
The combination of Peleg and Sriram teaches: 
10. The method according to claim 1, wherein the first video type comprises one of the following: a violent and terrorist video type, a political video type and a prohibited video type. (Sriram: [0060] In the example of FIG. 2, the area 200 includes cameras 228, 230, 232, 234, 236, 238, 240, 242, 244, and 246. One or more of the cameras may, for example, be a specialized camera device equipped with processors that are used to at least partially implement the object attribute determiner 118 to execute automatic recognition techniques such as automatic number-plate recognition (ANPR), also known as automatic license-plate recognition or reader technology (ALPR), license-plate recognition (LPR) technology, etc. Additionally or alternatively, one or more of the cameras may be general surveillance cameras or other types of cameras used to capture image data (e.g., video data), and image recognition/inference techniques may be applied at least partially by the object attribute determiner 118 to the captured image data to detect and identify the license plate and/or other attribute information (color, make and model, size, etc.) of vehicles. Peleg: column 6 lines 43-50, Video synopsis can make surveillance cameras and webcams more useful by giving the viewer the ability to view summaries of the endless video, in addition to the live video stream. To enable this, a synopsis server can view the live video feed, analyze the video for interesting events, and record an object-based description of the video. This description lists for each webcam the interesting objects, their duration, location, and their appearance. FIGS. 2a to 2d show background images from a surveillance camera at Stuttgart airport at different times)

Consider Claim 20. 
The combination of Peleg and Sriram teaches: 
20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method according to claim 1. (Peleg: Claim 37, Sriram: [0236]-[0237], The computer-storage media may include both volatile and nonvolatile media and/or removable and nonremovable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, and/or other data types. For example, the memory 1104 may store computer-readable instructions (e.g., that represent a program(s) and/or a program element(s), such as an operating system). Computer-storage media may include, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 1100. As used herein, computer storage media does not comprise signals per se.)

Claims 6 and 16 are rejected under 35 U.S.C. 103(a) as being unpatentable over Peleg et al. (US Patent 8,311,277), hereby referred to as “Peleg”, in view of Sriram et al. (US PGPub US 2019/0294889), hereby referred to as “Sriram”, further in view of Examiner’s Official Notice. 

Consider Claims 6 and 16. 
The combination of Peleg and Sriram teaches: The method according to claim 5 and the device according to claim 15, wherein the first video is sampled at equal intervals. (Peleg: column 12 lines 5-42, Energy Minimization Since the global energy function in Equations (7) and (15) is written as a sum of energy terms defined on single tubes or pairs of tubes, it can be minimized by various MRF-based techniques such as Belief Propagation [23] or Graph Cuts. Each state describes the subset of tubes that are included in the synopsis, and neighboring states are defined as states in which a single activity tube is removed or changes its mapping into the synopsis. As an initial state we used the state in which all tubes are shifted to the beginning of the synopsis movie. Also, in order to accelerate computation, it is possible to restrict the temporal shifts of tubes to be in jumps of 10 frames. Sriram: [0199] In various examples, each sensor may provide for object detections at a very high frequency. For example, a camera might detect objects at 30 frames per second (when object detection is integrated into the camera).)
The combination of Peleg and Sriram does not teach “sampling the first video at an equal interval of 2 seconds”.
Examiner takes Official Notice that it would have been obvious to one of ordinary skill in the art to modify the combination of Peleg and Sriram to use different sampling rates based on the field of endeavor for applying the video feature detection and indexing. Peleg and Sriram independently suggest different sampling rates, and one of ordinary skill in the art, at the time of the invention, would have been motivated to modify the combination of Peleg and Sriram in order to reduce or increase the sampling rate, dependent upon the video feeds that are to be analyzed. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question. 
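For illustration of how a fixed time interval relates to a frame rate such as the 30 frames per second mentioned in Sriram [0199], the sketch below converts an interval in seconds into a frame stride. The function name and the rounding rule are assumptions made for this sketch and do not come from the application or the cited references.

```python
from typing import List


def frame_indices_at_interval(total_frames: int, fps: float,
                              interval_s: float = 2.0) -> List[int]:
    """Return the indices of frames spaced interval_s seconds apart."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))   # frames between consecutive samples
    return list(range(0, total_frames, step))
```

At 30 frames per second, a 2-second interval corresponds to every 60th frame, so a 300-frame video yields indices 0, 60, 120, 180, and 240.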

Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 
Toderici et al., US PGPub 2012/0123978, Learning Tags For Video Annotation Using Latent Subtags
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.



2662
/Tahmina Ansari/

June 18, 2022

/TAHMINA N ANSARI/Primary Examiner, Art Unit 2662