DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Arguments

Applicant's arguments and amendments received May 11, 2022 have been fully considered.
With regard to 35 U.S.C. § 103, Applicant argues that the cited prior art does not disclose the claimed features (see Applicant's arguments, pages 11-15). These arguments correspond to claims 1-20 and specifically to the independent claims.
These arguments have been considered but are not persuasive, as addressed below. See the rejection below for how the art of record reads on the claimed invention, as well as the examiner's interpretation of the cited art in view of the presented claim set.
Further, Applicant argues that the claimed invention “computes similarities between adjacent video segments,” which corresponds to similarities between A & B and B & C. However, the comparison is not equivalent, since the claimed invention is not limited to similarities between A & B and B & C. Further, Dunlop compares a selected frame or frames within the whole set of video frames, or video frames within a time interval [see para. 0093]. The comparison includes adjacent and non-adjacent frames, and frames within or outside the time interval. As such, the examiner maintains the rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty et al. (US 2016/0070962) in view of Dunlop et al. (US 2013/0259390).

Regarding claim 1, Shetty teaches:

1. A method comprising: identifying, by a processor, video segments from a video, each video segment associated with a shot of the video; 
Shetty, Fig. 2

However, Shetty fails to explicitly teach, but Dunlop teaches:
computing, by the processor, degrees of similarity between adjacent video segments, the degrees of similarity associated with corresponding segmentation levels; 
According to one embodiment of output 26a, the classification score 32 is a value between 0 and 1 indicating the probability that a particular shot includes content associated with a predefined scene class 30. As will be understood, the classification score is represented in a variety of ways according to various embodiments, such as a percentage, a ratio (as compared to the other scene categories), and other similar ways. As shown, exemplary table 26a indicates a hypothetical set of classification scores for the mountain shot associated with video file 22 and shown in frames 24. The classification scores indicate a high probability that the scene includes content associated with (and therefore classified by) mountains (i.e. "mountainous"), "sky," and a "lake/river" (shown by classification scores 0.91, 0.78, and 0.73, respectively). These scores are as expected, considering the exemplary images 24 include mountains, sky, and a lake. Scene category "snow" received a significant score as well (i.e. 0.41), indicating that the shot contains some portion of this type of content.
Dunlop, 0080-0086, emphasis added


and segmenting, by the processor, the video segments based on the degrees of similarity and the segmentation levels to generate sets of scene-based video segments, each set of scene-based video segment associated with a corresponding segmentation level.  

Once the classification scores are calculated, a threshold value is applied to the scores to identify the scene classes that likely apply to the given shot. For example, a system operator may define a threshold value of 0.4, and thus any scene category receiving a classification score above the threshold is associated with the shot. Thus, if 0.4 were used as a threshold, then the shot would be associated with categories "mountainous," "sky," "lake/river," and "snow." If a higher threshold were used, say 0.7, then the shot would be classified as "mountainous," "sky," and "lake/river". A higher threshold might be used, for example, if a system operator desires to label shots only according to content that is prominent in the shots. According to one embodiment, the threshold is varied on a per-class basis. As will be appreciated, the threshold can be varied at a system operator's discretion to produce more accurate or focused results, include more or fewer classes per shot, etc.
Dunlop, 0080-0086, emphasis added
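
For illustration only, the thresholding step described in the quoted Dunlop passage can be sketched as follows. The function names and the dictionary layout are assumptions made for this sketch and do not appear in Dunlop; the scores and thresholds are the ones given in the quoted passage.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def classes_above_threshold(scores, threshold):
    """Return the scene classes whose score is strictly above one global threshold."""
    return [label for label, score in scores.items() if score > threshold]


def classes_above_per_class_threshold(scores, thresholds, default):
    """Variant in which the threshold is varied on a per-class basis.

    Classes missing from `thresholds` fall back to `default`.
    """
    return [
        label
        for label, score in scores.items()
        if score > thresholds.get(label, default)
    ]


# Scores for the mountain shot, as given in the quoted passage.
scores = {"mountainous": 0.91, "sky": 0.78, "lake/river": 0.73, "snow": 0.41}

print(classes_above_threshold(scores, 0.4))
# ['mountainous', 'sky', 'lake/river', 'snow']
print(classes_above_threshold(scores, 0.7))
# ['mountainous', 'sky', 'lake/river']
```

With a threshold of 0.4 all four classes are associated with the shot, and with 0.7 the class "snow" drops out, which matches the example in the quoted passage.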


Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Dunlop into the system of Shetty in order to compute, by the processor, degrees of similarity between adjacent video segments, the degrees of similarity associated with corresponding segmentation levels, and to segment, by the processor, the video segments based on the degrees of similarity and the segmentation levels to generate sets of scene-based video segments, each set of scene-based video segment associated with a corresponding segmentation level, as the systems and methods relate generally to classification of video data, files, or streams, and more particularly to semantic classification of shots or sequences in videos based on video content for purposes of content-based video indexing and retrieval, as well as optimizing efficiency of further video analysis (para. 0002).


Regarding claim 2, Shetty and Dunlop teach:

2. The method of claim 1, further Dunlop teaches the segmenting the video segments comprising: merging, for each segmentation level, video segments having degrees of similarity meeting a pre-configured similarity threshold, the pre-configured similarity threshold corresponding to each of the segmentation levels;
Dunlop, 0080-0086

 and generating, based on the merging, the sets of scene-based video segments corresponding to each of the segmentation levels.  
Dunlop, 0080-0086


Regarding claim 3, Shetty and Dunlop teach:

3. The method of claim 2, further Shetty teaches the merging the video segments further comprising: acquiring, for each segmentation level, a set of video segments based on temporal relationships among the video segments, 
In one embodiment, the video segmentation module 122 identifies segments of videos by using coherence of the frame features. The coherence measures similarity of features in a predetermined temporal segment. The predetermined temporal segment is a short segment of video for measuring similarities between the frames. This similarity provides a distance measure to an unsupervised clustering/segmentation algorithm, such as agglomerative clustering, affinity propagation, or spectral clustering. The output of this algorithm identifies segments of the video.
Shetty, 0039-0041, emphasis added


the set of video segments comprising video segments corresponding to a number of segment intervals within a pre-configured number range and having degrees of similarity meeting the pre-configured similarity threshold corresponding to a respective segmentation level.
The frame selection module 124 identifies, for each video segment, a representative frame to represent and summarize the video segment. The representation frame is a frame that is most representative of the concepts in the video segment. When identifying a representative frame, the frame selection module 124 scores the frames of the segment according to the semantic features of the frames and compares the semantic features of the frames to those of the video segment. The frame selection module 124 may also generate an aesthetic score associated with the frames and generate a combined score for a frame. The combined score for a frame accounts for the semantic score and the aesthetic score. From among the combined scores of the frames for a segment, the frame selection module 124 selects the frame with the highest score as the representative frame for the video segment. 
Shetty, 0039-0041, emphasis added



Regarding claim 4, Shetty and Dunlop teach:

4. The method of claim 3, further Shetty teaches the merging the video segments comprising: merging, at each of the segmentation levels and based on the temporal relationships among the video segments, 
Representative segments are selected 520 from the segments determined to be relevant to the request. The segments that are relevant to the request are scored and selected based on relevance to the video metadata and the user's context (e.g., the user's search query or user interests). For example, the segments relevant to the request are scored based on the match between the segment and the semantic concepts associated with the query. The segments with the highest score and reflecting a diversity of semantic concepts are selected. The representative frames associated for the selected representative segments can be determined from the segment table. The video summary module 126 generates a video summary 530 using the representative frames for the selected representative segments. The video summary chronologically combines the representative frames and may present a series of the representative frames to the user, for example, in a static "storyboard" or by combining the frames into an animation that sequentially transitions from one frame to another. The video summary is provided to the user who determines whether or not to view the entire the video.
Shetty, 0039-0041, 0061, emphasis added



video segments having degrees of similarity meeting the pre-configured similarity threshold corresponding to a respective segmentation level to obtain scene-based video segments corresponding to the respective segmentation level; 
Representative segments are selected 520 from the segments determined to be relevant to the request. The segments that are relevant to the request are scored and selected based on relevance to the video metadata and the user's context (e.g., the user's search query or user interests). For example, the segments relevant to the request are scored based on the match between the segment and the semantic concepts associated with the query. The segments with the highest score and reflecting a diversity of semantic concepts are selected. The representative frames associated for the selected representative segments can be determined from the segment table. The video summary module 126 generates a video summary 530 using the representative frames for the selected representative segments. The video summary chronologically combines the representative frames and may present a series of the representative frames to the user, for example, in a static "storyboard" or by combining the frames into an animation that sequentially transitions from one frame to another. The video summary is provided to the user who determines whether or not to view the entire the video.
Shetty, 0039-0041, 0061, emphasis added


computing degrees of similarity associated with the scene-based video segments; 
Representative segments are selected 520 from the segments determined to be relevant to the request. The segments that are relevant to the request are scored and selected based on relevance to the video metadata and the user's context (e.g., the user's search query or user interests). For example, the segments relevant to the request are scored based on the match between the segment and the semantic concepts associated with the query. The segments with the highest score and reflecting a diversity of semantic concepts are selected. The representative frames associated for the selected representative segments can be determined from the segment table. The video summary module 126 generates a video summary 530 using the representative frames for the selected representative segments. The video summary chronologically combines the representative frames and may present a series of the representative frames to the user, for example, in a static "storyboard" or by combining the frames into an animation that sequentially transitions from one frame to another. The video summary is provided to the user who determines whether or not to view the entire the video.
Shetty, 0039-0041, 0061, emphasis added


designating the scene-based video segments corresponding to the respective segmentation level as video segments at a subsequent segmentation level; and merging video segments at the subsequent segmentation level based on the computed degrees of similarity obtained and the designated video segments. 
Shetty, 0039-0041, 0061 and Fig. 3
 


Regarding claim 5, Shetty and Dunlop teach:

5. The method of claim 1, further Shetty teaches the computing degrees of similarity comprising: acquiring multi-modal features for each of the video segments; 
Shetty, 0050-0052


and acquiring the degrees of similarity associated with the video segments based on the multi-modal feature of each of the video segments.  
Shetty, 0050-0052



Regarding claim 6, Shetty and Dunlop teach:

6. The method of claim 5, further Shetty teaches the acquiring multi-modal features for each of the video segments comprising: acquiring at least one of: a visual feature, a speech feature, and a textual feature for each of the video segments; 
The video segmentation module 122 may identify video segments by tracking visual features across frames. The video segmentation module 122 identifies a frame as a segment boundary when more than a threshold number or fraction of features change between those frames including the frame. The video segmentation module 122 may use one or combination of the techniques described above to identify video segments. Subsequently, the video segmentation module 122 provides the identified segments to the frame selection module 124.
Shetty, 0039-0041, emphasis added
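
For illustration only, the boundary test described in the quoted Shetty passage (a frame is a segment boundary when more than a threshold fraction of tracked features change) can be sketched as follows. Representing each frame as a set of tracked feature identifiers, and measuring change as the share of features present in only one of two consecutive frames, are assumptions made for this sketch; Shetty does not specify them, and the names below are hypothetical.

```python
# Illustrative sketch only: the frame representation and the change measure
# are assumptions, not taken from Shetty.

def find_boundaries(frames, max_changed_fraction=0.5):
    """Return indices of frames that start a new segment.

    `frames` is a list of sets of tracked feature identifiers. A frame is
    treated as a boundary when the fraction of features that appear in only
    one of the two consecutive frames exceeds `max_changed_fraction`.
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        union = prev | curr
        if not union:
            continue  # no features tracked in either frame; nothing to compare
        changed_fraction = len(prev ^ curr) / len(union)
        if changed_fraction > max_changed_fraction:
            boundaries.append(i)
    return boundaries


# Frames 0-1 share most features, frame 2 shares none with frame 1.
frames = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8, 9}, {6, 7, 8, 10}]
print(find_boundaries(frames))
# [2]
```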


and combining the at least one of the visual feature, the speech feature, and the textual feature to obtain the multi-modal feature of each of the video segments.
Shetty, 0039-0041




Regarding claim 7, Shetty and Dunlop teach:

7. The method of claim 1, further comprising: further Shetty teaches acquiring respective scene purity degrees corresponding to the segmentation levels; 
Shetty, 0050-0052 and Fig. 3
 
determining, among the scene purity degrees, a segmentation level having a scene purity degree closest to a pre-configured scene purity degree as a recommended segmentation level; 
Shetty, 0050-0052 and Fig. 3
 
and determining a longest time duration of a set of scene-based video segments at the recommended segmentation level as a recommended segment time duration.
Shetty, 0050-0052 and Fig. 3

Claims 8-13 recite elements similar to those of claims 1-6, but in apparatus form rather than method form. Therefore, the supporting rationale of the rejection of claims 1-6 applies equally to claims 8-13.
Claims 14-20 recite elements similar to those of claims 1-7, but in non-transitory computer readable medium form rather than method form. Therefore, the supporting rationale of the rejection of claims 1-7 applies equally to claims 14-20.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T TEKLE whose telephone number is (571)270-1117. The examiner can normally be reached Monday-Friday 8:00-4:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at 571-272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DANIEL T TEKLE/Examiner, Art Unit 2481