DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
Response to Arguments
3.	Applicant’s arguments with respect to the rejections of claims 1-6 and 8-20 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, new grounds of rejection are made below.
	
Response to Amendment
4.	In response to the amendment, the rejection of claim 15 under 35 U.S.C. 112 is withdrawn.        

Claim Rejections - 35 USC § 103
5.	The text of those sections of Title 35, U.S. Code not included in this section can be found in a prior Office action.

6.	Claims 1, 9, and 16 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Xiao et al. (US Publication 2018/0025079, hereinafter Xiao). 
	Regarding claim 1, Xiao discloses a method, comprising:
obtaining a video from a video repository (Xiao, Abstract and para’s 0007-0008, obtain a video for labeling); 
performing semantic recognition on the video in one or more semantic recognition dimensions to obtain one or more video label data items corresponding to the video in the one or more semantic recognition dimensions (Xiao, Abstract and para’s 0007-0008, performing model classification such as semantic recognition of features to predict feature tags/labels respectively for video frames in the video, see para. 0028, search for “a kiss scene” or “a funny shot” in the specified video);
generating at least one candidate label combination based on at least one of the one or more video label data items (Xiao, para’s 0007-0008, one or more candidate videos including the obtained video have been labeled with tags/labels as described above; para. 0028, tags/labels include “a kiss scene” or “a funny shot”); 
determining, based on a target label combination selected by a user from the at least one candidate label combination, one or more video clips in the video corresponding to at least one video label in the target label combination (Xiao, Abstract and para’s 0007-0008, and determining a search tag in response to a search request, searching one or more candidate videos or a specified video that have been labeled with feature tags according to the search tag; para. 0028, return video having “a kiss scene” or “a funny shot”); and
generating at least one target video clip corresponding to the target label combination based on at least one of the one or more video clips (Xiao, para’s 0007-0008, presenting an output video when the output video includes a matching feature tag to the search tag; para. 0028, presenting the video).
Xiao does not explicitly disclose label data items corresponding to the video in the plurality of semantic recognition dimensions; however Xiao specifically teaches (see para. 0028) a search request that searches for “a kiss scene in episode A” or “a funny shot in episode B”, and determines, based on a video frame tag marking result, a video frame tag that corresponds to the search request. Then, the video search apparatus finds, from a candidate video such as a specified video, a target video that is marked with the video frame tag, and finally presents the target video. The tags “a kiss scene in episode A” and “a funny shot in episode B” indicate a plurality of semantic dimensions. 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a plurality of semantic dimensions as disclosed above in Xiao’s invention to allow for performing semantic recognition on the video in a plurality of semantic recognition dimensions and for obtaining video label data items corresponding to the video in the plurality of semantic recognition dimensions because doing so would enhance user’s search experience by semantically segmenting video more effectively.

Regarding claims 9 and 16, these claims comprise limitations substantially the same as claim 1; therefore they are rejected by the same rationale. Xiao further discloses processors, memory modules, and computer readable medium for implementing the invention (see Xiao, para. 0146 and fig. 4).   

7.	Claims 1, 2, 5, 6, 9-11, 16, 17, and 20 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Xiao et al. (US Publication 2018/0025079, hereinafter Xiao) in view of Cowburn et al. (US Publication 2021/0150719, hereinafter Cowburn).
	Regarding claim 1, Xiao discloses a method, comprising:
obtaining a video from a video repository (Xiao, Abstract and para’s 0007-0008, obtain a video for labeling); 
performing semantic recognition on the video in one or more semantic recognition dimensions to obtain one or more video label data items corresponding to the video in the one or more semantic recognition dimensions (Xiao, Abstract and para’s 0007-0008, performing model classification such as semantic recognition of features to predict feature tags/labels respectively for video frames in the video);
generating at least one candidate label combination based on at least one of the one or more video label data items (Xiao, para’s 0007-0008, one or more candidate videos including the obtained video have been labeled with tags/labels as described above; para. 0028, tags/labels include “a kiss scene” or “a funny shot”); 
determining, based on a target label combination selected by a user from the at least one candidate label combination, one or more video clips in the video corresponding to at least one video label in the target label combination (Xiao, Abstract and para’s 0007-0008, and determining a search tag in response to a search request, searching one or more candidate videos or a specified video that have been labeled with feature tags according to the search tag; para. 0028, search apparatus finds from a specified video a target video that is marked with the video frame tag, and finally presents the target video); 
generating at least one target video clip corresponding to the target label combination based on at least one of the one or more video clips (Xiao, para’s 0007-0008, presenting an output video when the output video includes a matching feature tag to the search tag; para. 0028, presenting the target video).
Xiao does not explicitly disclose but Cowburn discloses obtaining label data items corresponding to the video in the plurality of semantic recognition dimensions (Cowburn, para. 0097, the semantic segmentation neural network may be used to segment image frames in screen-space into more than two labelled categories). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Cowburn’s features into Xiao’s invention for enhancing user’s search experience by semantically segmenting video more effectively.

Regarding claim 2, Xiao-Cowburn discloses the method of claim 1, the video label data items recording video labels of the plurality of semantic recognition dimensions and playback time periods in the video corresponding to the video labels (Xiao, para’s 0007-0008, obtain tags/labels of the video; para. 0081, presents attribute information for a user to select including segment duration and time points).

Regarding claim 5, Xiao-Cowburn discloses the method of claim 2, for one or more video clips in the video corresponding to the video labels, a start frame of the video clip including an image frame in the video corresponding to a start time point of the playback time period, and an end frame thereof including an image frame in the video corresponding to an end time point of the playback time period (Xiao, para. 0081, providing attribute information including segment duration and time points. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the start frame and end frame corresponding to the time points in order to provide convenience for the user in obtaining high quality search results).

Regarding claim 6, Xiao-Cowburn discloses the method of claim 5, generating at least one target video clip corresponding to the target label combination based on at least one of the one or more video clips comprising: 
extracting one or more frame sequences corresponding to the one or more video clips from the video based on one or more playback time periods corresponding to the one or more video clips; and generating the target video clip based on the one or more frame sequences and a time sequence of playback time periods corresponding to the extracted one or more frame sequences (Xiao, para. 0081, presents attribute information for a user to select including segment duration and time points; para’s 0007-0008, presenting an output video includes extracting one or more frame sequences based on segment durations).
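
For illustration only, the extraction and time-ordered assembly recited in claim 6 may be sketched as follows. The sketch is not taken from Xiao or from the application. The frame rate `fps`, the modeling of a video as an indexable sequence of frames, and the names `frames_for_period` and `assemble_clip` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of a playback time period: (start_seconds, end_seconds).
Period = Tuple[float, float]


def frames_for_period(period: Period, fps: float) -> range:
    """Map a playback time period to frame indices, start and end frames inclusive."""
    start, end = period
    return range(round(start * fps), round(end * fps) + 1)


def assemble_clip(video: Sequence, periods: List[Period], fps: float) -> list:
    """Extract one frame sequence per period and join the sequences in time order."""
    clip = []
    for period in sorted(periods):  # follow the time sequence of the playback periods
        # Skip indices past the end of the video rather than raising an error.
        clip.extend(video[i] for i in frames_for_period(period, fps) if i < len(video))
    return clip
```

Under these assumptions, a clip built from the periods (2.0, 2.2) and (0.0, 0.1) at 10 frames per second contains frames 0-1 followed by frames 20-22, regardless of the order in which the periods are supplied.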

Regarding claims 9 and 16, these claims comprise limitations substantially the same as claim 1; therefore they are rejected by the same rationale. Xiao-Cowburn further discloses processors, memory modules, and computer readable medium for implementing the invention (see Xiao, para. 0146 and fig. 4).



Regarding claims 10, 11, 17, and 20, these claims comprise limitations substantially the same as claims 2 and 5; therefore they are rejected by similar rationale.

8.	Claims 3, 12-14, and 18 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Xiao-Cowburn, as applied to claims 2, 11, and 17 above, in view of O’Brien et al. (US Publication 2010/0169786, hereinafter O’Brien). 
Regarding claim 3, Xiao-Cowburn discloses the method of claim 2, the generating at least one candidate label combination based on at least one of the one or more video label data items comprising: 
extracting video labels in video label data corresponding to at least two semantic recognition dimensions in the plurality of semantic recognition dimensions and corresponding playback time periods in the video, respectively (Xiao, para’s 0007-0008, 0079; performing model classification may include semantic recognition of features of one or more types/dimensions in the video; para. 0081, presents attribute information for a user to select, such as segment duration and time points, or players). 
Xiao-Cowburn does not explicitly disclose but O’Brien discloses determining whether the playback time periods corresponding to the video labels coincide in time; and responsive to determining that the playback time periods corresponding to the video labels coincide in time, generating label combinations based on the video labels as the candidate label combinations (O’Brien, para’s 0098-0099, provide deep tags that may overlap time segments of the same video; therefore a determination can be made whether segments of a video overlap in time, and a combination of tags/labels of the segments that coincide in time can be obtained).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate O’Brien’s features into Xiao-Cowburn’s invention for enhancing user’s search experience by providing user high quality search results. 
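
For illustration only, the time-coincidence determination discussed for claim 3 may be sketched as follows. The sketch is not taken from O’Brien or from the application. The mapping `labels_by_dimension`, the strict interval-overlap test, and the names `overlaps` and `candidate_combinations` are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical representation of a playback time period: (start_seconds, end_seconds).
Period = Tuple[float, float]


def overlaps(a: Period, b: Period) -> bool:
    """Return True when two playback time periods share some span of time."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_combinations(
    labels_by_dimension: Dict[str, List[Tuple[str, Period]]]
) -> List[Tuple[str, str]]:
    """Pair labels from two different dimensions whose periods coincide in time."""
    result = []
    # Consider every pair of distinct semantic recognition dimensions.
    for dim_a, dim_b in combinations(sorted(labels_by_dimension), 2):
        for label_a, period_a in labels_by_dimension[dim_a]:
            for label_b, period_b in labels_by_dimension[dim_b]:
                if overlaps(period_a, period_b):
                    result.append((label_a, label_b))
    return result
```

Under these assumptions, a label "goal" spanning 10-20 seconds in one dimension and a label "player A" spanning 15-30 seconds in another dimension yield the single candidate combination ("goal", "player A"), while a label spanning 40-45 seconds yields none.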

Regarding claims 12 and 18, these claims comprise limitations substantially the same as claim 3; therefore they are rejected by similar rationale.

Regarding claim 13, Xiao-Cowburn-O’Brien discloses the apparatus of claim 12, for one or more video clips in the video corresponding to the video labels, a start frame of the video clip including an image frame in the video corresponding to a start time point of the playback time period, and an end frame thereof including an image frame in the video corresponding to an end time point of the playback time period (Xiao, para. 0081, providing attribute information including segment duration and time points. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the start frame and end frame corresponding to the time points in order to provide convenience for the user in obtaining high quality search results).

Regarding claim 14, Xiao-Cowburn-O’Brien discloses the apparatus of claim 13, the logic for generating at least one target video clip corresponding to the target label combination based on at least one of the one or more video clips comprising: 
logic, executed by the processor, for extracting one or more frame sequences corresponding to the one or more video clips from the video based on one or more playback time periods corresponding to the one or more video clips, and logic, executed by the processor, for generating the target video clip based on the one or more frame sequences and a time sequence of playback time periods corresponding to the extracted one or more frame sequences (Xiao, para. 0081, presents attribute information for a user to select including segment duration and time points; para’s 0007-0008, presenting an output video includes extracting one or more frame sequences based on segment durations).

9.	Claims 4 and 19 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Xiao-Cowburn-O’Brien, as applied to claims 3 and 18 above, in view of Jindal et al. (US Publication 2021/0103615, hereinafter Jindal). 
Regarding claim 4, Xiao-Cowburn-O’Brien discloses the method of claim 3, further comprising, after generating at least one candidate label combination based on at least one of the one or more video label data items, and before determining, based on a target label combination selected by a user from the at least one candidate label combination, one or more video clips in the video corresponding to at least one video label in the target label combination, obtain the one or more video clips as disclosed by Xiao-Cowburn-O’Brien above.
Xiao-Cowburn-O’Brien does not explicitly disclose but Jindal discloses calculating a label combination score corresponding to each candidate label combination based on ... (Jindal, para. 0092, the keyframe selector 112 can use one or more content tags provided by the tag generator 118 to select one or more keyframes with a highest matching score relative to a predetermined set of criteria; label weights can be default values as known in the art);
filtering the at least one candidate label combination to obtain candidate label combinations having label combination scores satisfying a preset score threshold range (Jindal, para. 0092, select one or more keyframes with a highest matching score); and 
displaying the obtained candidate label combinations to the user in a descending display order of the label combination scores (Jindal, at least para. 0074, display keyframes; the descending order of displaying is a design choice and is well known in the art).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jindal’s features into Xiao-Cowburn-O’Brien’s invention for enhancing user’s search experience by providing user high quality search results.

Regarding claim 19, this claim comprises limitations substantially the same as claim 4; therefore it is rejected by similar rationale.
	
10.	Claims 8 and 15 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Xiao-Cowburn, as applied to claims 1 and 9 above, in view of Jindal et al. (US Publication 2021/0103615, hereinafter Jindal).
Regarding claim 8, Xiao-Cowburn discloses the method of claim 1, further comprising, after generating at least one target video clip corresponding to the target label combination based on at least one of the one or more video clips is performed: 
responsive to a plurality of target video clips generated, determining video labels in the target label combination corresponding to each target video clip, respectively (Xiao, para’s 0007-0008, labels associated with video segments can be obtained).
Xiao-Cowburn does not explicitly disclose but Jindal discloses:
calculating a video clip score of each target video clip based on label weights and label scores of the determined video labels, respectively (Jindal, para. 0092, the keyframe selector 112 can use one or more content tags provided by the tag generator 118 to select one or more keyframes with a highest matching score relative to a predetermined set of criteria; label weights can be default values as known in the art); 
filtering the plurality of target video clips to obtain target video clips having video clip scores satisfying a preset clip score threshold range (Jindal, para. 0092, select one or more keyframes with a highest matching score); and 
outputting the obtained target video clips to the user in a descending order of the video clip scores (Jindal, at least para. 0074, display keyframes; the descending order of displaying is a design choice and is well known in the art).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jindal’s features into Xiao-Cowburn’s invention for enhancing user’s search experience by providing user high quality search results.
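
For illustration only, the scoring, filtering, and descending-order output recited in claim 8 may be sketched as follows. The sketch is not taken from Jindal or from the application. The weighted-sum formula, the default weight of 1.0, the inclusive threshold range `[low, high]`, and the names `clip_score` and `rank_clips` are assumptions made for this example; neither the claim language quoted above nor the cited paragraphs specify a particular formula.

```python
from typing import Dict, List, Tuple


def clip_score(labels: List[str], weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Assumed formula: weighted sum of label scores; a missing weight defaults to 1.0."""
    return sum(weights.get(label, 1.0) * scores[label] for label in labels)


def rank_clips(
    clips: Dict[str, List[str]],
    weights: Dict[str, float],
    scores: Dict[str, float],
    low: float,
    high: float,
) -> List[Tuple[str, float]]:
    """Score each clip, keep scores inside [low, high], and sort in descending order."""
    scored = [(name, clip_score(labels, weights, scores)) for name, labels in clips.items()]
    kept = [(name, score) for name, score in scored if low <= score <= high]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Under these assumptions, a clip whose score falls outside the preset range is dropped before ordering, so only the retained clips are output, highest score first.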

Regarding claim 15, this claim comprises limitations substantially the same as claim 8; therefore it is rejected by Xiao-Cowburn-Jindal using similar rationale.
Xiao-Cowburn-Jindal further discloses logic, executed by the processor, for extracting at least one frame sequence in the video corresponding to at least one deduplicated playback time period (Note: since the claim does not specifically define how “one de-duplicated playback time period” is determined, it is interpreted as a playback time period; Xiao, para’s 0007-0008, presenting an output video includes extracting one or more frame sequences based on segment duration/period).    

Allowable Subject Matter
11. 	Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims. 

Conclusion
12.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
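A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.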
13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOI H TRAN whose telephone number is (571)270-5645. The examiner can normally be reached 8:00AM-5:00PM PST FIRST FRIDAY OF BIWEEK OFF.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI TRAN can be reached on 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LOI H TRAN/           Primary Examiner, Art Unit 2484