Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
                                                                  Examiner’s amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Rebecca Rudolph on 06/29/2022.
                                                                        Claims
1. (Currently Amended) A method for selecting a video clip, the method comprising:
determining at least two video clips from a video;
for each video clip, performing the following excitement determination steps:
inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and
determining an excitement of the video clip, based on the relevance between the video frame in the video clip and the title; and
the method further comprising:
determining a target video clip from the video clips, based on the excitement of each of the video clips, 
wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises:
inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence;
inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and
inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1.
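For illustration only, the prediction model recited in claim 1 (a first fully connected module, forward and reverse GRU modules, and an attention module made of a second fully connected module with output dimension 1 followed by Softmax) can be sketched in pure Python as below. This is not the applicant's implementation: the layer sizes, random weights, GRU gate formulation and the way title information enters the attention module are all assumptions. In particular, the title is assumed to be a vector already embedded to the size of the encoded frame features and combined with each frame by elementwise product, because a title vector merely concatenated to every frame would add the same constant to every score and cancel in the Softmax. "Splicing" is read as concatenation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer for one vector: y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRU:
    """Single-layer GRU with update (z), reset (r) and candidate (n) gates."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.hid = hid_dim
        self.wx = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.wh = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hid_dim for g in "zrn"}

    def _gate(self, g: str, x: Vector, h: Vector) -> Vector:
        # Pre-activation W_x x + b + W_h h for gate g.
        zero = [0.0] * self.hid
        return [a + c for a, c in zip(linear(x, self.wx[g], self.b[g]), linear(h, self.wh[g], zero))]

    def run(self, seq: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hid
        outputs = []
        for x in seq:
            z = [1 / (1 + math.exp(-v)) for v in self._gate("z", x, h)]
            r = [1 / (1 + math.exp(-v)) for v in self._gate("r", x, h)]
            n = [math.tanh(v) for v in self._gate("n", x, [ri * hi for ri, hi in zip(r, h)])]
            h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
            outputs.append(h)
        return outputs


class RelevanceModel:
    """FC reduction -> forward/reverse GRU -> attention (FC with output dim 1 + Softmax)."""

    def __init__(self, feat_dim: int, reduced_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.fc1_w, self.fc1_b = rand_matrix(reduced_dim, feat_dim, rng), [0.0] * reduced_dim
        self.fwd = GRU(reduced_dim, hid_dim, rng)
        self.rev = GRU(reduced_dim, hid_dim, rng)
        # Second FC: input is the title-modulated encoded frame, output dimension 1.
        self.fc2_w, self.fc2_b = rand_matrix(1, 2 * hid_dim, rng), [0.0]

    def __call__(self, frames: List[Vector], title: Vector) -> Vector:
        reduced = [linear(f, self.fc1_w, self.fc1_b) for f in frames]
        fwd_out = self.fwd.run(reduced)
        rev_out = self.rev.run(reduced[::-1])[::-1]  # re-align with frame order
        encoded = [f + r for f, r in zip(fwd_out, rev_out)]  # "splicing" = concatenation
        if len(title) != len(encoded[0]):
            raise ValueError("title embedding must match the encoded feature size")
        scores = [linear([ek * tk for ek, tk in zip(e, title)], self.fc2_w, self.fc2_b)[0]
                  for e in encoded]
        return softmax(scores)  # one relevance value per frame, summing to 1


rng = random.Random(1)
frames = [[rng.random() for _ in range(16)] for _ in range(5)]  # 5 frames, 16-dim features
title = [rng.random() for _ in range(12)]  # hypothetical title embedding, size 2 * hid_dim
model = RelevanceModel(feat_dim=16, reduced_dim=8, hid_dim=6)
relevance = model(frames, title)
print([round(r, 3) for r in relevance])
```

Because the Softmax runs over the frames of the input sequence, the sketch returns one relevance weight per frame, and the weights sum to 1.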




4. (Currently Amended) The method according to claim 3, wherein the pre-established prediction model is obtained by training as follows:
acquiring a training video;
acquiring a feature sequence of a video frame in the training video, title information of the training video, and a relevance between the video frame in the training video and the title information of the training video, based on the acquired video; and
using the acquired feature sequence of the video frame in the training video and the title information of the training video as inputs, and using the relevance between the video frame in the training video and the title information of the training video as an expected output, training a pre-established initial prediction model to obtain the prediction model after training.
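The training procedure of claim 4 (and of claims 10 and 16) pairs each training video's frame features and title with an expected per-frame relevance. Backpropagating through the full GRU model does not fit a short sketch, so the example below substitutes a much simpler stand-in model: a linear scorer over a hypothetical frame-title interaction (elementwise product), Softmax over frames, and full-batch gradient descent on cross-entropy against the expected relevance. The all-zero weight vector plays the role of the pre-established initial prediction model; the data, learning rate and epoch count are illustrative assumptions only.

```python
import math
from typing import List, Tuple

Vector = List[float]
# (frame feature vectors, title vector, expected per-frame relevance summing to 1)
Sample = Tuple[List[Vector], Vector, Vector]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def interaction(frame: Vector, title: Vector) -> Vector:
    # Hypothetical frame-title interaction: elementwise product of equal-length vectors.
    return [f * t for f, t in zip(frame, title)]


def predict(w: Vector, frames: List[Vector], title: Vector) -> Vector:
    scores = [sum(wk * xk for wk, xk in zip(w, interaction(f, title))) for f in frames]
    return softmax(scores)


def loss(w: Vector, samples: List[Sample]) -> float:
    """Mean cross-entropy between expected and predicted relevance."""
    total = 0.0
    for frames, title, target in samples:
        pred = predict(w, frames, title)
        total -= sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)
    return total / len(samples)


def train(samples: List[Sample], dim: int, epochs: int = 200, lr: float = 0.1) -> Vector:
    w = [0.0] * dim  # the "initial prediction model": all-zero weights
    for _ in range(epochs):
        grad = [0.0] * dim
        for frames, title, target in samples:
            pred = predict(w, frames, title)
            for f, p, t in zip(frames, pred, target):
                x = interaction(f, title)
                for k in range(dim):
                    # Softmax + cross-entropy (targets sum to 1): d(loss)/d(score_i) = p_i - t_i
                    grad[k] += (p - t) * x[k] / len(samples)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]  # full-batch gradient descent step
    return w


frames = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
samples = [
    (frames, [1.0, 0.0, 1.0], [0.8, 0.1, 0.1]),  # this title matches frame 0 best
    (frames, [0.0, 1.0, 0.0], [0.1, 0.8, 0.1]),  # this title matches frame 1 best
]
w = train(samples, dim=3)
print(round(loss([0.0] * 3, samples), 4), round(loss(w, samples), 4))
```

At the all-zero starting point every frame receives relevance 1/3, so the initial loss is ln 3 ≈ 1.0986; training lowers it and moves the highest predicted relevance onto the frame that the expected output favors for each title.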
7. (Currently Amended) A server, comprising:
one or more processors; and
a storage apparatus, storing one or more programs thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising:
determining at least two video clips from a video;
for each video clip, performing the following excitement determination steps:
inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and
determining an excitement of the video clip, based on the relevance between the video frame in the video clip and the title; and
the operations further comprising:
determining a target video clip from the video clips, based on the excitement of each of the video clips, 
wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises:
inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence;
inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and
inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1.
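Claims 1, 7 and 13 leave open how the candidate clips are formed and how frame relevance is aggregated into a clip's excitement. The sketch below fills both gaps with hypothetical choices, fixed-length sliding windows and the mean relevance of a clip's frames, and then takes the clip with the highest excitement as the target clip. It assumes the per-frame relevance values share one scale across the whole video (for example, normalized over all frames); if the Softmax were taken separately within each clip, every clip's relevances would sum to 1 and the mean would reflect only the clip length.

```python
from typing import List, Tuple

Clip = Tuple[int, int]  # [start, end) frame indices


def split_into_clips(num_frames: int, clip_len: int, stride: int) -> List[Clip]:
    """Hypothetical clip proposal: fixed-length sliding windows."""
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]


def clip_excitement(relevance: List[float], clip: Clip) -> float:
    """Assumed aggregation: mean frame-title relevance inside the clip."""
    start, end = clip
    window = relevance[start:end]
    return sum(window) / len(window)


def select_target_clip(relevance: List[float], clip_len: int, stride: int) -> Clip:
    clips = split_into_clips(len(relevance), clip_len, stride)
    if len(clips) < 2:
        raise ValueError("at least two candidate clips are required")
    # Target clip = candidate with the highest excitement.
    return max(clips, key=lambda c: clip_excitement(relevance, c))


# Per-frame relevance for an 8-frame video, normalized over the whole video.
relevance = [0.05, 0.10, 0.30, 0.25, 0.05, 0.05, 0.10, 0.10]
print(select_target_clip(relevance, clip_len=3, stride=1))  # → (1, 4)
```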




10. (Currently Amended) The server according to claim 9, wherein the pre-established prediction model is obtained by training as follows:
acquiring a training video;
acquiring a feature sequence of a video frame in the training video, title information of the training video, and a relevance between the video frame in the training video and the title information of the training video, based on the acquired video; and
using the acquired feature sequence of the video frame in the training video and the title information of the training video as inputs, and using the relevance between the video frame in the training video and the title information of the training video as an expected output, training a pre-established initial prediction model to obtain the prediction model after training.

13. (Currently Amended) A non-transitory computer readable medium, storing a computer program thereon, the program, when executed by a processor, causes the processor to perform operations, the operations comprising:
determining at least two video clips from a video;
for each video clip, performing the following excitement determination steps:
inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and
determining an excitement of the video clip, based on the relevance between the video frame in the video clip and the title; and
the operations further comprising:
determining a target video clip from the video clips, based on the excitement of each of the video clips, 
wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises:
inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence;
inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and
inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1.



16. (Currently Amended) The non-transitory computer readable medium according to claim 15, wherein the pre-established prediction model is obtained by training as follows:
acquiring a training video;
acquiring a feature sequence of a video frame in the training video, title information of the training video, and a relevance between the video frame in the training video and the title information of the training video, based on the acquired video; and
using the acquired feature sequence of the video frame in the training video and the title information of the training video as inputs, and using the relevance between the video frame in the training video and the title information of the training video as an expected output, training a pre-established initial prediction model to obtain the prediction model after training.

                                                         Allowable Subject Matter
Claims 1, 3-7, 9-13 and 15-18 are allowed. The following is an examiner’s statement of reasons for allowance:

“inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises: inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence; inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1” as recited in claim 1.

“inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises: inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence; inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1” as recited in claim 7.

“inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video; and wherein inputting a feature sequence of a video frame in the video clip and title information of the video into a pre-established prediction model to obtain a relevance between the inputted video frame and a title of the video comprises: inputting a feature sequence into a first fully connected network module, and outputting a dimension-reduced feature sequence; inputting the dimension-reduced feature sequence into a forward GRU (gated recurrent unit) module and a reverse GRU module respectively, and splicing outputs of the forward GRU module and the reverse GRU module to obtain an encoded feature sequence; and inputting the encoded feature sequence and title information of the video into an attention module to obtain the relevance between the video frame and the title, the attention module comprising a second fully connected network module and a Softmax processing module, an output dimension of the second fully connected network module being 1” as recited in claim 13.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
                                                      Conclusions
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEAN D SAINT CYR whose telephone number is (571)270-3224. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Pendleton, can be reached at (571)272-7527. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JEAN D SAINT CYR/Examiner, Art Unit 2425             

/Brian T Pendleton/Supervisory Patent Examiner, Art Unit 2425