DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Von Sneidern
Claims 1, 4, 5, 7-10, 12-14, 17, 18, and 20 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Von Sneidern et al.(USPubN 2016/0071549; hereinafter Von Sneidern).
As per claim 1, Von Sneidern teaches a system comprising: at least one processor; and at least one memory having stored thereon instructions that, when executed by the at least one processor (“The controller 120 may be communicatively coupled with the camera 110 and the microphone 115 and/or may control the operation of the camera 110 and the microphone 115. The controller 120 may also be 120 may also perform various types of processing, filtering, compression, etc. of video data and/or audio data prior to storing the video data and/or audio data into the memory 125.” in Para.[0034]), control the at least one processor to:
receive a video file(“A source video is a video or a collection of videos recorded by a video camera or multiple video cameras. A source video may include one or more video frames (a single video frame may be a photograph) and/or may include metadata” in Para.[0025]); 
pre-process the video file to provide a timestamped transcript (“The metadata have been collected when the video was recorded or created from the video (or audio) data during post processing” in Para.[0034]; “metadata comprises at least one of: geolocation data, motion data, people tag data, voice tag data, motion tag data, time data, or audio data” in Claim 16. The post processing can be interpreted as the claimed pre-processing because it is processing performed on the video after the video is recorded.); 
sample across the timestamped transcript to generate a plurality of timestamped fragments; analyze the plurality of timestamped fragments to identify a likelihood of each fragment containing a highlight(“the processing logic may compare the metadata and/or additional features of the source video with a baseline feature set that indicates interesting metadata. The baseline feature set may include baseline data pertaining to some or all of the metadata and some or all of the additional features computed at block 515. For example, the baseline feature set may include features that may be useful to predict interestingness for the video frames … The processing logic may analyze the metadata against the baseline feature set to identify any video frames that are associated with interesting metadata” in Para.[0071]); 
extract, from the video file, a plurality of video clips corresponding to the fragments having a likelihood of containing a highlight greater than a threshold(“the processing logic may determine whether at least one video frame of the source video includes a feature value that exceeds an 
compile the plurality of video clips to generate a highlight video of the video file (“creating a compilation video from one or more source videos. The compilation video may be created to be a manageable length that may highlight many of the interesting parts of the one or more source videos while filtering out the less interesting parts” in Para.[0023]).
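For illustration only, the thresholding and compiling steps recited in claim 1 can be sketched as follows. This sketch is not taken from Von Sneidern or from the application; the `Fragment` type, the function names, and the example scores and threshold are hypothetical and chosen only to show the selection logic.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: float  # fragment start time in seconds (hypothetical field)
    end: float    # fragment end time in seconds (hypothetical field)
    score: float  # assumed likelihood, in [0, 1], that the fragment contains a highlight


def select_highlights(fragments, threshold):
    """Keep fragments whose score is greater than the threshold, ordered by start time."""
    kept = [f for f in fragments if f.score > threshold]
    return sorted(kept, key=lambda f: f.start)


def compile_clip_list(fragments, threshold):
    """Return the (start, end) pairs of the clips that would be cut and concatenated."""
    return [(f.start, f.end) for f in select_highlights(fragments, threshold)]


fragments = [
    Fragment(30.0, 45.0, 0.9),
    Fragment(0.0, 10.0, 0.2),
    Fragment(12.0, 20.0, 0.75),
]
clips = compile_clip_list(fragments, threshold=0.5)
```

Under these assumed values, only the two fragments scoring above 0.5 are retained, in chronological order.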
As per claim 4, Von Sneidern teaches wherein sampling across the timestamped transcript comprises applying at least one from among a neural network to fragment the timestamped transcript, smart text fragmentation, boundary identification, beam search fragmentation, and peak extraction (“In some embodiments, a machine learning system may iteratively identify features that are common to videos that are selected as being “interesting.” The machine learning system may include those features in the baseline feature set while excluding other features from the baseline feature set. For example, the machine learning system may include the top 10 features in the baseline feature set, which, in some embodiments, may include an average magnitude of user acceleration vector, a ratio of panning frames to total frames, a median magnitude of user acceleration vector, a shake value, a ratio of tilting frames to total frames, a first DCT component of user acceleration, a third DCT component of user acceleration, a maximum value of a tilting speed, a first DCT component of roll, and a maximum distance from vertical region of interest” in Para.[0071]. The machine learning system can be interpreted as a neural network.).
As per claim 5, Von Sneidern teaches wherein analyzing the plurality of timestamped fragments comprises applying a neural network to each timestamped fragment to generate respective likelihoods that each fragment contains a highlight(“In some embodiments, a machine learning system may iteratively identify features that are common to videos that are selected as being “interesting.” The machine learning system may include those features in the baseline feature set while excluding other 
As per claim 7, Von Sneidern teaches wherein analyzing the plurality of timestamped fragments further comprises cross-checking the fragments against designated attributes for desired highlights and identifying as highlights fragments that both have a high likelihood of containing a highlight and correspond to the designated attributes(“the baseline feature set may include machine-learned data that indicates metadata that is likely to be selected by a user for inclusion in a compilation video. For example, when a threshold number of features are for video clips that include people riding bicycles and for video clips that were captured a threshold distance away from buildings, the machine-learned data can determine that any video clips that include people riding bicycles away from buildings are likely to be relevant and are to be included in the compilation video” in Para.[0080]).
As per claim 8, Von Sneidern teaches wherein only fragments identified as having a high likelihood of each containing a highlight are cross-checked against designated attributes(Para.[0080]).
As per claim 9, Von Sneidern teaches wherein only fragments cross-checked against designated attributes are analyzed to determine whether they have a high likelihood of each containing a highlight (Para.[0080]).
As per claim 10, Von Sneidern teaches wherein extracting the plurality of video clips comprises: constructing a superset of highlights by merging overlapping identified fragments; and extracting the superset of highlights as the plurality of video clips(“A video clip is a collection of one or more 
As per claim 12, Von Sneidern teaches wherein receiving the video file comprises retrieving the video file from a designated location(“A relevance score may indicate, for example, a level of interestingness of the content in a video clip, which may include a level of excitement occurring with the source video as represented by motion data, the location where the source video was recorded, the time or date the source video was recorded, the words used in the source video, the tone of voices within the source video, and/or the faces of individuals within the source video, among others” in Para.[0024]).
As per claim 13, Von Sneidern teaches wherein analyzing the plurality of timestamped fragments comprises converting words within the timestamped fragments into embeddings(“The relevance score may be used to designate the interestingness of a video clip. The relevance score may be represented as a feature vector or as a mathematical manipulation of the feature vector (e.g., a 
As per claim 14, the limitations of claim 14 have been discussed in the rejection of claim 1, and claim 14 is rejected under the same rationale.
As per claim 17, the limitations of claim 17 have been discussed in the rejection of claim 4, and claim 17 is rejected under the same rationale.
As per claim 18, the limitations of claim 18 have been discussed in the rejection of claim 5, and claim 18 is rejected under the same rationale.
As per claim 20, Von Sneidern teaches a non-transitory computer readable medium having stored thereon computer program code (“a non-transitory computer readable storage medium having encoded therein programming code executable by a processor” in Para.[0007]) that is executed by one or more processors. The other limitations of claim 20 have been discussed in the rejection of claim 1, and claim 20 is rejected under the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Von Sneidern in view of Al-Shameri
Claims 2 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Von Sneidern et al.(USPubN 2016/0071549; hereinafter Von Sneidern) in view of Al-Shameri et al.(USPubN 2010/0223276; hereinafter Al-Shameri).
As per claim 2, Von Sneidern teaches all of the limitations of claim 1.
Von Sneidern is silent about wherein pre-processing the video file comprises transcribing the video file with punctuation and stemming the transcription.
Al-Shameri teaches wherein pre-processing the video file comprises transcribing the video file with punctuation and stemming the transcription (“All end-of-sentence punctuation, other than a period, such as question mark and exclamation points are converted to a period. Remove XML code. Denoise by removing stopper words, e.g. words that have little meaning such as "of", "the", "a", "an" and so on. Stem words to root, for example words like move, moved, moving will be "mov" after stemming.” in Para.[0467]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Von Sneidern with the above teachings of Al-Shameri in order to simplify the processing of the video and improve its processing speed.
As per claim 15, the limitations of claim 15 have been discussed in the rejection of claim 2, and claim 15 is rejected under the same rationale.
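For illustration only, the text normalization quoted from Al-Shameri (converting end-of-sentence punctuation to periods, removing stop words, and stemming) can be sketched as follows. This is a toy sketch under stated assumptions, not Al-Shameri's implementation: the stop-word list contains only the four words quoted above, and the suffix-stripping rule in `toy_stem` is a hypothetical stand-in for a real stemmer, chosen so that the quoted "move, moved, moving" example reduces to "mov".

```python
import re

# Only the stop words quoted in the cited paragraph; a real list would be longer.
STOP_WORDS = {"of", "the", "a", "an"}

# Hypothetical suffix list for a toy stemmer; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Convert '?' and '!' to periods, drop stop words, and stem remaining words."""
    text = re.sub(r"[?!]", ".", text)
    tokens = re.findall(r"[a-z]+|\.", text.lower())
    return [t if t == "." else toy_stem(t) for t in tokens if t not in STOP_WORDS]


stems = [toy_stem(w) for w in ("move", "moved", "moving")]
tokens = normalize("Moving the camera moved it!")
```

With this toy rule, all three forms of "move" map to the same root, and the exclamation point is kept as a period token.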

Von Sneidern in view of Chen
Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Von Sneidern et al.(USPubN 2016/0071549; hereinafter Von Sneidern) in view of Chen et al.(USPubN 2020/0026767; hereinafter Chen).
As per claim 3, Von Sneidern teaches all of the limitations of claim 1.
Von Sneidern is silent about wherein sampling across the timestamped transcript comprises sampling the timestamped transcript across minimum and maximum sentence count limits.
Chen teaches wherein sampling across the timestamped transcript comprises sampling the timestamped transcript across minimum and maximum sentence count limits(“preliminary titles may be generated by selecting the first sentence of the post with a sentence length of between a minimum and maximum number of words at 220. For example, a minimum of 4 words and a maximum of 12 words may be used” in Para.[0030]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Von Sneidern with the above teachings of Chen in order to enhance the user experience of media content by facilitating the sampling of desired video segments.
As per claim 16, the limitations of claim 16 have been discussed in the rejection of claim 3, and claim 16 is rejected under the same rationale.
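For illustration only, one way to read "sampling the timestamped transcript across minimum and maximum sentence count limits" is to enumerate every run of consecutive sentences whose length falls between the two limits. The sketch below assumes that reading; it is not taken from Chen, Von Sneidern, or the application, and the function name, the `(start, end, text)` tuple layout, and the example limits are hypothetical.

```python
def sample_fragments(sentences, min_sentences, max_sentences):
    """Enumerate runs of consecutive sentences with a bounded sentence count.

    `sentences` is a list of (start_seconds, end_seconds, text) tuples.
    Each returned fragment is (start of first sentence, end of last sentence, joined text).
    """
    fragments = []
    for size in range(min_sentences, max_sentences + 1):
        for i in range(len(sentences) - size + 1):
            window = sentences[i:i + size]
            text = " ".join(s[2] for s in window)
            fragments.append((window[0][0], window[-1][1], text))
    return fragments


transcript = [
    (0.0, 2.0, "Welcome back."),
    (2.0, 5.5, "Here is the goal."),
    (5.5, 8.0, "What a finish."),
]
fragments = sample_fragments(transcript, min_sentences=1, max_sentences=2)
```

For the three-sentence transcript above and limits of 1 and 2, this produces three single-sentence fragments followed by two two-sentence fragments, each carrying the timestamps of its first and last sentence.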

Von Sneidern in view of Mehrseresht
Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Von Sneidern et al.(USPubN 2016/0071549; hereinafter Von Sneidern) in view of Mehrseresht(USPubN 2019/0188866).
As per claim 6, Von Sneidern teaches all of the limitations of claim 5.
Von Sneidern is silent about wherein the neural network comprises a Long Short-term Memory model (LSTM) with attention.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Von Sneidern with the above teachings of Mehrseresht in order to analyze the highlights of the video more efficiently and accurately.
As per claim 19, the limitations of claim 19 have been discussed in the rejection of claim 6, and claim 19 is rejected under the same rationale.

Von Sneidern in view of Kim
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Von Sneidern et al.(USPubN 2016/0071549; hereinafter Von Sneidern) in view of Kim et al.(USPubN 2009/0077034; hereinafter Kim).
As per claim 11, Von Sneidern teaches all of the limitations of claim 1.
Von Sneidern is silent about wherein extracting the plurality of video clips comprises performing boundary detection within the identified fragments and extracting, from the video file, a plurality of video clips corresponding to the fragments without crossing detected boundaries.
Kim teaches wherein extracting the plurality of video clips comprises performing boundary detection within the identified fragments and extracting, from the video file, a plurality of video clips corresponding to the fragments without crossing detected boundaries(“a scene change detecting unit which detects the scene-change boundaries between the images of the multimedia to partition the multimedia into small sized meaning groups of scene-images; a representative frame extracting unit which extracts the representative frames of the partitioned scene-images” in Para.[0018]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNGHYOUN PARK whose telephone number is (571)270-1333.  The examiner can normally be reached on M - Thur 6:00 am - 4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI Q TRAN, can be reached on (571)272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.