Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
35 USC § 112, Sixth Paragraph
Independent claim 1 and dependent claims 2-10 do not invoke 35 USC 112, sixth paragraph as the language used includes a specific structure, such as “a processor”, that performs the steps. When a specific structure is used in the claim language to perform the very same steps, it fails the requirements of 35 USC 112, sixth paragraph.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-7, 9, 11-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Novikoff (US 2018/0068019) in view of Kang et al. (US 2018/0115706) and further in view of Chun et al. (US 2019/0364211).
Regarding claim 1, Novikoff teaches a video processing system (see system in Fig. 1), comprising: 
a user interface component, executed by at least one processor (Figs. 6A-6G teach a user interface), configured to: 
accept user sourced video as input (paragraph 172 teaches user’s video being used as input for the system); 
display editing operations, including at least one automatic editing function (paragraphs 172-177 at least teach an automatic editing function in the form of selecting themes 604, or using screen 620 to start an automatic editing function); 
a video processing component, executed by at least one processor (Figs. 1 and 7, the client device uses a processor 702 to perform the functions of the system), configured to: 
analyze video segments of the video input, the video segments each having a duration (paragraphs 40-46 and 122 teach that the video input (images) is processed and analyzed. Each of the video images innately has a length associated with it);
transform the video input into a semantic embedding space (at least paragraphs 40-46 and 122 teach that the video input (images) is processed to identify image characteristics and thereafter identify the image criteria, which thereby places the images in accordance with their image criteria, including various semantics commensurate with the applicant’s specification at page 10, such as objects, scene attributes, scene categories and object categories, etc. Therefore, the input video is transformed into a semantic space because the input videos are categorized/examined in accordance with their characteristics); and 
classify the transformed video into at least one of contextual categories or spatial layout categories (paragraphs 12 and 122 teach that the input video (images) is categorized into various categories when “the images may be retrieved, received, etc. and their image characteristics examined by the system”. Therefore, the examination and determination of the image characteristics meets the claimed classifying of the transformed video into contextual categories or spatial layout categories, since it is a crucial part of the process in Fig. 4 for determining which subset of video (images) qualifies for a particular theme); 
edit automatically at least one video segment of the video input (Fig. 4, steps 416-420 teach automatically editing the video input (images), including selecting at least one of the plurality of images); 
link or interleave video segments including the edited segments based at least in part on the contextual categories to generate a sequencing of video (Fig. 4, steps 416 to 420 result in an automatic video editing system which links all of the plurality of video (images) together for a selected theme. The process is also repeated such that when a plurality of themes (Yes in step 418) is present, the system generates and automatically edits and selects a subset of images that meets the requirements of the plurality of themes); and 
wherein the user interface component is further configured to generate a rough-cut video output including the sequence of video (Fig. 4, step 422 and Fig. 6, section 677 result in a version of the movie being generated, which is the same as the claimed rough-cut video output).
As discussed above, Novikoff teaches transforming the video segments into an embedding space and classifying the transformed video segments, but fails to explicitly teach a numerical representation. Kang, however, teaches the claimed: transform the video segments into a semantic embedding space comprising a numerical representation of the respective video segments (paragraph 74 teaches that incoming video is transformed using the neural network to generate semantic scores and visual saliency scores. These scores are a numerical representation of the incoming video, which is then used to assist in assigning semantic labels to the video sections).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Kang into the system of Novikoff such that the video segments are transformed into a semantic embedding space that utilizes a numerical representation, because doing so would improve the system of Novikoff by helping it find interesting regions within the video data (Kang: paragraph 74).
However, while Novikoff teaches the selection of a plurality of video segments to be linked and interleaved to generate the rough-cut video output, it fails to explicitly teach the claimed edit. Chun, however, teaches that the automatic editing also includes: the edit including at least one of altering the duration of the at least one video segment or introducing at least one visual effect into the at least one video segment (paragraphs 74, 76 and 128, wherein after a number of video sequences are selected to create an output video, transition effects are applied to the video sequences).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chun into the system of Novikoff because such an incorporation allows for the benefit of improving the quality of the output video (paragraph 74).
Regarding claim 3, Novikoff teaches the claimed wherein the video processing component includes at least a first neural network configured to transform the video input into a semantic embedding space (at least paragraphs 40-46 and 122 teach that the video input (images) is processed to identify image characteristics and thereafter identify the image criteria, which thereby places the images in accordance with their image criteria, including various semantics commensurate with the applicant’s specification at page 10, such as objects, scene attributes, scene categories and object categories, etc. Therefore, the input video is transformed into a semantic space because the input videos are categorized/examined in accordance with their characteristics. With regard to the first neural network, attention is directed to paragraphs 197-200, which teach that machine learning algorithms (neural networks) are used to determine image content types in images, provide suggested image criteria and themes, etc. Paragraph 199 more specifically teaches that the neural networks can be divided into multiple layers/parts to perform each of the processes of the machine learning application). 
Regarding claim 4, Novikoff teaches the claimed wherein the first neural network comprises a convolutional neural network (at least paragraphs 40-46 and 122 teach that the video input (images) is processed to identify image characteristics and thereafter identify the image criteria, which thereby places the images in accordance with their image criteria, including various semantics commensurate with the applicant’s specification at page 10, such as objects, scene attributes, scene categories and object categories, etc. Therefore, the input video is transformed into a semantic space because the input videos are categorized/examined in accordance with their characteristics. With regard to the first neural network, attention is directed to paragraphs 197-200, which teach that machine learning algorithms (neural networks) are used to determine image content types in images, provide suggested image criteria and themes, etc. Paragraph 199 more specifically teaches that the neural networks can be divided into multiple layers/parts to perform each of the processes of the machine learning application. Paragraph 199 teaches a CNN specifically).
Regarding claim 5, Novikoff teaches the claimed wherein the first neural network is configured to classify user video into visual concept categories (at least paragraphs 40-46 and 122 teach that the video input (images) is processed to identify image characteristics and thereafter identify the image criteria, which thereby places the images in accordance with their image criteria, including various semantics commensurate with the applicant’s specification at page 10, such as objects, scene attributes, scene categories and object categories, etc. Therefore, the input video is transformed into a semantic space because the input videos are categorized/examined in accordance with their characteristics and/or categories, such as Winter fun, Wedding Celebrations, etc. With regard to the first neural network, attention is directed to paragraphs 197-200, which teach that machine learning algorithms (neural networks) are used to determine image content types in images, provide suggested image criteria and themes, etc. Paragraph 199 more specifically teaches that the neural networks can be divided into multiple layers/parts to perform each of the processes of the machine learning application).
Regarding claim 6, Novikoff teaches the claimed wherein the video processing component further comprises a second neural network configured to determine a narrative goal associated with the user sourced video or the sequence of video to be displayed (Fig. 6A at least teaches a narrative goal in the form of a selected theme that tells a particular story based on the selected theme. With regard to the second neural network, attention is directed to paragraphs 197-200, which teach that machine learning algorithms (neural networks) are used to determine image content types in images, provide suggested image criteria and themes, etc. Paragraph 199 more specifically teaches that the neural networks can be divided into multiple layers/parts to perform each of the processes of the machine learning application).
Regarding claim 7, Novikoff teaches the claimed wherein the second neural network comprises a long-term short-term memory recurrent network (paragraph 200).
Regarding claim 9, Novikoff teaches the claimed wherein the video processing component is further configured to automatically select at least one soundtrack for the user sourced video (paragraphs 81-89. With regard to the second neural network, attention is directed to paragraphs 197-200, which teach that machine learning algorithms (neural networks) are used to determine image content types in images, provide suggested image criteria and themes, etc. Paragraph 199 more specifically teaches that the neural networks can be divided into multiple layers/parts to perform each of the processes of the machine learning application).
Claims 11-17 and 19 are rejected for the same reasons as discussed in claims 1-7 and 9, respectively.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Novikoff (US 2018/0068019) in view of Kang et al. (US 2018/0115706) and further in view of Chun et al. (US 2019/0364211) and further in view of Newman et al. (US 2019/0208124).
Regarding claim 2, Novikoff, Kang and Chun teach the claimed further comprising a narrative component configured to: automatically identify a narrative goal based on analysis of the user sourced video input; and define the sequencing of video to convey the narrative goal based on a machine learning algorithm trained on film-based categorization, the film based categorizations including at least cinematic style (Novikoff partially teaches this in Fig. 6A, wherein a narrative goal in the form of a selected theme is used to edit the video).
However, Novikoff, Kang and Chun are silent as to the claimed: define the sequencing of video to convey the narrative goal based on a machine learning algorithm trained on film-based categorization, the film based categorizations including at least cinematic style.
Newman teaches the claimed limitation in paragraphs 69 and 78, wherein machine learning is used to apply a particular cinematic style to the output video.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Newman into the proposed combination of Novikoff, Kang and Chun because said incorporation allows for the benefit of improving the user experience by not overwhelming the user with choices (paragraph 78).


Claims 8, 10, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Novikoff (US 2018/0068019) in view of Kang et al. (US 2018/0115706) and further in view of Chun et al. (US 2019/0364211) and further in view of Chen et al. (US 2012/0033132).
Regarding claims 8 and 10, Novikoff in its combination with Kang and Chun teaches selecting a soundtrack for the video (paragraphs 81-89) and teaches the second neural network as discussed in claim 7 above, but fails to teach “to classify visual beats within user sourced video, and wherein the video processing component is configured to re-time user sourced video based on aligning the visual beats with music beats of the at least one soundtrack”.
In an analogous art, Chen teaches the claimed limitation in paragraphs 44, 92 and 97, wherein the system first analyzes the video to determine the visual beat/tempo rate and then works to “align the audio beats of the new piece of music with the visual beats” after an audio track is selected to be played back in synchronization with a video.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chen into the second neural network of Novikoff (in its proposed combination with Kang and Chun) because said incorporation allows for the benefit of improving the user experience by helping improve the perception of beats or rhythmic events (paragraph 0030). 
Claims 18 and 20 are rejected for the same reasons as discussed in claims 8 and 10.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GELEK W TOPGYAL whose telephone number is (571)272-8891. The examiner can normally be reached M-F (9:30-6 PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at 571-272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GELEK W TOPGYAL/
Primary Examiner, Art Unit 2481