DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendments, filed 2/25/2021, have been entered and made of record. Claims 1, 8-10, 14, and 16 have been amended. Claim 2 has been cancelled. Claims 1, 3-20 are pending.
Response to Arguments
Applicant’s arguments in the Remarks filed on 2/25/2021 have been considered but are moot in view of the new ground(s) of rejection.
In re page 9, the applicant states “For completeness, the proposed combination of references is flawed. All the rejection has to say about combining Carmichael with Peng is that somehow Peng could be modified by Carmichael "to enhance an end user's experience with user's preference ending of video" (sic). This makes little sense in the context of Peng, which seeks to provide summaries of objects such as videos. The rejection fails to state how or why the skilled artisan would somehow use a few summary frames of a video to create an ending for that video, as the summary itself is taken from the video and is meant to summarize the entire video … The present rejection fails to rise to comply with Nuvasive, nowhere explaining why the PHOSITA implementing the summary concept of Peng would want to try to use that summary as an ending for the thing being summarized, much less how that might be accomplished”.
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, Peng teaches generating a personalized summary of each content object based on one or more machine-learning models, with multi-modal user input including voice, text, image and video. Peng is silent about presenting at least one user interface (UI) allowing a user to select between generating an ending in video format and an ending in cartoon format, and about generating at least one ending to a movie provided by a movie producer other than an end user. Carmichael teaches collaboration that allows a user, through a user interface, to write their own ending, their own chapter, and their own videos or animations and graphics for use by other users. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng with the above teachings of Carmichael in order to provide a different ending to the video summary so that viewers can focus on the video content more efficiently.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Peng in view of Carmichael, Chen and Haro
Claims 1, 3-7 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Peng et al.(USPubN 2019/0325084; hereinafter Peng) in view of Carmichael(USPubN 2012/0054813) further in view of Chen et al.(WO 2014/134801; Published September 12 2014; hereinafter Chen) further in view of Haro et al.(USPubN 2007/0162873; hereinafter Haro).
As per claim 16, Peng teaches a method, comprising: 
receiving video and/or photographic input generated by at least one end user(“The assistant system 140 may enable the user to interact with it with multi-modal user input (such as voice, text, image, video) in stateful and multi-turn conversations to get assistance. The assistant system 140 may create and store a user profile comprising both personal and contextual information associated with the user” in Para.[0038], “the assistant system 140 may receive a user input from the assistant application 136 in the client system 130 associated with the user … If the user input is based on an image or video modality, the assistant system 140 may process it using optical character recognition techniques within the messaging platform 205 to convert the user input into text” in Para.[0039]); and 
based at least in part on the video and/or photographic input and selection from the user interface (UI), generating at least one video(“The assistant system 140 may create and store a user profile comprising both personal and contextual information associated with the user. In particular embodiments, the assistant system 140 may analyze the user input using natural-language understanding.” in Para.[0038], “The assistant system 140 may receive a user request for a summarization of contents, identify user interests based on the user profile and contextual information of the user, determine one or more modalities for the summarization based on the contextual information of the user request and the user's client system 130, generate a summary for each of the contents in a personalized and context-aware manner, generate a digest of all the summaries based on the identified interests in a personalized and context-aware manner, and send the digest via the determined modalities to the user” in Para.[0062], “the assistant system 140 to efficiently process a content object may comprise a video and the summary of the content object may comprise a predetermined number of frames associated with the video generated by the one or more machine-learning models” in Para.[0069]).
Peng is silent about presenting at least one user interface (UI) allowing a user to select between generating an ending in video format and an ending in cartoon format, the UI comprising a cartoon ending selector and a movie style selector, wherein selection of the cartoon ending selector indicates a cartoon ending to be generated, the cartoon ending comprising an image hand drawn or graphically produced and comprising no frame of filmed video, and selection of the movie style selector indicates a movie style ending to be generated and generating at least one ending to a movie provided by a movie producer other than an end user, wherein the images being used to generate objects and background in the ending, video being used to generate object models to be incorporated into the ending.
Carmichael teaches presenting at least one user interface (UI) allowing a user to select between generating an ending in video format and an ending in animation format(“The user interface as shown as 290 may be any conventional kind of input device including wireless, wired, or any other kind of input device” in Para.[0030], “The processor can be part of a computer system that also has a user interface port that communicates with a user interface, and which receives commands entered by a user, has at least one memory (e.g., hard drive or other comparable storage, and random access memory) that stores electronic information including a program that operates under control of the processor and with communication via the user interface port, and a video output that produces its output via any kind of video output format, e.g., VGA, DVI, HDMI, displayport, or any other form” in Para.[0051], “One embodiment includes collaboration that allows a user to write their own ending, their own chapter; their own videos or animations and graphics for use by other users. As examples of users own videos, may include videos of events about the subject of the media, or recreations such as 
generating at least one ending to a movie provided by a movie producer other than an end user(“One embodiment includes collaboration that allows a user to write their own ending, their own chapter; their own videos or animations and graphics for use by other users. As examples of users own videos, may include videos of events about the subject of the media, or recreations such as "cosplay" about the subject of the media” in Para.[0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng with the above teachings of Carmichael in order to provide a different ending to the video summary so that viewers can focus on the video content more efficiently.
Chen teaches the UI comprising a cartoon style selector and a movie style selector, wherein selection of the cartoon style selector indicates a cartoon style to be generated, the cartoon ending comprising an image hand drawn or graphically produced and comprising no frame of filmed video, and selection of the movie style selector indicates a movie style to be generated(“Various implementations relate to providing a pictorial summary, also referred to as a comic book or a narrative abstraction. In one particular implementation, one or more parameters from a configuration guide are accessed. The configuration guide includes one or more parameters for configuring a pictorial summary of a video” in Abs, “- specifying whether the pictorial summary is to be generated with an animated look, using a cartoonization field 552” in Page 18, “the comic book section 510 of the screen 500 allows a user to specify, at least, one or more of (i) a range from a video that is to be used in generating a pictorial summary, (ii) a width for a picture in the generated pictorial summary, (iii) a height for a picture in the generated pictorial summary, (iv) a horizontal gap for separating pictures in the generated pictorial 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Chen in order to incorporate a different style into the video segment so that the user can also easily view the specific segment of video to which the user is referring.
Haro teaches wherein the images being used to generate objects and background in the ending, video being used to generate object models to be incorporated into the ending(“identify one or more objects in one or more of the video transitions based upon the measured motion and identified planes of texturally similar regions, as shown in block 34. In this regard, the identified texturally similar regions can be clustered based on textural similarity to acquire clusters of textured regions (e.g. outlined strawberries)” in Para.[0029], “one or more objects in one or more of the video transitions are 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng, Carmichael and Chen with the above teachings of Haro in order to efficiently generate composite videos with desired image objects.
As per claim 17, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 16.
Peng and Carmichael are silent about wherein the ending comprises at least one cartoon.
Chen teaches wherein the summary comprises at least one cartoon(“Various implementations relate to providing a pictorial summary, also referred to as a comic book or a narrative abstraction. In one particular implementation, one or more parameters from a configuration guide are accessed. The configuration guide includes one or more parameters for configuring a pictorial summary of a video” in Abs, “- specifying whether the pictorial summary is to be generated with an animated look, using a cartoonization field 552” in Page 18, “the comic book section 510 of the screen 500 allows a user to specify, at least, one or more of (i) a range from a video that is to be used in generating a pictorial summary, (ii) a width for a picture in the generated pictorial summary, (iii) a height for a picture in the generated pictorial summary, (iv) a horizontal gap for separating pictures in the generated pictorial summary, (v) a vertical gap for separating pictures in the generated pictorial summary, or (vi) a value indicating a desired number of pages for the generated pictorial summary … "Movie2Comic" tool 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Chen in order to incorporate a different style into the video segment so that the user can also easily view the specific segment of video to which the user is referring.
As per claim 18, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 16.
Peng and Carmichael are silent about wherein the ending comprises at least one movie clip.
Chen teaches wherein the summary comprises at least one movie clip(“Various implementations relate to providing a pictorial summary, also referred to as a comic book or a narrative abstraction. In one particular implementation, one or more parameters from a configuration guide are accessed. The configuration guide includes one or more parameters for configuring a pictorial summary of a video” in Abs, “- specifying whether the pictorial summary is to be generated with an animated look, using a cartoonization field 552” in Page 18, “the comic book section 510 of the screen 500 allows a user to specify, at least, one or more of (i) a range from a video that is to be used in generating a pictorial summary, (ii) a width for a picture in the generated pictorial summary, (iii) a height for a picture in the generated pictorial summary, (iv) a horizontal gap for separating pictures in the generated pictorial summary, (v) a vertical gap for separating pictures in the generated pictorial summary, or (vi) a value indicating a desired number of pages for the generated pictorial summary … "Movie2Comic" tool 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Chen in order to incorporate a different style into the video segment so that the user can also easily view the specific segment of video to which the user is referring.
As per claim 19, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 16.
Peng teaches comprising generating the ending to the movie at least in part using a neural network (NN)(“the assistant system 140 may generate the summary of each content object further based on one or more machine-learning models. In particular embodiments, generating summaries based on machine-learning models may be suitable for content objects that are difficult to be associated with attributes. As an example and not by way of limitation, the one or more machine-learning models may be trained based on one or more neural networks. For example, the neural networks may comprise generative adversarial networks (GANs). GANs are a class of artificial intelligence algorithms, implemented by a system of two neural networks contesting with each other in a zero-sum game framework” in Para.[0069]).
As per claim 1, Peng teaches an apparatus, comprising: at least one processor configured with instructions executable by the at least one processor(“computer system 900 includes a processor 902, 904, storage 906, an input/output (I/O) interface 908, a communication interface 910, and a bus 912” in Para.[0108]) to: 
receive input from at least one input device(“The assistant system 140 may enable the user to interact with it with multi-modal user input (such as voice, text, image, video) in stateful and multi-turn conversations to get assistance. The assistant system 140 may create and store a user profile comprising both personal and contextual information associated with the user” in Para.[0038]); and 
based at least in part on the input, generate at least one video, the video being unique to the input(“the assistant system 140 may generate the summary of each content object further based on one or more machine-learning models. In particular embodiments, generating summaries based on machine-learning models may be suitable for content objects that are difficult to be associated with attributes. As an example and not by way of limitation, the one or more machine-learning models may be trained based on one or more neural networks. For example, the neural networks may comprise generative adversarial networks (GANs) … the assistant system 140 to efficiently process various content objects to extract key information from them and generate summaries accordingly. As an example and not by way of limitation, a content object may comprise a video and the summary of the content object may comprise a predetermined number of frames associated with the video generated by the one or more machine-learning models” in Para.[0069]),
wherein the input comprises photographs or video or photographs and video received from a user(“the assistant system 140 may receive a user input from the assistant application 136 in the client system 130 associated with the user … If the user input is based on an image or video modality, the assistant system 140 may process it using optical character recognition techniques within the messaging platform 205 to convert the user input into text” in Para.[0039]).
Peng is silent about receive, from a source of movies, at least one movie and generate an ending for at least one movie wherein the ending is generated at least in part based on selecting, from at least 
Carmichael teaches receive, from a source of movies, at least one movie(“The server 105 may be the provider of the published product shown as 120” in Para.[0022]) and 
generate an ending for at least one movie wherein the ending is generated(“One embodiment includes collaboration that allows a user to write their own ending, their own chapter; their own videos or animations and graphics for use by other users. As examples of users own videos, may include videos of events about the subject of the media, or recreations such as "cosplay" about the subject of the media” in Para.[0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng with the above teachings of Carmichael in order to provide a different ending to the video summary so that viewers can focus on the video content more efficiently.
Chen teaches wherein the summary is generated at least in part based on selecting, from at least one user interface, a first type of ending selector or a second type of ending selector, wherein selection of the first type of ending selector indicates a first type of ending to be generated and selection of the second type of ending selector indicates a second type of ending to be generated, wherein the first type of ending comprises an image hand drawn or graphically produced and comprising no frame of filmed video(“Various implementations relate to providing a pictorial summary, also 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Chen in 
Haro teaches the images being used to generate objects and background in the ending to be generated, the images being of objects to be in the ending, video being used to generate object models to be incorporated into the ending(“identify one or more objects in one or more of the video transitions based upon the measured motion and identified planes of texturally similar regions, as shown in block 34. In this regard, the identified texturally similar regions can be clustered based on textural similarity to acquire clusters of textured regions (e.g. outlined strawberries)” in Para.[0029], “one or more objects in one or more of the video transitions are identified, the processing element 14 can thereafter rank or otherwise assign a priority ranking to the identified objects, and select a predetermined number of the objects based on the priority rankings, such as by selecting a predetermined number of the highest ranked objects” in Para.[0030], “After selecting a predetermined number of the objects, the processing element 14 can compose an image (e.g., thumbnail) representation of the video sequence based upon the selected objects, such as by extracting one or more of the selected objects from their respective frames of the video sequence and forming a composite image of the extracted objects” in Para.[0031]; the extracted objects can be modeled by a priority ranking and are incorporated into the image representation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng, Carmichael and Chen with the above teachings of Haro in order to efficiently generate composite videos with desired image objects.
As per claim 3, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 1.
Peng and Carmichael are silent about wherein the second type of ending comprises at least one movie clip.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Chen in order to incorporate a different style into the video segment so that the user can also easily view the specific segment of video to which the user is referring.
As per claim 4, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 1.
Peng teaches wherein the input comprises voice-generated input(“The assistant system 140 may enable the user to interact with it with multi-modal user input (such as voice, text, image, video) in stateful and multi-turn conversations to get assistance” in Para.[0038]).
As per claim 5, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 4.
140 may enable the user to interact with it with multi-modal user input (such as voice, text, image, video) in stateful and multi-turn conversations to get assistance” in Para.[0038]).
As per claim 6, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 4.
Peng teaches wherein the input comprises at least one video clip(“The assistant system 140 may enable the user to interact with it with multi-modal user input (such as voice, text, image, video) in stateful and multi-turn conversations to get assistance” in Para.[0038]).
As per claim 7, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 1.
Peng teaches wherein the ending is generated by at least one neural network (NN)(“the assistant system 140 may generate the summary of each content object further based on one or more machine-learning models. In particular embodiments, generating summaries based on machine-learning models may be suitable for content objects that are difficult to be associated with attributes. As an example and not by way of limitation, the one or more machine-learning models may be trained based on one or more neural networks. For example, the neural networks may comprise generative adversarial networks (GANs). GANs are a class of artificial intelligence algorithms, implemented by a system of two neural networks contesting with each other in a zero-sum game framework” in Para.[0069]).

Peng in view of Carmichael, Chen, Haro and Gemello
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Peng et al.(USPubN 2019/0325084; hereinafter Peng) in view of Carmichael(USPubN 2012/0054813) further in view of Chen et al.(WO 2014/134801; Published September 12 2014; hereinafter Chen) further in view of Haro et al.(USPubN 2007/0162873; hereinafter Haro) further in view of Gemello et al.(USPubN 2009/0216528; hereinafter Gemello).
As per claim 20, Peng, Carmichael, Chen and Haro teach all of the limitations of claim 16.
136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text.” in Para.[0039]); 
a sound module receiving an output of the interpretation module to recognize terms in the voice signals(Para.[0039], “the assistant xbot 215 may send the textual user input to a natural-language understanding (NLU) module 220 to interpret the user input” in Para.[0040]); 
a building block module comprising at least a second NN receiving output of the interpretation module to create scene elements for the video(Para.[0036], [0040], [0041], "generate candidate entities associated with the proactive task based on user profile. The generation may be based on a straightforward backend query using deterministic filters to retrieve the candidate entities from a structured data store. The generation may be alternatively based on a machine-learning model that is trained based on user profile, entity attributes, and relevance between users and entities. As an example and not by way of limitation, the machine-learning model may be based on support vector machines (SVM). As another example and not by way of limitation, the machine-learning model may be based on a regression model. As another example and not by way of limitation, the machine-learning model may be based on a deep convolutional neural network (DCNN). In particular embodiments, the proactive agent 285 may also rank the generated candidate entities based on user profile and the content associated with the candidate entities.” In Para.[0049]); and 
a composition module receiving input from the interpretation module, the building blocks module, the sound module to generate the video(Para.[0050], “the assistant system 140 may generate the summary of each content object further based on one or more machine-learning models. In particular embodiments, generating summaries based on machine-learning models may be suitable for 
Peng is silent about an interpretation model comprising at least a first neural network (NN) and create scene elements for the ending of the movie.
Carmichael teaches create scene elements for an ending of the movie(“One embodiment includes collaboration that allows a user to write their own ending, their own chapter; their own videos or animations and graphics for use by other users. As examples of users own videos, may include videos of events about the subject of the media, or recreations such as "cosplay" about the subject of the media” in Para.[0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng with the above teachings of Carmichael in order to provide a different ending to the video summary so that viewers can focus on the video content more efficiently.
Gemello teaches an interpretation model comprising at least a first neural network (NN)(“Such sentences are usually uttered by different speakers, so that the network is trained in recognizing voice signals uttered with different voice tones, accents, or the like. Besides, different phonic channels are usually employed, such as different fixed or mobile telephones, or the like. Besides, the sentences are uttered in different environments (car, street, train, or the like), so that the neural network is trained in recognising voice signals affected by different types of noise” in Para.[0028]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Peng and Carmichael with the above teachings of Gemello in order to incorporate a neural network to improve the accuracy of recognizing voice signals affected by different types of noise.
Allowable Subject Matter
Claims 8-15 are allowed.
The prior art of record (in particular, Peng et al.(USPubN 2019/0325084)) does not disclose, with respect to claim 8, providing input regarding at least one movie to an interpretation model comprising at least a first neural network (NN) to recognize voice signals of the input; provide an output of the interpretation module to a sound module to recognize terms in the voice signals of the input; provide an output of the interpretation module to a building block module comprising at least a second NN to create scene elements for an ending of the movie; and provide an output of the interpretation module, an output of the building blocks module, and an output of the sound module to a composition module to generate the ending of the movie, wherein first portions of the voice input are classified as narrations describing what characters in the ending would say, second portions of the voice input are classified as instructions for a final composition of the ending, the instructions comprising one or more of instructions for layering/relative positioning of generated images to be merged into a single frame, instructions for camera location, or instructions for speech timing, as claimed. Rather, Peng et al. discloses a method that includes receiving a user request for a summarization of a particular type of content objects from a client system associated with a first user, determining one or more modalities associated with the user request, selecting a plurality of content objects of the particular type based on a user profile of the first user, wherein the user profile comprises one or more confidence scores associated with one or more subjects associated with the first user, respectively, and wherein the plurality of content objects are selected based on the one or more confidence scores, generating a summary of each content object based on the user profile and the determined modalities, and sending, to the client system in response to the user request, instructions for presenting the summaries of the plurality of content objects, wherein the summaries are presented via one or more of the determined modalities.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNGHYOUN PARK whose telephone number is (571)270-1333.  The examiner can normally be reached on M - Thur 6:00 am - 4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI Q TRAN can be reached on (571)272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-
/SUNGHYOUN PARK/Examiner, Art Unit 2484