DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed May 10, 2021 have been fully considered but they are not persuasive.

With regard to claim 1, Applicant submits that Kamdar does not teach collecting usage information for video summaries relating to the user's interaction with a particular area of the video summary window and then feeding that usage information to a machine learning model. Remarks, p. 9.
Claim 1 recites, in part,
collecting, using the at least one processor, video summary usage information based on the viewing of the at least one video summary by the user of the user device, wherein the video summary usage information comprises information relating to the user's interaction with a particular area of the video summary window during the viewing of the at least one video summary; and
feeding, using the at least one processor, the video summary usage information into a machine learning model to update the machine learning model, wherein the updated machine learning model is configured to create improved video summaries of input videos.

Claim 1 is rejected over a combination of Riveiro Insua et al. (US 2013/0081082), Kamdar (US 2014/0075463), and Emery et al. (US 9615136).
As presented in the claim rejections under 35 USC § 103, Kamdar teaches:

wherein the usage information comprises information relating to the user's interaction with a particular area of the video summary window during viewing of the at least one video summary ([0032], “A user 502 is depicted viewing and listening to a television program, which may be, for example, an Internet TV program, an IPTV program, or a television based Internet-streamed or otherwise presented or downloaded video, etc.” [0033], “A volume indicator 506 is depicted, including a pointer that indicates the current volume level. The volume level is controllable by the user.” [0036], “Block 512 represents information stored in one or more databases, including volume and volume change related information, information about the content being played during the detected volume change, and information about the user who viewed the content and changed the volume or who is presumed to or is determined to be likely to be the user who has viewed the content and changed the volume.”); and
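feeding, using the at least one processor, the usage information into a machine learning model to update the machine learning model ([0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5),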
wherein the updated machine learning model is configured to create improved content selections ([0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5).
In view of Kamdar’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Riveiro Insua to include collecting, using the at least one processor, video summary usage information based on the viewing of the at least one video summary by the user of the user device, wherein the video summary usage information comprises information relating to the user's interaction with a particular area of the video summary window during the viewing of the at least one video summary, and feeding, using the at least one processor, the video summary usage information into a machine learning model to update the machine learning model, wherein the updated machine learning model is configured to create improved video summaries of input videos. By utilizing video summary usage information with a machine learning model, the modification would serve to improve video summaries for users, thereby enhancing the overall user experience.

Applicant additionally submits that Emery does not teach that the updated machine learning model may be further configured to optimize the grouping of input videos. Remarks, pp. 9-10.
Claim 1 is rejected over a combination of Riveiro Insua et al. (US 2013/0081082), Kamdar (US 2014/0075463), and Emery et al. (US 9615136).
As presented in the claim rejections under 35 USC § 103, Emery teaches a machine learning model configured to optimize grouping of content (Col. 13, line 62 to col. 14, line 12, “Referring now to FIG. 8, a flow diagram of a method for classifying content items, such as books, videos, songs, albums, etc. is illustrated in accordance with an example of the present technology. … The classification scheme may include classification rules to classify evaluation attributes into classification categories, as has been described. The classification rules may include machine-learned rules, which may be machine-learned based on trusted classification categories for a set of content items. For example, a set of attributes for content items may be hand-curated and classified as a trusted base from which machine learning may learn and apply classification rules.” Col. 16, line 54 to col. 17, line 4).
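In view of Emery’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination such that the updated machine learning model is further configured to optimize grouping of input videos. By optimizing grouping of input videos (e.g., classification), the modification would serve to organize input videos. The modification would thereby facilitate input video processing for video summary generation.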

In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “This insight and linkage between a user's interaction with video summaries and the user's inferred interaction and responses to video summaries to be made for videos in a similar group or category….” Remarks, p. 10) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
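A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.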

Claims 1-2, 6, 8-12, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over a combination of Riveiro Insua et al. (US 2013/0081082), Kamdar (US 2014/0075463), and Emery et al. (US 9615136).

Regarding claim 1, Riveiro Insua teaches a method of creating and utilizing video summaries, comprising:
causing at least one processor to analyze an input video, the input video comprising a plurality of frames, to detect a plurality of parameters associated with the input video ([0012], “In another general aspect, the present invention relates to a computer-assisted method for producing a space time summary for one or more original videos. The method includes storing feature vectors each associated with a person, an object, an occasion, an event, an action, or a location in a database; … extracting feature vectors of the at least two elements from the original video by a computer processor; ….” [0036], “the video processing engine (140, 240 in FIGS. 1, 2) extracts feature vectors…” [0038], [0045]);
creating, using the at least one processor, at least one video summary based on the detected plurality of parameters associated with the input video (Abstract, [0015], “The disclosed system and methods provide a summary of a video with short video bits of the key moments with full audio and motion features,….” [0034], “The database 120 can store feature vectors in association 
wherein each of the at least one video summary comprises one or more sequences of frames created based on a subset of video frames from the input video (Abstract, [0015], “The disclosed system and methods provide a summary of a video with short video bits of the key moments with full audio and motion features,….” [0036]);
publishing, using the at least one processor, the at least one video summary ([0031], “FIG. 5A-5C illustrates a user interface configured to display and play video bits as a summary for one or more original videos.” [0054], [0056], “The video bits 510-530 can be played in a window in the user interface.” Figs. 5A-5C),
wherein publishing comprises making the at least one video summary available to be viewed by a user of a user device, wherein the at least one video summary is viewable by the user within a video summary window displayed at the user device ([0054], “FIGS. 5A-5C illustrates a user interface 500 configured to play plurality of video bits in a summary for one or more original videos. Video bits 510-530 produced by steps in FIG. 3 can be automatically displayed in the 
Riveiro Insua does not expressly teach collecting, using the at least one processor, video summary usage information based on the viewing of the at least one video summary by the user of the user device, wherein the video summary usage information comprises information relating to the user's interaction with a particular area of the video summary window during the viewing of the at least one video summary; and feeding, using the at least one processor, the video summary usage information into a machine learning model to update the machine learning model, wherein the updated machine learning model is configured to create improved video summaries of input videos.
Kamdar teaches:
collecting, using at least one processor ([0011]), usage information based on viewing of at least one video by a user of a user device ([0003], “Some embodiments provide techniques that include monitoring user-initiated changes of volume during a television based advertisement. Based at least in part on such changes, a user's interest level in the advertisement may be assessed. Based at least in part on the assessed interest level, a second advertisement may be targeted to the user.” Figs. 2-5),
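wherein the usage information comprises information relating to the user's interaction with a particular area of the video summary window during viewing of the at least one video summary ([0032], “A user 502 is depicted viewing and listening to a television program, which may be, for example, an Internet TV program, an IPTV program, or a television based Internet-streamed or otherwise presented or downloaded video, etc.” [0033], “A volume indicator 506 is depicted, including a pointer that indicates the current volume level. The volume level is controllable by the user.” [0036], “Block 512 represents information stored in one or more databases, including volume and volume change related information, information about the content being played during the detected volume change, and information about the user who viewed the content and changed the volume or who is presumed to or is determined to be likely to be the user who has viewed the content and changed the volume.”); and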
feeding, using the at least one processor, the usage information into a machine learning model to update the machine learning model ([0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5),
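wherein the updated machine learning model is configured to create improved content selections ([0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5).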
In view of Kamdar’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Riveiro Insua to include collecting, using the at least one processor, video summary usage information based on the viewing of the at least one video summary by the user of the user device, wherein the video summary usage information comprises information relating to the user's interaction with a particular area of the video summary window during the viewing of the at least one video summary, and feeding, using the at least one processor, the video summary usage information into a machine learning model to update the machine learning model, wherein the updated machine learning model is configured to create improved video summaries of input videos. By utilizing video summary usage information with a machine learning model, the modification would serve to improve video summaries for users, thereby enhancing the overall user experience.
The combination teaches the limitations specified above; however, the combination does not expressly teach that the updated machine learning model is configured to optimize grouping of input videos.
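Emery teaches a machine learning model configured to optimize grouping of content (Col. 13, line 62 to col. 14, line 12, “Referring now to FIG. 8, a flow diagram of a method for classifying content items, such as books, videos, songs, albums, etc. is illustrated in accordance with an example of the present technology. … The classification scheme may include classification rules to classify evaluation attributes into classification categories, as has been described. The classification rules may include machine-learned rules, which may be machine-learned based on trusted classification categories for a set of content items. For example, a set of attributes for content items may be hand-curated and classified as a trusted base from which machine learning may learn and apply classification rules.” Col. 16, line 54 to col. 17, line 4).
In view of Emery’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination such that the updated machine learning model is further configured to optimize grouping of input videos. By optimizing grouping of input videos (e.g., classification), the modification would serve to organize input videos. The modification would thereby facilitate input video processing for video summary generation.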

Regarding claim 11, Riveiro Insua teaches a non-transitory computer readable medium encoded with codes for directing at least one processor ([0033]). The rejection of claim 1 is similarly applied to the remaining limitations of claim 11.

Regarding claim 20, Riveiro Insua teaches a server device, comprising: a memory; a network interface for communicating over the Internet with one or more user devices; and one or more processors configured to execute operations ([0033], [0035], 

Regarding claims 2 and 12, the combination further teaches making a decision regarding an advertisement to present to the user, using the at least one processor, based, at least in part, upon the video summary usage information (Kamdar: [0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5).

Regarding claims 6 and 15, the combination teaches the limitations specified in claims 2 and 12, and teaches wherein making the decision regarding an advertisement to present to the user is further based, at least in part, upon the video usage information (Kamdar: [0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5).

Kamdar teaches collecting, using at least one processor, video usage information based on the viewing of video (Kamdar: [0003], “Some embodiments provide techniques that include monitoring user-initiated changes of volume during a television based advertisement. Based at least in part on such changes, a user's interest level in the advertisement may be assessed. Based at least in part on the assessed interest level, a second advertisement may be targeted to the user.” Figs. 2-5).
In view of Kamdar, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to include collecting, using the at least one processor, video usage information based on the viewing of the input video. The modification would serve to further improve advertisement selection for users, thereby enhancing the user experience.

Regarding claims 8 and 17, the combination further teaches
wherein creating at least one video summary comprises creating a plurality of video summaries (Riveiro Insua: [0034], “The video processing engine 140 can automatically generate video summaries for the videos stored in the data storage 130 using information stored in the database 120.”), and
wherein publishing comprises making the plurality of video summaries available to be viewed by the user within the video summary window displayed at the user device (Riveiro Insua: [0054], “FIGS. 5A-5C illustrates a user interface 

Regarding claims 9 and 18, the combination further teaches
wherein creating at least one video summary comprises creating a plurality of video summaries (Riveiro Insua: [0034], “The video processing engine 140 can automatically generate video summaries for the videos stored in the data storage 130 using information stored in the database 120.”), and
wherein publishing comprises publishing a different video summary of the plurality of video summaries to each of at least two different users (Riveiro Insua: [0060], “Detailed views of the video bits can thus be presented in the user interface 500 to viewers.”).

Regarding claims 10 and 19, the combination further teaches wherein the information relating to the user's interaction with a particular area of the video summary window during the viewing of the at least one video summary comprises one or more items from the set consisting of: an area within the video summary window that is clicked (Kamdar: [0033], “A volume indicator 506 is depicted, including a pointer that indicates the current volume level. The volume level is controllable by the user. For .

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over a combination of Riveiro Insua, Kamdar, Emery, and Lee (US 2015/0106842).

Regarding claims 4 and 14, the combination teaches the limitations specified above; however, the combination does not expressly teach that the creation of at least one video summary for a given input video is further based, at least in part, on video summaries already created for videos in a same group as the given input video.
Lee teaches creation of at least one content summary for a given input content based, at least in part on a template already created for content in a same group as the given input content ([0047], “In particular, in response to the genre of the content being sport, the content summarization server 200 may extract a summary template of the content from a pre-stored content image according to a rule which corresponds to a sport content using caption information. For example, in response to the genre of the content being the sport of soccer, the content summarization server 200 may analyze 
In view of Lee’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination such that the creation of at least one video summary for a given input video is further based, at least in part, on video summaries already created for videos in a same group as the given input video. By utilizing information relating to previously created video summaries for a group, the modification would serve to facilitate processing for the creation of video summaries of other input videos within the same group.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over a combination of Riveiro Insua, Kamdar, Emery, and Kummer et al. (US 2013/0243402).

Regarding claim 5, the combination teaches the limitations specified above; however, the combination does not expressly teach predicting a popularity, using the at least one processor, of the input video based, at least in part, upon the video summary usage information.
Kummer teaches predicting a popularity of content based, at least in part, upon usage information ([0068], “In some embodiments, usage data analysis engine 370 may also gather DVR settings from television receivers. The likely popularity of a television channel may be determined based on how many television receivers have a timer set to record the television channel for one or more times in the future. The greater number of 
In view of Kummer’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to include predicting a popularity, using the at least one processor, of the input video based, at least in part, upon the video summary usage information. The modification would serve to improve content recommendations for users, and would additionally serve to facilitate selections for advertisement insertion.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over a combination of Riveiro Insua, Kamdar, Emery, and Cox et al. (US 2008/0059390).

Regarding claims 7 and 16, the combination further teaches wherein the machine learning model comprises a machine learning algorithm that operates on audience video consumption information (Kamdar: [0042], “The machine learning model(s) may be used in some embodiments of the invention, such as when machine learning techniques are used in advertisement targeting. For example, in some embodiments, features of advertisements and features of users, as well as volume-related information, historical advertisement performance information, and other information, may be used as input into a machine learning model. The model may then be used in advertisement selection, optimization, etc.” Fig. 5). However, the combination does not expressly teach that the machine learning algorithm is unsupervised.

In view of Cox’s teaching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination such that the machine learning algorithm is unsupervised. The modification would serve to enable a combined system to create improved video summaries substantially autonomously without the need for supervised set-up or retraining (see Cox: [0011]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Ivanyi (US 2004/0031045) discloses a system for monitoring viewer or user activities in viewing or using a television ([0045]).
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL R TELAN whose telephone number is (571)270-5940. The examiner can normally be reached on 9:30AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi can be reached on (571) 272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status 






/MICHAEL R TELAN/Primary Examiner, Art Unit 2426