DETAILED ACTION
This Office action is in response to the application submitted on 4/10/2014.
Priority
Applicant’s claim for the benefit of a prior-filed application 13041457 (PAT 9189137) filed on 3/7/2011, which further claims benefit of provisional application 61311524 filed on 3/8/2010 is acknowledged and admitted.  

Response to Amendment
In the response filed 12/14/2020, Applicant amended claims 8, 9, 17, 18, and 21-23.  Claim 24 has been added.  Accordingly, claims 8-9, 17-18, and 21-24 stand pending.

Response to Arguments
Applicant's arguments filed 12/14/2020 have been fully considered but are moot in view of the new grounds of rejection.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


Claims 8-9, 17-18, 21-22, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Schneiderman et al. (US2008/0080743), hereinafter Schneiderman, in view of Casares et al. (“Simplifying Video Editing Using Metadata”), hereinafter Casares.

Regarding Claim 8:
Schneiderman teaches:
A method comprising: obtaining at least one user-captured video footage (Schneiderman, figure 2, [0012], note receiving video data from user);
automatically computing, by a computer processor, at least one image descriptor from the at least one user-captured video footage (Schneiderman, [0012, 0015], note detecting human faces from the video);
using said at least one image descriptor to compute, by a computer processor, visual metadata describing a plurality of clusters of face images detected in the video footage, wherein each of said clusters comprise face images of a common person (Schneiderman, [0012, 0015], note detecting human faces from video and grouping to unique people based on the faces);
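automatically selecting, by a computer processor, from said at least one user-captured video footage a sequence of media portions, wherein the selecting results in at least two selected media portions taken from a common video footage wherein a start time and an end time of each said selected media portion are determined based on said visual meta data (Schneiderman, [0012, 0015], note detecting human faces from video, note grouping found faces to unique people, note grouping video segments which the individual is present into separate indices and since these segments are only the portions which the person/face is present the start time and the end time are automatically determined based on the visual metadata, e.g. face);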
allowing a user to apply modification operations to said sequence of selected media portions, wherein said modification operations comprise selecting, by the user, at least one cluster of face images from said plurality of clusters (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments, e.g. selecting at least one cluster of face images, of the video);
automatically generating, by said computer processor, and responsive to said modification operations, an automatically edited video, by filtering out, using said computer processor, at least one of said media portion that corresponds to a face that is not included in the at least one cluster selected by the user (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments of the video and remove people from the face database).
Schneiderman doesn’t specifically teach:
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata.
Casares is in the same field of endeavor, video editing;
Casares teaches:
automatically generating, by said computer processor, and responsive to said modification operations, an automatically edited video, by filtering out, using said computer processor, at least one of said media portion that corresponds to a face that is not included in the at least one cluster selected by the user (Casares, page 165 3rd column, note the application could incorporate face detection and automatically filter out video with a selected face.  When combined with the previously cited reference this would be for the selected face and video as taught by Schneiderman).
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata (Casares, page 162, column 2 and 3, note applying special effects between segments based on the selected media portions and visual metadata such as a gap in video; page 165 column 3, note using a special effect for a transition based on visual metadata such as a video gap).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Schneiderman to incorporate the teachings of Casares because this would improve the ease and efficiency of video editing (Casares, abstract and introduction).
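
For illustration only, the combination mapped above for claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Schneiderman or Casares: the `Segment` type, the function names, the "fade"/"cut" labels, and the one-second gap threshold are all hypothetical. It assumes each media portion already carries a start time, an end time, and the identifier of the face cluster detected in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float      # start time in seconds, derived from face detection
    end: float        # end time in seconds
    cluster_id: int   # hypothetical id of the face cluster (one per person)


def filter_segments(segments, selected_clusters):
    """Drop segments whose face is not in a user-selected cluster."""
    kept = [s for s in segments if s.cluster_id in selected_clusters]
    return sorted(kept, key=lambda s: s.start)


def plan_transitions(kept, gap_threshold=1.0):
    """Pick a transition between consecutive kept segments.

    A time gap longer than gap_threshold (an assumed value) gets a
    'fade'; otherwise a plain 'cut' is used.
    """
    plan = []
    for prev, nxt in zip(kept, kept[1:]):
        gap = nxt.start - prev.end
        plan.append("fade" if gap > gap_threshold else "cut")
    return plan


segments = [
    Segment(0.0, 5.0, 1),
    Segment(5.5, 9.0, 1),
    Segment(9.0, 12.0, 2),
    Segment(20.0, 25.0, 1),
]
kept = filter_segments(segments, {1})
transitions = plan_transitions(kept)
```

In this sketch the filtering step stands in for the user's selection of a face cluster, and the transition choice depends on a gap in the retained footage, which is the kind of visual metadata the Casares citation is relied on for.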

Regarding Claim 9:
Schneiderman shows the method as disclosed above:
Schneiderman further teaches:


Claim 17 discloses substantially the same limitations as claim 8, except claim 17 is directed to a system comprising a computer processor, a computer readable medium, and a display device (Schneiderman, [0011, 0029], note processor, storage medium, and monitor display screen), while claim 8 is directed to a method. Therefore, claim 17 is rejected under the same rationale set forth for claim 8.

Claim 18 discloses substantially the same limitations as claim 9, except claim 18 is directed to a system comprising a computer processor, a computer readable medium, and a display device (Schneiderman, [0011, 0029], note processor, storage medium, and monitor display screen), while claim 9 is directed to a method. Therefore, claim 18 is rejected under the same rationale set forth for claim 9.

Regarding Claim 21:
Schneiderman teaches:

automatically computing at least one image descriptor from the at least one user-captured video footage (Schneiderman, [0012, 0015], note detecting human faces from the video);
using said at least one image descriptor to compute visual metadata describing a plurality of clusters of face images detected in the video footage, wherein each of said clusters comprise face images of a common person (Schneiderman, [0012, 0015], note detecting human faces from video and grouping to unique people based on the faces);
automatically selecting from said at least one user-captured video footage a sequence of media portions, wherein the selecting results in at least two selected media portions taken from a common video footage, wherein a start time and an end time of each said selected media portion are determined based on said visual meta data (Schneiderman, [0012, 0015], note detecting human faces from video, note grouping found faces to unique people, note grouping video segments which the individual is present into separate indices and since these segments are only the portions which the person/face is present the start time and the end time are automatically determined based on the visual metadata, e.g. face); 
selecting a set of single representative images, wherein each single representative image corresponds to one of said clusters (Schneiderman, figure 5, [0050], note thumbnail images of person from the clusters); 

allowing the user to select at least one of the set of single representative images (Schneiderman, abstract, [0012, 0015, 0048-0049], note the user can select to view only the person/face-specific video segments, e.g. selecting at least one cluster of face images, of the video); and 
automatically generating, responsive to said user selection automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments of the video).
Schneiderman doesn’t specifically teach:
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata.
Casares is in the same field of endeavor, video editing;
Casares teaches:
automatically generating, responsive to said user selection automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Casares, page 165 3rd column, note the application could incorporate face detection and automatically filter out video with a selected face.  When combined with the previously cited reference this would be for the selected face and video as taught by Schneiderman).
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata (Casares, page 162, column 2 and 3, note applying special effects between segments based on the selected media portions and visual metadata such as a gap in video; page 165 column 3, note using a special effect for a transition based on visual metadata such as a video gap).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Schneiderman to incorporate the teachings of Casares because this would improve the ease and efficiency of video editing (Casares, abstract and introduction).

Regarding Claim 22:
Schneiderman teaches:
A method comprising: obtaining at least two user-captured video footages (Schneiderman, figure 2, [0012], note receiving video data from user);
automatically computing at least one image descriptor from the at least two user-captured video footages (Schneiderman, [0012, 0015], note detecting human faces from the video);
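using said at least one image descriptor to compute visual metadata describing a plurality of clusters of face images detected in the video footages, wherein each of said clusters comprise face images of a common person (Schneiderman, [0012, 0015], note detecting human faces from video and grouping to unique people based on the faces);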
automatically selecting from said at least two user-captured video footages a sequence of media portions, wherein the selecting results in at least two selected media portions taken from a common video footage, and wherein a start time and an end time of each said selected media portion are determined based on said visual meta data (Schneiderman, [0012, 0015], note detecting human faces from video, note grouping found faces to unique people, note grouping video segments which the individual is present into separate indices and since these segments are only the portions which the person/face is present the start time and the end time are automatically determined based on the visual metadata, e.g. face); 
allowing a user to apply modification operations to said sequence of selected media portions, wherein said modification operations comprise selecting at least one cluster of face images from said plurality of clusters (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments, e.g. selecting at least one cluster of face images, of the video); and 
automatically generating, responsive to said modification operations, an automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments of the video).
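Schneiderman doesn’t specifically teach:
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata.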
Casares is in the same field of endeavor, video editing;
Casares teaches:
automatically generating, responsive to said modification operations, an automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Casares, page 165 3rd column, note the application could incorporate face detection and automatically filter out video with a selected face.  When combined with the previously cited reference this would be for the selected face and video as taught by Schneiderman).
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata (Casares, page 162, column 2 and 3, note applying special effects between segments based on the selected media portions and visual metadata such as a gap in video; page 165 column 3, note using a special effect for a transition based on visual metadata such as a video gap).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Schneiderman to incorporate the teachings of Casares because this would improve the ease and efficiency of video editing (Casares, abstract and introduction).

Regarding Claim 24:
Schneiderman teaches:
A method comprising: obtaining at least one user-captured video footage (Schneiderman, figure 2, [0012], note receiving video data from user);
automatically computing, by a computer processor, at least one image descriptor from the at least one user-captured video footage (Schneiderman, [0012, 0015], note detecting human faces from the video);
using said at least one image descriptor to compute, by a computer processor, visual metadata describing a plurality of clusters of face images detected in the video footage, wherein each of said clusters comprise face images of a common person (Schneiderman, [0012, 0015], note detecting human faces from video and grouping to unique people based on the faces);
automatically selecting, by a computer processor, from said at least one user-captured video footage a sequence of media portions, wherein the selecting results in at least two selected media portions taken from a common video footage, wherein a start time and an end time of each said selected media portion are determined based on said visual meta data (Schneiderman, [0012, 0015], note detecting human faces from video, note grouping found faces to unique people, note grouping video segments which the individual is present into separate indices and since these segments are only the portions which the person/face is present the start time and the end time are automatically determined based on the visual metadata, e.g. face);

automatically generating, by said computer processor, and responsive to said modification operations, an automatically edited video, by filtering out, using said computer processor, at least one of said media portion that corresponds to a face that is not included in the at least one cluster selected by the user (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments of the video).
Schneiderman doesn’t specifically teach:
wherein the automatically edited video, the selected media portions are synchronized with a soundtrack added to the automatically edited video.
Casares is in the same field of endeavor, video editing;
Casares teaches:
automatically generating, by said computer processor, and responsive to said modification operations, an automatically edited video, by filtering out, using said computer processor, at least one of said media portion that corresponds to a face that is not included in the at least one cluster selected by the user (Casares, page 165 3rd column, note the application could incorporate face detection and automatically filter out video with a selected face.  When combined with the previously cited reference this would be for the selected face and video as taught by Schneiderman).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Schneiderman to incorporate the teachings of Casares because this would improve the ease and efficiency of video editing (Casares, abstract and introduction).

Claim Rejections - 35 USC § 103
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Schneiderman in view of Casares and Ayaki (US2007/0159533), hereinafter Ayaki.

Regarding Claim 23:
Schneiderman teaches:
A method comprising: obtaining at least one user-captured video footage (Schneiderman, figure 2, [0012], note receiving video data from user);
automatically computing at least one image descriptor from the at least one user-captured video footage (Schneiderman, [0012, 0015], note detecting human faces from the video); 

automatically selecting from said at least one user-captured video footage a sequence of media portions, wherein the selecting results in at least two selected media portions taken from a common video footage wherein a start time and an end time of each said selected media portion are determined based on said visual meta data (Schneiderman, [0012, 0015], note detecting human faces from video, note grouping found faces to unique people, note grouping video segments which the individual is present into separate indices and since these segments are only the portions which the person/face is present the start time and the end time are automatically determined based on the visual metadata, e.g. face); 
automatically selecting a subset of clusters from said clusters, wherein the subset of clusters is associated with faces that have the largest representation in the video footage (Schneiderman, figure 5, [0050, 0055], note a subset of clusters are selected to be displayed to the user with the ability to sort, when combined with the other cited references they would be sorted so the faces that have the largest representation would be selected to be displayed to the user as taught by Ayaki); 
displaying said subset of clusters to a user (Schneiderman, figure 5, [0050, 0055], note the subset is displayed to the user, when combined with the other cited references they would be sorted so the faces that have the largest representation would be selected to be displayed to the user as taught by Ayaki); 

automatically generating, responsive to said user selection, an automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Schneiderman, abstract, [0012, 0015, 0048], note the user can select to view only the person/face-specific video segments of the video).
Schneiderman doesn’t specifically teach:
wherein the subset of clusters is associated with faces that have the largest representation in the video footage;
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata.
Ayaki is in the same field of endeavor, information retrieval;
Ayaki teaches:
wherein the subset of clusters is associated with faces that have the largest representation in the video footage (Ayaki, abstract, [0013], note determining which faces most frequently appear in a group, e.g. largest representation, when combined with the previously cited reference this would be used for the person/face-specific video segments from the video file that are selected to be displayed to the user as taught by Schneiderman).

Casares is in the same field of endeavor, video editing;
Casares teaches:
automatically generating, responsive to said user selection, an automatically edited video that emphasizes the at least one selected cluster over the rest of the clusters (Casares, page 165 3rd column, note the application could incorporate face detection and automatically filter out video with a selected face.  When combined with the previously cited reference this would be for the selected face and video as taught by Schneiderman).
wherein the automatically generating of the automatically edited video is further carried out by applying to the selected media portions effects and transitions, of which, at least some are determined according to the selected media portions and the computed visual metadata (Casares, page 162, column 2 and 3, note applying special effects between segments based on the selected media portions and visual metadata such as a gap in video; page 165 column 3, note using a special effect for a transition based on visual metadata such as a video gap).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the combination of Schneiderman and Ayaki to incorporate the teachings of Casares because this would improve the ease and efficiency of video editing (Casares, abstract and introduction).
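
For illustration only, the "largest representation" selection mapped above for claim 23 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Schneiderman or Ayaki: the function name `top_clusters` is hypothetical, and representation is assumed to be measured by the number of face detections assigned to each cluster.

```python
from collections import Counter


def top_clusters(face_cluster_ids, k):
    """Return the k cluster ids with the most face detections.

    face_cluster_ids holds one (hypothetical) cluster id per detected
    face; counting detections is an assumed measure of representation.
    """
    counts = Counter(face_cluster_ids)
    return [cluster_id for cluster_id, _ in counts.most_common(k)]


# One entry per detected face across the footage.
detections = [1, 2, 1, 3, 1, 2]
subset = top_clusters(detections, 2)
```

The resulting subset is what would be displayed to the user for selection in the mapping above.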

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Singer et al. (US20110142420) teaches transforming and editing video files; Ishizaka (US2010/0026842) teaches face detection; Toyama (US738508) teaches processing video segments based on points of interest; Trivedi (US20060187305) teaches face recognition of persons in videos.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN J MORRIS whose telephone number is (571)272-3314.  The examiner can normally be reached M-F, 6:30 AM-2:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on 571-270-0474.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JOHN J MORRIS/
Examiner, Art Unit 2152
3/11/2021

/NEVEEN ABEL JALIL/Supervisory Patent Examiner, Art Unit 2152