DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hou et al., “Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos” (Hou), in view of Singh et al., US 10,057,651 B1 (Singh).
Regarding claim 1, Hou teaches a method for detecting a temporal action of a video (action detection; which is used to detect every occurrence of a given action within a long video, and to localize each detection both in space and time) (p. 5822; Section 1., 1st paragraph), comprising: 
acquiring an initial temporal feature sequence of a video to be detected (obtaining a video that is divided into equal length clips) (p. 5822; Figure 1 and Abstract); 
acquiring, by a pre-trained video-temporal-action detecting module (3D CNN for action proposal generation and action recognition) (p. 5824; Section 3., 2nd paragraph), implicit features and explicit features of a plurality of configured temporal anchor boxes (bounding boxes) (p. 5824; Section 3., 2nd and 3rd paragraphs) based on the initial temporal feature sequence (acquiring features of a plurality of bounding boxes on the initial video, such as sizes, aspect ratios, and positions; while learning the features using the R-CNN) (p. 5824; Section 3., 2nd and 3rd paragraphs); and 
acquiring, by the video-temporal-action detecting module (T-CNN pipeline) (p. 5822; Figure 1 and p. 5825; Section 4., 1st paragraph), the category of the specified action (classifying the linked tube proposal sequences) (p. 5826; Section 4.3, and p. 5829; Figure 6) and the probability that the specified action belongs to the category from the plural temporal anchor boxes (each bounding box is associated with an “actionness” score, which measures the probability that the content in the box corresponds to a valid action) (p. 5825; right column, 1st paragraph) according to the explicit features and the implicit features of the plural temporal anchor boxes (according to the features of the bounding boxes that are learned) (p. 5824; Section 3., 2nd and 3rd paragraphs and p. 5825; Section 4.1).  
Hou teaches taking all the frames of a specified action (linked as a tube), such that the sequence of frames would obviously have a start and a finish for that specified action (pages 5825-5826; Sections 4.2 and 4.3 and p. 5829; Figure 6).
However, Hou does not explicitly teach acquiring “the starting position and the ending position of a video clip containing a specified action”.
Singh teaches a system for implementing a video clip service (Abstract), including acquiring the starting position and the ending position of a video clip containing a specified action (more accurately determine the start and end points of a video clip of a relevant event) (col. 4, lines 20-23).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hou to include detecting the start and end points of the video clip since it allows for automatically determining highly relevant, quality video clips (Singh; col. 4, lines 30-31).

Regarding claim 2, Hou teaches wherein the acquiring an initial temporal feature sequence of a video to be detected (obtaining a video that is divided into equal length clips) (p. 5822; Figure 1 and Abstract) comprises: 
successively extracting a plurality of images from the video (extracting frames from the video) (p. 5822; Figure 1, and p. 5824; Section 3., 1st and 2nd paragraphs) to be detected in an extraction order from front to back (in a temporal order) (p. 5822; Figure 1, and p. 5824; Section 3., 1st and 2nd paragraphs) at a preset sampling period (each group of extracted frames is 8 frames long) (p. 5822; Figure 1, and p. 5824; Section 3., 1st and 2nd paragraphs), so as to form an image sequence (forming fixed-length clips of 8 frames; thus generating video clips formed of 8 frames each in sequence) (p. 5822; Figure 1, and p. 5824; Section 3., 1st and 2nd paragraphs); and
extracting a feature of each image in the image sequence (extracting a spatio-temporal feature cube for an 8-frame video clip) (p. 5825; Section 4.1, 1st paragraph), and splicing the features into the initial temporal feature sequence in the order in the image sequence (linking tube proposals that are a sequence of images for spatio-temporal action localization) (pages 5825-5826; Figures 3 and 4, and Section 4.2).
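For illustration only, the sampling and splicing steps recited in claim 2 can be sketched as follows. This is a minimal sketch and is not taken from Hou, Singh, or the application: the function names sample_frames and build_feature_sequence are hypothetical, and the per-image feature extractor is assumed to be supplied by the caller as extract.

```python
from typing import Any, Callable, List, Sequence


def sample_frames(frames: Sequence[Any], period: int) -> List[Any]:
    """Take every `period`-th frame, front to back (hypothetical helper)."""
    if period < 1:
        raise ValueError("period must be >= 1")
    return list(frames[::period])


def build_feature_sequence(
    frames: Sequence[Any],
    period: int,
    extract: Callable[[Any], List[float]],
) -> List[List[float]]:
    """Extract one feature per sampled image and keep the features in image order."""
    sampled = sample_frames(frames, period)
    # Appending features in the same order as the image sequence is the
    # "splicing" step: index i of the result corresponds to sampled image i.
    return [extract(frame) for frame in sampled]


# Toy usage: frames are integers, and the assumed extractor returns a 2-D feature.
toy_frames = list(range(20))
toy_sequence = build_feature_sequence(toy_frames, 8, lambda f: [float(f), 2.0 * f])
```

With the toy inputs above, frames 0, 8, and 16 are sampled, and the resulting sequence holds one feature vector per sampled frame in temporal order.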

Regarding claim 3, Hou teaches wherein the acquiring, by a pre-trained video-temporal-action detecting module (3D CNN for action proposal generation and action recognition) (p. 5824; Section 3., 2nd paragraph), implicit features of a plurality of configured temporal anchor boxes based on the initial temporal feature sequence (acquiring features of a plurality of bounding boxes on the initial video, such as sizes, aspect ratios, and positions; while learning the features using the R-CNN) (p. 5824; Section 3., 2nd and 3rd paragraphs, and p. 5825; Section 4.1, 3rd-5th paragraphs) comprises: 
reducing, by the video-temporal-action detecting module (T-CNN pipeline) (p. 5822; Figure 1 and p. 5825; Section 4., 1st paragraph), the temporal length of the initial temporal feature sequence according to a plurality of preset temporal length reduction ways (the temporal size is reduced to 1 frame) (p. 5825; Section 4.1, 2nd paragraph), so as to obtain a plurality of adjusted reference temporal feature sequences (generating adjusted/refined feature tubes) (p. 5825; Section 4.1); 
configuring a plurality of temporal anchor boxes (anchor bounding boxes) (p. 5825; Section 4.1, 3rd paragraph) with a preset number (such as 9 or 12 boxes) (p. 5825; Section 4.1, 3rd paragraph) and different scales (such as 3 or 5 scales) (p. 5825; Section 4.1, 3rd-5th paragraphs) for temporal positions of each reference temporal feature sequence (apply k-means clustering on the training set bounding boxes to learn 12 anchor boxes) (p. 5825; Section 4.1); and 
configuring the implicit feature of each temporal anchor box as the feature of the corresponding temporal position in the corresponding reference temporal feature sequence (apply k-means clustering on the training set bounding boxes to learn 12 anchor boxes and their position (i.e. clustering centroids)) (p. 5825; Section 4.1).  
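For illustration only, the three steps recited in claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Hou or of the application: average pooling is assumed as the length-reduction way, anchors are assumed to be centred on each temporal position with widths proportional to the stride, and the names pool_sequence, temporal_anchors, and implicit_features are hypothetical.

```python
from typing import List, Sequence, Tuple

Feature = List[float]


def pool_sequence(seq: Sequence[Feature], stride: int) -> List[Feature]:
    """Shorten a feature sequence by averaging non-overlapping windows (assumed reduction)."""
    pooled = []
    for start in range(0, len(seq) - stride + 1, stride):
        window = seq[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / stride for d in range(dim)])
    return pooled


def temporal_anchors(
    seq_len: int, stride: int, scales: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (start, end) anchors, in original time units, for every position and scale."""
    anchors = []
    for pos in range(seq_len):
        center = (pos + 0.5) * stride
        for scale in scales:
            half = 0.5 * scale * stride
            anchors.append((center - half, center + half))
    return anchors


def implicit_features(
    reference: Sequence[Feature], num_scales: int
) -> List[Feature]:
    """Give each anchor the feature at its own temporal position in the reference sequence."""
    return [list(feat) for feat in reference for _ in range(num_scales)]


# Toy usage: a length-4 sequence reduced with stride 2, with two scales per position.
toy_reference = pool_sequence([[1.0], [3.0], [5.0], [7.0]], 2)
toy_anchors = temporal_anchors(len(toy_reference), 2, [1, 2])
toy_implicit = implicit_features(toy_reference, 2)
```

Running pool_sequence with several strides would yield the plural reference sequences; the anchor list and the implicit-feature list are index-aligned, so anchor i carries implicit feature i.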

Regarding claim 4, Hou teaches wherein the acquiring, by a pre-trained video-temporal-action detecting module (3D CNN for action proposal generation and action recognition) (p. 5824; Section 3., 2nd paragraph), explicit features of a plurality of configured temporal anchor boxes based on the initial temporal feature sequence (acquiring features of a plurality of bounding boxes on the initial video, such as sizes, aspect ratios, and positions) (p. 5824; Section 3., 2nd and 3rd paragraphs and p. 5825; Section 4.1, 3rd paragraph) comprises: 
taking the reference temporal feature sequence with the longest length in the plural reference temporal feature sequences as a sample (selecting the longest tube proposal, by linking the tube proposals to encapsulate the action and at the same time have temporal consistency) (pages 5825-5826; Section 4.2), and 
acquiring the explicit feature of each temporal anchor box by multiplying the sample by a sampling feature constructed for each temporal anchor box (determining the explicit scales and ratios based on multiplying the size for each bounding box) (p. 5825; Section 4.1).  
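For illustration only, the multiplication recited in claim 4 can be read as a matrix product between a per-anchor sampling matrix and the longest reference sequence. The sketch below assumes linear-interpolation sampling weights at evenly spaced points inside the anchor; that choice, and the names sampling_matrix and explicit_feature, are hypothetical and are not taken from Hou, Singh, or the application.

```python
from typing import List, Sequence

Feature = List[float]


def sampling_matrix(
    start: float, end: float, num_points: int, seq_len: int
) -> List[List[float]]:
    """Build a num_points x seq_len weight matrix for one anchor (assumed construction)."""
    rows = []
    for k in range(num_points):
        # Position of the k-th sample point inside [start, end], clamped to the sequence.
        t = start + (end - start) * (k + 0.5) / num_points
        t = min(max(t, 0.0), seq_len - 1.0)
        lo = int(t)
        hi = min(lo + 1, seq_len - 1)
        frac = t - lo
        row = [0.0] * seq_len
        row[lo] += 1.0 - frac  # weight on the left neighbour
        row[hi] += frac        # weight on the right neighbour
        rows.append(row)
    return rows


def explicit_feature(
    sample: Sequence[Feature], weights: Sequence[Sequence[float]]
) -> List[Feature]:
    """Multiply the sampling matrix by the sample: one output feature per sample point."""
    dim = len(sample[0])
    return [
        [sum(w[t] * sample[t][c] for t in range(len(sample))) for c in range(dim)]
        for w in weights
    ]


# Toy usage: a length-4 sample and an anchor covering [0, 2] with two sample points.
toy_sample = [[0.0], [10.0], [20.0], [30.0]]
toy_weights = sampling_matrix(0.0, 2.0, 2, len(toy_sample))
toy_explicit = explicit_feature(toy_sample, toy_weights)
```

Because every anchor is sampled with the same number of points, anchors of different lengths yield explicit features of the same size under this assumed construction.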

Regarding claim 5, Hou teaches wherein the acquiring, by the video-temporal-action detecting module (T-CNN pipeline) (p. 5822; Figure 1 and p. 5825; Section 4., 1st paragraph), the category of the specified action (classifying the linked tube proposal sequences) (p. 5826; Section 4.3, and p. 5829; Figure 6) and the probability that the specified action belongs to the category from the plural temporal anchor boxes (each bounding box is associated with an “actionness” score, which measures the probability that the content in the box corresponds to a valid action) (p. 5825; right column, 1st paragraph) according to the explicit features and the implicit features of the plural temporal anchor boxes (according to the features of the bounding boxes that are learned) (p. 5824; Section 3., 2nd and 3rd paragraphs and p. 5825; Section 4.1) comprises: 
splicing, by the video-temporal-action detecting module (T-CNN pipeline) (p. 5822; Figure 1 and p. 5825; Section 4., 1st paragraph), the explicit feature and the implicit feature of each temporal anchor box to obtain a comprehensive feature of each temporal anchor box (linking tube proposals that are a sequence of images for spatio-temporal action localization) (pages 5825-5826; Figures 3 and 4, and Section 4.2);
detecting each temporal anchor box based on the comprehensive feature thereof (linking tube proposals) (pages 5825-5826; Section 4.2), and screening a target temporal anchor box containing the specified action (determining the target anchor box for the specified action as shown in Figure 6) (pages 5825-5826; Sections 4.2 and 4.3, and p. 5829; Figure 6), the category (classifying the linked tube proposal sequences) (p. 5826; Section 4.3, and p. 5829; Figure 6) of the specified action and the probability that the specified action belongs to the category (each bounding box is associated with an “actionness” score, which measures the probability that the content in the box corresponds to a valid action) (p. 5825; right column, 1st paragraph); and 
detecting boundaries of the target temporal anchor box containing the specified action (detecting the boundaries of the bounding box containing the specified action) (p. 5829; Figure 6) in the initial temporal feature sequence (bounding box proposals generated from conv5 feature tube can be used for frame-level action detection by bounding box regression) (p. 5825; right column, 2nd paragraph).
Hou teaches taking all the frames of a specified action (linked as a tube), such that the sequence of frames would obviously have a start and a finish for that specified action (pages 5825-5826; Sections 4.2 and 4.3 and p. 5829; Figure 6).
However, Hou does not explicitly teach acquiring “the starting position and the ending position of a video clip containing a specified action”.
Singh teaches a system for implementing a video clip service (Abstract), including acquiring the starting position and the ending position of a video clip containing a specified action (more accurately determine the start and end points of a video clip of a relevant event) (col. 4, lines 20-23).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hou to include detecting the start and end points of the video clip since it allows for automatically determining highly relevant, quality video clips (Singh; col. 4, lines 30-31).

Regarding claim 6, Hou teaches wherein after the detecting boundaries of the target temporal anchor box containing the specified action (detecting the boundaries of the bounding box containing the specified action) (p. 5829; Figure 6) in the initial temporal feature sequence (bounding box proposals generated from conv5 feature tube can be used for frame-level action detection by bounding box regression) (p. 5825; right column, 2nd paragraph):
performing logistic regression analysis on the starting position and the ending position of the video clip (bounding box proposals generated from conv5 feature tube can be used for frame-level action detection by bounding box regression of the start (first frame) and the ending (last frame) of the 8 frames) (p. 5825; right column, 2nd paragraph) corresponding to the target temporal anchor box in conjunction with a plurality of temporal features in the initial temporal feature sequence in front of and behind the boundaries of the target temporal anchor box (bounding box proposals generated from conv5 feature tube can be used for frame-level action detection by bounding box regression) (p. 5825; right column, 2nd paragraph).
Hou teaches taking all the frames of a specified action (linked as a tube), such that the sequence of frames would obviously have a start and a finish for that specified action (pages 5825-5826; Sections 4.2 and 4.3 and p. 5829; Figure 6).
However, Hou does not explicitly teach acquiring “the starting position and the ending position of a video clip containing a specified action”.
Singh teaches a system for implementing a video clip service (Abstract), including acquiring the starting position and the ending position of a video clip containing a specified action (more accurately determine the start and end points of a video clip of a relevant event) (col. 4, lines 20-23).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hou to include detecting the start and end points of the video clip since it allows for automatically determining highly relevant, quality video clips (Singh; col. 4, lines 30-31).

Regarding claim 7, see the rejection made to claim 1, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.
Regarding claim 8, see the rejection made to claim 2, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.
Regarding claim 9, see the rejection made to claim 3, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.
Regarding claim 10, see the rejection made to claim 4, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.
Regarding claim 11, see the rejection made to claim 5, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.
Regarding claim 12, see the rejection made to claim 6, as well as prior art Singh for an electronic device (computing device 800) (Fig. 8; col. 19, lines 18-24), comprising: at least one processor (processor 802) (Fig. 8; col. 19, lines 18-24); and a memory communicatively connected with the at least one processor (wherein the data storage device 818 is connected with processing device 802 through bus 830) (Fig. 8): wherein the memory stores instructions executable by the at least one processor (data storage device 818 on which is stored one or more sets of instructions 826) (Fig. 8; col. 19, lines 52-54), and the instructions are executed by the at least one processor (wherein the processing device 802 is configured to execute instructions 826) (Fig. 8; col. 19, lines 37-40), for they teach all the limitations within this claim.

Regarding claim 13, see the rejection made to claim 1, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.
Regarding claim 14, see the rejection made to claim 2, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.
Regarding claim 15, see the rejection made to claim 3, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.
Regarding claim 16, see the rejection made to claim 4, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.
Regarding claim 17, see the rejection made to claim 5, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.
Regarding claim 18, see the rejection made to claim 6, as well as prior art Singh for a non-transitory computer readable storage medium with computer instructions stored thereon (a non-transitory computer readable medium with a computer program stored thereon) (col. 20, lines 56-57), wherein the computer instructions are used for causing a computer to perform a method (wherein the computer program causes a general purpose computer to perform the operations herein) (col. 20, lines 51-56), for they teach all the limitations within this claim.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J VANCHY JR whose telephone number is (571)270-1193. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J VANCHY JR/
Primary Examiner, Art Unit 2666
Michael.Vanchy@uspto.gov