DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

References:
 	R1: TIAN YONGHONG ET AL: "Video Copy-Detection and Localization with a Scalable Cascading Framework",  IEEE MULTIMEDIA, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 20, no. 3, 1 July 2013 (2013-07-01), pages 72-86, XP011525049, ISSN: 1070-986X, DOI: 10.1109/MMUL.2012.62  [retrieved on 2013-08-23].


Allowable Subject Matter
Claims 1, 4-9, 12-17 and 20 are allowed.

			 Statement of Reasons for Allowance
The following is an Examiner’s statement of reasons for allowance:
          
With respect to the allowed independent claim 1:
R1 teaches,
 	A video detection method, comprising: 
 	“obtaining, by a server, first video data to be detected, and decoding, by the server, the first video data, to obtain audio data of the first video data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors
"WASF Detector" (page 77) and figure 2(b)]), analyzing and identifying, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors "WASF Detector" (page 77) and figure 2(b)]), querying, by the server based on the audio data, an audio fingerprint library ([see e.g. subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)]), obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches [see e.g. 
subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)]), querying, by the server, a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches, [see e.g. subsection "Detection Using Frame Fusion" (page 78)]), separately extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture (computationally efficient. In our system, a new DCT feature is designed by utilizing the relationship between low-frequency DCT coefficients of adjacent image blocks. It differs from the original DCT feature in that subband energy is used as an alternative to DCT coefficient. Then the invariance of its relative magnitude [see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)]); and comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library (efficient but relatively simple detectors should be placed in the front, while effective but complex detectors should be located in the rear. In an N-stage cascade of detectors, D_N = ⟨d_1, d_2, ..., d_N⟩, a query is processed by each detector successively until one determines it as a copy or all determine it as a noncopy, [see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76) and "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
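The cascade behavior quoted from R1 (a query passes through each detector in turn until one declares a copy, or all declare a noncopy) and the Euclidean dissimilarity measure quoted for the WASF descriptors can be sketched as follows. This is a minimal illustration and not R1's implementation: the names `euclidean_distance`, `make_distance_detector`, and `cascade_detect`, the single shared descriptor, and the threshold values are all assumptions introduced here.

```python
import math
from typing import Callable, Optional, Sequence

Descriptor = Sequence[float]
# A detector returns True when it judges the query to be a copy.
Detector = Callable[[Descriptor], bool]


def euclidean_distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_distance_detector(reference: Descriptor, threshold: float) -> Detector:
    """Hypothetical detector: flags a copy when the query lies within
    `threshold` of a single reference descriptor."""
    def detector(query: Descriptor) -> bool:
        return euclidean_distance(query, reference) <= threshold
    return detector


def cascade_detect(query: Descriptor, detectors: Sequence[Detector]) -> Optional[int]:
    """Run the detectors in order and stop at the first that flags a copy.

    Returns the index of that detector, or None when every detector
    judges the query to be a noncopy.
    """
    for index, detector in enumerate(detectors):
        if detector(query):
            return index
    return None
```

In this sketch the cheaper detectors would be listed first, so that most queries are resolved before the costlier detectors run, which mirrors the ordering principle quoted above.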
Bhagavathy et al. (US 20130054645, hereinafter “Bhaga”), teaches,
“extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including, indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the " time stamp"), Paras. [0025]-[0026]).”
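The storage arrangement discussed above (an extracted picture kept together with a time parameter and a video label, then retrieved by label and time) can be sketched as follows. This is an illustrative assumption only and does not reproduce Bhaga's reference content database 112: the class names `ReferenceEntry` and `ReferenceLibrary`, the `picture_id` field, and the 0.5-second tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReferenceEntry:
    video_label: str   # identifier of the reference content item
    timestamp: float   # time parameter of the stored picture, in seconds
    picture_id: str    # hypothetical handle to the stored picture


class ReferenceLibrary:
    """Hypothetical in-memory library indexed by video label."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[ReferenceEntry]] = {}

    def add(self, entry: ReferenceEntry) -> None:
        self._entries.setdefault(entry.video_label, []).append(entry)

    def find(self, video_label: str, timestamp: float,
             tolerance: float = 0.5) -> Optional[ReferenceEntry]:
        """Return the entry for `video_label` closest in time to `timestamp`,
        or None when no entry lies within `tolerance` seconds."""
        candidates = [
            entry for entry in self._entries.get(video_label, [])
            if abs(entry.timestamp - timestamp) <= tolerance
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda entry: abs(entry.timestamp - timestamp))
```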
Oztaskent et al. (US 20150365722, hereinafter “Ozta”) teaches, 
“wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
However, R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features:
“A video detection method, comprising extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, further including: performing, by the server, scene identification on the second video data, and identifying and filtering a first picture collection representing the predefined scene switching in the second video data, to obtain a second picture collection; analyzing and identifying, by the server, a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting, by the server, a picture of which a quantity of the edge feature information reaches a preset threshold”, in combination with all the recited limitations of claim 1.
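The final step of the feature identified above (keeping only pictures whose quantity of edge feature information reaches a preset threshold) can be illustrated with the following sketch. It is not the Applicant's disclosed method: the simple neighbor-difference edge measure, the grayscale list-of-rows picture format, and the names `count_edge_pixels` and `select_pictures` are assumptions chosen only to make the thresholding step concrete.

```python
from typing import List, Sequence

# A picture is assumed to be a rectangular grid of grayscale values (0-255).
Picture = Sequence[Sequence[int]]


def count_edge_pixels(picture: Picture, gradient_threshold: int = 50) -> int:
    """Count pixels whose difference from the right and lower neighbors,
    summed, is at least `gradient_threshold` (a crude edge measure)."""
    count = 0
    height = len(picture)
    for y in range(height):
        width = len(picture[y])
        for x in range(width):
            here = picture[y][x]
            # At the right and bottom borders, compare the pixel with itself.
            right = picture[y][x + 1] if x + 1 < width else here
            below = picture[y + 1][x] if y + 1 < height else here
            if abs(right - here) + abs(below - here) >= gradient_threshold:
                count += 1
    return count


def select_pictures(pictures: Sequence[Picture], min_edge_count: int,
                    gradient_threshold: int = 50) -> List[Picture]:
    """Keep only pictures whose edge-pixel count reaches `min_edge_count`."""
    return [
        picture for picture in pictures
        if count_edge_pixels(picture, gradient_threshold) >= min_edge_count
    ]
```

Under this assumed measure, a uniformly colored frame yields a count of zero and is filtered out, while a frame with strong intensity transitions is retained.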

With respect to the allowed independent claim 9:
R1 teaches,
	“A server, comprising at least one processor and a memory storing a processor-executable instruction, the instruction, when executed by the at least one processor, causing the server to perform the following operations: obtaining first video data to be detected, and decoding the first video data, to obtain audio data of the first video data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors "WASF Detector" (page 77) and figure 2(b)]), analyzing and identifying the audio data, to obtain audio fingerprint data corresponding to the audio data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors "WASF Detector" (page 77) and figure 2(b)]), querying, based on the audio data, an audio fingerprint library ([see e.g. 
subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)]), obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches [see e.g. subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)]), querying a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches, [see e.g. subsection "Detection Using Frame Fusion" (page 78)]), separately extracting a first feature parameter of the first picture and a second feature parameter of the second picture (computationally efficient. In our system, a new DCT feature is designed by utilizing the relationship between low-frequency DCT coefficients of adjacent image blocks. It differs from the original DCT feature in that subband energy is used as an alternative to DCT coefficient. Then the invariance of its relative magnitude [see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)]); and comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library (efficient but relatively simple detectors should be placed in the front, while effective but complex detectors should be located in the rear. In an N-stage cascade of detectors, D_N = ⟨d_1, d_2, ..., d_N⟩, a query is processed by each detector successively until one determines it as a copy or all determine it as a noncopy, [see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76) and "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
Bhagavathy et al. (US 20130054645, hereinafter “Bhaga”), teaches,
“extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including, indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the " time stamp"), Paras. [0025]-[0026]).”
Oztaskent et al. (US 20150365722, hereinafter “Ozta”) teaches, 
“wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
However, R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features:
“A server, comprising at least one processor and a memory storing a processor-executable instruction, the instruction, when executed by the at least one processor, causing the server to perform the following operations: extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, further including: performing, by the server, scene identification on the second video data, and identifying and filtering a first picture collection representing the predefined scene switching in the second video data, to obtain a second picture collection; analyzing and identifying, by the server, a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting, by the server, a picture of which a quantity of the edge feature information reaches a preset threshold”, in combination with all the recited limitations of claim 9.
	
With respect to the allowed independent claim 17:
R1 teaches,
	“A non-transitory storage medium, configured to store one or more computer programs, the computer programs comprising an instruction that can be run by a possessor comprising one or more memories, the instruction, when executed by a computer, causing the computer to perform the following operations: obtaining first video data to be detected, and decoding the first video data, to obtain audio data of the first video data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors "WASF Detector" (page 77) and figure 2(b)]), analyzing and identifying the audio data, to obtain audio fingerprint data corresponding to the audio data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. 
A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors "WASF Detector" (page 77) and figure 2(b)]), querying, based on the audio data, an audio fingerprint library ([see e.g. subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)]), obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches [see e.g. subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)]), querying a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches, [see e.g. subsection "Detection Using Frame Fusion" (page 78)]), separately extracting a first feature parameter of the first picture and a second feature parameter of the second picture (computationally efficient. In our system, a new DCT feature is designed by utilizing the relationship between low-frequency DCT coefficients of adjacent image blocks. It differs from the original DCT feature in that subband energy is used as an alternative to DCT coefficient. Then the invariance of its relative magnitude [see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)]); and comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library (efficient but relatively simple detectors should be placed in the front, while effective but complex detectors should be located in the rear. In an N-stage cascade of detectors, D_N = ⟨d_1, d_2, ..., d_N⟩, a query is processed by each detector successively until one determines it as a copy or all determine it as a noncopy, [see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76) and "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
 	Bhagavathy et al. (US 20130054645, hereinafter “Bhaga”), teaches,
“extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including, indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the " time stamp"), Paras. [0025]-[0026]).”
Oztaskent et al. (US 20150365722, hereinafter “Ozta”) teaches, 
“wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
However, R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features:
“A non-transitory storage medium, configured to store one or more computer programs, the computer programs comprising an instruction that can be run by a possessor comprising one or more memories, the instruction, when executed by a computer, causing the computer to perform the following operations: performing, by the server, scene identification on the second video data, and identifying and filtering a first picture collection representing the predefined scene switching in the second video data, to obtain a second picture collection; analyzing and identifying, by the server, a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting, by the server, a picture of which a quantity of the edge feature information reaches a preset threshold”, in combination with all the recited limitations of claim 17.
Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GOLAM SOROWAR whose telephone number is (571)270-3761.  The examiner can normally be reached on Mon-Fri: 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/GOLAM SOROWAR/
Primary Examiner, Art Unit 2641