DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/16/2020 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3-9, 11-17, 19 and 20 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
  	
References:
 	R1: TIAN YONGHONG ET AL: "Video Copy-Detection and Localization with a Scalable Cascading Framework",  IEEE MULTIMEDIA, IEEE SERVICE CENTER, NEW 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-9, 11-17, 19 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Claim 1, line 6 recites, “extracting, by the server, a picture corresponding to a predefined scene switching in second video…”. It appears that the specification does not provide sufficient support for this limitation. In the remarks, Applicant indicated that support for the amendments can be found in paragraphs [0040] and [0041]. However, Para. [0040] discloses, “Step 104: Obtain a video label and a time parameter corresponding to the audio fingerprint data when the audio fingerprint library includes the audio fingerprint data” and Para. [0041] discloses, “Step 105: Query a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extract a second picture that is in the first video data and that satisfies the time parameter.” It is not clear how the cited paragraphs provide support for the feature “a predefined scene switching”. Therefore, the above claimed feature raises new matter issues. 
Claim 9, line 8 recites, “extracting a picture corresponding to a predefined scene switching in second video…”. It appears that the specification does not provide sufficient support for this limitation. In the remarks, Applicant indicated that support for the amendments can be found in paragraphs [0040] and [0041]. However, Para. [0040] discloses, “Step 104: Obtain a video label and a time parameter corresponding to the audio fingerprint data when the audio fingerprint library includes the audio fingerprint data” and Para. [0041] discloses, “Step 105: Query a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extract a second picture that is in the first video data and that satisfies the time parameter.” It is not clear how the cited paragraphs provide support for the feature “a predefined scene switching”. Therefore, the above claimed feature raises new matter issues. 
Claim 17, line 9 recites, “extracting a picture corresponding to a predefined scene switching in second video…”. It appears that the specification does not provide sufficient support for this limitation. In the remarks, Applicant indicated that support for the amendments can be found in paragraphs [0040] and [0041]. However, Para. [0040] discloses, “Step 104: Obtain a video label and a time parameter corresponding to the audio fingerprint data when the audio fingerprint library includes the audio fingerprint data” and Para. [0041] discloses, “Step 105: Query a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extract a second picture that is in the first video data and that satisfies the time parameter.” It is not clear how the cited paragraphs provide support for the feature “a predefined scene switching”. Therefore, the above claimed feature raises new matter issues. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-9 and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over R1, in view of Bhagavathy et al. (US 20130054645, hereinafter “Bhaga”) and further in view of Oztaskent et al. (US 20150365722, hereinafter “Ozta”).
	Regarding claim 1, R1 discloses,
 	A video detection method, comprising: 
 	“obtaining, by a server, first video data to be detected, and decoding, by the server, the first video data, to obtain audio data of the first video data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors
"WASF Detector" (page 77) and figure 2(b)])”; 
 	“analyzing and identifying, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data ([To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF),5 a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT), see e.g. subsections "Complementary Detectors" (pages 76-77) and As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors
"WASF Detector" (page 77) and figure 2(b)])”; 
 	“querying, by the server, based on the audio data, an audio fingerprint library ([see e.g. subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)])”;
 	“obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches [see e.g. subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)])”;
 	“querying, by the server, a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches, [see e.g. subsection "Detection Using Frame Fusion" (page 78)])”; 
 	“separately extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture (computationally efficient. In our system, a new DCT feature is designed by utilizing the relationship between low-frequency DCT coefficients of adjacent image blocks. It differs from the original DCT feature in that subband energy is used as an alternative to DCT coefficient. Then the invariance of its relative magnitude, [see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)])”; and 
 	“comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library (efficient but relatively simple detectors should be placed in the front, while effective but complex detectors should be located in the rear. In an N-stage cascade of detectors, D_N = ⟨d_1, d_2, ..., d_N⟩, a query is processed by each detector successively until one determines it as a copy or all determine it as a noncopy, [see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76) and "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
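	For illustration only, the cascade behavior described in the passage of R1 quoted above (a query is passed to each detector in turn until one reports a copy, or all report a non-copy) may be sketched as follows. This is a minimal sketch under stated assumptions: the function names `cascade_is_copy`, `cheap_stage` and `costly_stage`, and the string-based stand-in detectors, are hypothetical and are not taken from R1 or from the application.

```python
from typing import Callable, List, Sequence

# Hypothetical detector type: returns True when the stage judges the query a copy.
Detector = Callable[[str], bool]


def cascade_is_copy(query: str, detectors: Sequence[Detector]) -> bool:
    """Run the detectors in order and stop at the first one that reports a copy."""
    for detect in detectors:
        if detect(query):
            return True   # one stage determined the query to be a copy
    return False          # every stage determined the query to be a non-copy


# Stand-in stages (assumptions for illustration): a cheap test placed first,
# a more expensive test placed last. `calls` records which stages actually ran.
calls: List[str] = []


def cheap_stage(query: str) -> bool:
    calls.append("cheap")
    return query.startswith("copy")


def costly_stage(query: str) -> bool:
    calls.append("costly")
    return "copy" in query


early = cascade_is_copy("copy-of-film", [cheap_stage, costly_stage])
```

	In this sketch the second stage is never invoked for the query above, because the first stage already reports a copy; a query that no stage accepts is passed through every stage before being reported as a non-copy.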
	However, R1 does not explicitly disclose, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library.”
	In a similar field of endeavor, Bhaga discloses, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including, indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the " time stamp"), Paras. [0025]-[0026]).”
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify R1 by specifically providing extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library, as taught by Bhaga, for the purpose of providing systems and methods of identifying media content, such as video content, that employ fingerprint matching at the level of video frames (such matching also referred to herein as "frame-level fingerprint matching").

	Further, the combination of R1 and Bhaga does not explicitly disclose, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library.”
	In a similar field of endeavor, Ozta discloses, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of R1 and Bhaga by specifically providing wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library, as taught by Ozta, for the purpose of providing techniques that locally detect what video programs a user is watching, and provide context-aware information to the user based on knowledge of those programs.
 	Regarding claim 5, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 1), further R1 discloses,
 	“wherein the analyzing and identifying, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data comprises: extracting, by the server, a feature parameter of the audio data, and obtaining, based on the feature parameter, the audio fingerprint data corresponding to the audio data (A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors. For reference videos, all the WASF descriptors are indexed by locality sensitive hashing (LSH) for efficient feature matching, where the indexing tables are generated using 16 spherical hashing functions, [see e.g. subsection "WASF Detector" (page 77)]).”
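	For illustration only, the use of Euclidean distance as a dissimilarity measure between audio descriptors, as stated in the passage of R1 quoted above, may be sketched as follows. This is a minimal sketch under stated assumptions: it uses a brute-force nearest-descriptor search in place of the locality sensitive hashing index described by R1, uses 3-dimensional toy descriptors rather than 72D WASF features, and the names `euclidean`, `nearest_reference` and the labels `video_A` and `video_B` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_reference(
    query: List[float], references: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the label and distance of the reference descriptor closest to the query.

    Brute-force search, substituted here for the LSH index of R1 (assumption).
    """
    label = min(references, key=lambda name: euclidean(query, references[name]))
    return label, euclidean(query, references[label])


# Toy reference descriptors (hypothetical values, 3 dimensions for brevity).
references = {
    "video_A": [0.0, 0.0, 0.0],
    "video_B": [3.0, 4.0, 0.0],
}

label, distance = nearest_reference([3.0, 4.0, 1.0], references)
```

	A smaller distance indicates less dissimilarity, so in this sketch the query descriptor is matched to the reference whose descriptor lies closest to it.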
Regarding claim 6, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 1), further R1 discloses,
 	“wherein the separately extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture comprises: separately extracting, by the server, the first feature parameter of the first picture and the second feature parameter of the second picture according to at least one of the following manners: a scale-invariant feature transform SIFT manner and a histogram of oriented gradient HOG manner (DC-SIFT Detector. In our system, DC-SIFT is adopted to cope with content-altering visual transformations (such as camcording, PiP, and postproduction). We use it to replace the SIFT and speeded up robust feature (SURF) in our TRECVID-CBCD 2010 system, which can obtain high detection, [see e.g. subsection "DC-SIFT Detector" (page 77)]).”
 	Regarding claim 7, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 1), further R1 discloses,
 	“wherein the comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library comprises: calculating, by the server, similarity between the first feature parameter and the second feature parameter; and determining that the first video data is consistent with the video in the video copyright library when the similarity reaches a preset threshold ([see e.g. subsection "Detection with a Scalable Cascading Framework" (pages 75-76), the last paragraph of which discloses the claimed "preset threshold"]).”
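	For illustration only, the claimed comparison of a similarity value against a preset threshold may be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity is chosen here merely as one possible similarity measure, the threshold value of 0.9 is arbitrary, and the names `cosine_similarity` and `is_consistent` are hypothetical; none of these is taken from R1 or from the application.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_consistent(
    first_feature: List[float], second_feature: List[float], threshold: float = 0.9
) -> bool:
    """Report a match when the similarity reaches the preset threshold (assumed 0.9)."""
    return cosine_similarity(first_feature, second_feature) >= threshold


same = is_consistent([1.0, 0.0], [1.0, 0.0])        # identical direction
different = is_consistent([1.0, 0.0], [0.0, 1.0])   # orthogonal features
```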
Regarding claim 8, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 1), further R1 discloses,
“wherein the audio fingerprint data is quantized data representing a feature parameter of the audio data (A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors. For reference videos, all the WASF descriptors are indexed by locality sensitive hashing (LSH) for efficient feature matching, where the indexing tables are generated using 16 spherical hashing functions, [see e.g. subsection "WASF Detector" (page 77)]).”
Regarding claim 9, R1 discloses,
  	A server, comprising at least one processor and a memory storing a processor-executable instruction, the instruction, when executed by the at least one processor, causing the server to perform the following operations:
 	“obtaining first video data to be detected, and decoding the first video data, to obtain audio data of the first video data (To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF), a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT). As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors, [see e.g. subsections "Complementary Detectors" (pages 76-77) and "WASF Detector" (page 77) and figure 2(b)])”; 
 	“analyzing and identifying the audio data, to obtain audio fingerprint data corresponding to the audio data (To meet the complementarity requirement, three features are used in our implementation, including an audio feature named weighted audio spectrum flatness (WASF), a global visual feature based on a discrete cosine transform (DCT), and a local visual feature of dense-color scale-invariant feature transform (DC-SIFT). As an extension of a MPEG-7 descriptor, we use WASF to cope with audio transformations such as MP3 compression and multiband companding. A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors, [see e.g. subsections "Complementary Detectors" (pages 76-77) and "WASF Detector" (page 77) and figure 2(b)])”; 
 	“querying, based on the audio data, an audio fingerprint library ([see e.g. subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)])”;
 	“obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches [see e.g. subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)])”;
 	“querying a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter (To maintain robustness to various temporal transformations, all detectors in our system follow the frame-based copy detection paradigm, where the final result is obtained by assembling frame-level similarity search results into video-level matches, [see e.g. subsection "Detection Using Frame Fusion" (page 78)])”; 
 	“separately extracting a first feature parameter of the first picture and a second feature parameter of the second picture (computationally efficient. In our system, a new DCT feature is designed by utilizing the relationship between low-frequency DCT coefficients of adjacent image blocks. It differs from the original DCT feature in that subband energy is used as an alternative to DCT coefficient. Then the invariance of its relative magnitude, [see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)])”; and 
 	“comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library (efficient but relatively simple detectors should be placed in the front, while effective but complex detectors should be located in the rear. In an N-stage cascade of detectors, D_N = ⟨d_1, d_2, ..., d_N⟩, a query is processed by each detector successively until one determines it as a copy or all determine it as a noncopy, [see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76) and "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
	However, R1 does not explicitly disclose, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library.”
	In a similar field of endeavor, Bhaga discloses, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including, indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the " time stamp"), Paras. [0025]-[0026]).”
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify R1 by specifically providing extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library, as taught by Bhaga, for the purpose of providing systems and methods of identifying media content, such as video content, that employ fingerprint matching at the level of video frames (such matching also referred to herein as "frame-level fingerprint matching").
	Further, the combination of R1 and Bhaga does not explicitly disclose, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library.”
	In a similar field of endeavor, Ozta discloses, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of R1 and Bhaga by specifically providing wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library, as taught by Ozta, for the purpose of providing techniques that locally detect what video programs a user is watching, and provide context-aware information to the user based on knowledge of those programs.
Regarding claim 13, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 9), further R1 discloses,
 “wherein the server is further configured to perform the following operations: extracting a feature parameter of the audio data, and obtaining, based on the feature parameter, the audio fingerprint data corresponding to the audio data (A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors. For reference videos, all the WASF descriptors are indexed by locality sensitive hashing (LSH) for efficient feature matching, where the indexing tables are generated using 16 spherical hashing functions, [see e.g. subsection "WASF Detector" (page 77)]).”
Regarding claim 14, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 9), further R1 discloses,
 “wherein the server is further configured to perform the following operations: separately extracting the first feature parameter of the first picture and the second feature parameter of the second picture according to at least one of the following manners: a scale-invariant feature transform SIFT manner and a histogram of oriented gradient HOG manner (DC-SIFT Detector. In our system, DC-SIFT is adopted to cope with content-altering visual transformations (such as camcording, PiP, and postproduction). We use it to replace the SIFT and speeded up robust feature (SURF) in our TRECVID-CBCD 2010 system, which can obtain high detection, [see e.g. subsection "DC-SIFT Detector" (page 77)]).”
Regarding claim 15, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 9), further R1 discloses,
 “wherein the server is further configured to perform the following operations: calculating similarity between the first feature parameter and the second feature parameter; and determining that the first video data is consistent with the video in the video copyright library when the similarity reaches a preset threshold ([see e.g. subsection "Detection with a Scalable Cascading Framework" (pages 75-76), the last paragraph of which discloses the claimed "preset threshold"]).”
Regarding claim 16, the combination of R1, Bhaga and Ozta discloses everything claimed as applied above (see claim 9), further R1 discloses,
“wherein the audio fingerprint data is quantized data representing a feature parameter of the audio data (A descriptor with 72D WASF features is extracted from each 6-second audio frame, and we adopt Euclidean distance to measure the dissimilarity between two descriptors. For reference videos, all the WASF descriptors are indexed by locality sensitive hashing (LSH) for efficient feature matching, where the indexing tables are generated using 16 spherical hashing functions, [see e.g. subsection "WASF Detector" (page 77)]).”
Regarding claim 17, R1 discloses,
 A non-transitory storage medium, configured to store one or more computer programs, the computer programs comprising an instruction that can be run by a processor comprising one or more memories, the instruction, when executed by a computer, causing the computer to perform the following operations:
 “obtaining first video data to be detected, and decoding the first video data, to obtain audio data of the first video data ([see e.g. subsections "Complementary Detectors" (pages 76-77) and "WASF Detector" (page 77) and figure 2(b)])”; 
 “analyzing and identifying the audio data, to obtain audio fingerprint data corresponding to the audio data ([see e.g. subsections "Complementary Detectors" (pages 76-77) and "WASF Detector" (page 77) and figure 2(b)])”; 
 “querying, based on the audio data, an audio fingerprint library ([see e.g. subsection "WASF Detector" (page 77), indicating "feature matching", and figure 2(b)])”;
 “obtaining a video label and a time parameter corresponding to the audio fingerprint data ([see e.g. subsections "Detection Using Frame Fusion" (page 78) and "Localization Using Multiscale Sequence Matching" (page 79)])”;
 “querying a video copyright library based on the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting a second picture that is in the first video data and that satisfies the time parameter ([see e.g. subsection "Detection Using Frame Fusion" (page 78)])”; 
 “separately extracting, a first feature parameter of the first picture and a second feature parameter of the second picture ([see e.g. subsections "DCT Detector" (page 77) and "DC-SIFT Detector" (page 77)])”; and 
 “comparing the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library ([see e.g. subsections "Detection with a Scalable Cascading Framework" (pages 75-76), "Complementary Detectors" (pages 76-77) and figure 2(b)]).”
However, R1 does not explicitly disclose, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library.”
	In a similar field of endeavor, Bhaga discloses, “extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright (The data collector/fingerprint extractor 110 is operative to receive the encoded bitstream from the reference content item. The data collector/fingerprint extractor 110 is further operative to derive, extract, determine, or otherwise obtain characteristic video fingerprint data (such data also referred to herein as "reference fingerprint(s)") from a plurality of video frames (such frames also referred to herein as "reference frame(s)") contained in the encoded bitstream of the reference content item, Paras. [0025]-[0026]), and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library (The data collector/fingerprint extractor 110 is further operative to provide the reference fingerprints, and reference content information including indexes for the reference frames (such indexes also referred to herein as a/the "reference frame index(es)") from which the reference fingerprints were obtained, and at least one identifier of the reference content item containing the reference frames, for storage in the reference content database 112. For example, each reference frame index may be implemented as a presentation time stamp (such presentation time stamp also referred to herein as a/the "time stamp"), Paras. [0025]-[0026]).”
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify R1 by specifically providing extracting, by the server, a picture corresponding to a predefined scene switching in second video data having copyright, and storing the extracted picture, a time parameter corresponding to the extracted picture, and a video label corresponding to the extracted picture in a video copyright library, as taught by Bhaga, for the purpose of providing systems and methods of identifying media content, such as video content, that employ fingerprint matching at the level of video frames (such matching also referred to herein as "frame-level fingerprint matching").
Further, the combination of R1 and Bhaga does not explicitly disclose, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library.”
In a similar field of endeavor, Ozta discloses, “wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data (the media subsystem includes a fingerprint module 324, which computes one or more audio fingerprints for each video program. In some implementations, an audio fingerprint is a small representation of an audio sample, and is relatively unique, Fig. 6 and Para. [0042]) and video labels (the media subsystem 126 includes a fingerprint selection module 328 (which may also be referred to as a video program selection module), which selects specific audio fingerprints and corresponding video programs based on relevance to a user, Para. [0042]) and time parameters corresponding to the audio fingerprint data (the timestamp represents a server generated timestamp indicating when the query was received. Some server systems 114 include one or more servers 300 that accurately manage timestamps in order to guarantee both accuracy of the data as well as sequential consistency, Para. [0062]); and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library (For a video program that includes multiple episodes (e.g., a TV series), the matching module 326 identifies theme music or jingles by comparing and matching audio fingerprints from multiple episodes. This matching process thus identifies audio portions that uniquely identify the video program, Paras. [0066]-[0070]).”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of R1 and Bhaga by specifically providing wherein the audio fingerprint library stores audio fingerprint data of copyrighted video data and video labels and time parameters corresponding to the audio fingerprint data; and in accordance with a determination that the audio fingerprint data exists in the audio fingerprint library, as taught by Ozta, for the purpose of providing techniques that locally detect what video programs a user is watching, and provide context-aware information to the user based on knowledge of those programs.


Allowable Subject Matter
Claims 3, 4, 11, 12, 19 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claims 3 and 4, the following is a statement of reasons for the indication of allowable subject matter: R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features, “wherein the performing, by the server, picture extraction on second video data having copyright comprises: performing, by the server, scene identification on the second video data, and identifying and filtering a first picture collection representing scene switching in the second video data, to obtain a second picture collection; analyzing and identifying, by the server, a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting, by the server, a picture of which a quantity of the edge feature information reaches a preset threshold”.
Regarding claims 11 and 12, the following is a statement of reasons for the indication of allowable subject matter: R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features, “wherein the server is further configured to perform the following operations: performing scene identification on the second video data, and identifying and filtering a first picture collection representing the predefined scene switching in the second video data, to obtain a second picture collection; analyzing and identifying a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting a picture of which a quantity of the edge feature information reaches a preset threshold”.
Regarding claims 19 and 20, the following is a statement of reasons for the indication of allowable subject matter: R1, Bhaga and Ozta, whether taken alone or in combination, do not teach or suggest the following novel features, “wherein performing picture extraction on the second video data having copyright comprises performing scene identification on the second video data, and identifying and filtering a first picture collection representing the predefined scene switching in the second video data, to obtain a second picture collection; analyzing and identifying a picture in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and extracting a picture of which a quantity of the edge feature information reaches a preset threshold”.

Relevant references
US 20140199050: The present invention relates to systems and methods for efficiently extracting, storing and displaying video streams with object(s) of interest superimposed over static panoramic images.
US 20160286171: The present invention relates to a method and a motion data extraction and vectorization system (MDEVS) that extract and vectorize motion data of an object in motion with optimized data storage and data transmission bandwidth.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GOLAM SOROWAR whose telephone number is (571) 270-3761.  The examiner can normally be reached on Mon-Fri: 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached at (571) 272-7904.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.