DETAILED ACTION
Response to Amendment
Claims 1-11 and 20-22 are pending. Claims 1-11 and 20 were previously presented. Claims 21-22 are new.
Response to Arguments
Applicant's arguments filed February 15, 2021 have been fully considered but they are not persuasive. 
With respect to the argument on pages 10-12 that Gao and Kasutani select a keyframe from the video to represent the video, that all the groups are of the same content type (i.e., video), and that these references therefore cannot disclose a transition from one content type to another content type, the examiner notes that the applicant is giving "content type" a limited definition. The applicant’s own specification (see publication US 20170330041 A1) states: “However, a one-sided sentinel frame sequence may require a larger number of observations to detect the pattern and sentinel frame than the back-to-back pattern due to the statistical uncertainties about whether a black frame adjacent to two dissimilar sequences represents a program break including a sentinel frame sequence or an unmarked scene transition” ([0021]). Therefore, the examiner is interpreting different content types as different scenes. For instance, one scene could be violent and thereby contain R-rated content, while another could be for children and therefore contain only G-rated content.
The applicant disputes Petersohn and Deng by the same reasoning above pertaining to the definition of content type, and the examiner again asserts that different scene content can be interpreted as different content types.
All other arguments are by similarity or dependency and are addressed by the above.
Examiner applied US 20160337691 A1 to teach the subject matter of new claims 21-22. The following art is also cited as relevant and teaching the limitations in claims 21-22: 
US 20140196085 A1 (“For example, the methods and systems may identify within fingerprints a frame of video content that is associated with a transition (e.g., a black screen) between a first portion of content within the video content and a second portion of content within the video content, and insert an advertisement at or proximate to the identified frame of video content”); US 20090282454 A1 (The advertising identification unit 242 may, for example, identify a transition between program/movie content and advertising content in response to identifying a black video frame occurring within the time windows 610a-c, and/or in response to identifying a threshold amount of change between the content of a present video frame and the content of a previous video frame occurring within the time windows 610a-c (which may be measured at the output of a MPEG decoder which compares video frames to identify changed content)).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10, 11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Petersohn (US 20080316307 A1) in view of Gao et al. (US 20110064318 A1) in view of Kasutani (US 20090066838 A1) in view of Deng (US 20130259323 A1).

Regarding claims 1 and 20, Petersohn discloses a method and apparatus comprising one or more computer processors; and a non-transitory computer-readable storage medium comprising instructions ([0149], [0150]) that, when executed, control the one or more computer processors to be configured for: extracting, by a computing device, frame features for a plurality of frames from a video (the color or edge similarity may then be used as a similarity measure, [0069]); identifying, by the computing device, a pattern from a sequence of frames of the plurality of frames, the pattern being identified based on a pattern analysis using the frame features for frames in the sequence of frames (performing a similarity analysis between two frame sequences A and B, [0069], video frames directly before and after the transition, that is, outside of the transition, may be utilized for the similarity analysis, [0072], visually similar frame sequences are designated with the same character, [0101]); clustering, by the computing device, a set of candidate frames into one or more groups based on the frame features for the set of candidate frames (a high value holds the frame sequences together (no scene boundary), and a low value separates the frame sequences into individual scenes, [0077], similarity-based scene segmentation, [0080], shot similarity values are checked to decide whether the current shot can be merged into an existing group or scene, if the shot similarity values are too small and merging is not possible a new scene is declared, [0136] [scenes considered groups, frames that can be considered part of that group/scene or a new group/scene considered candidate frames]); selecting, by the computing device, sentinel features for each of the one or more 

Petersohn does not give details on the limitation selecting, by the computing device, sentinel features from the frame features for frames in the sequence of frames based on the pattern, generating a sentinel frame for each of the one or more groups, using the sentinel features for the frames in each of the one or more groups, or outputting, by the computing device, a set of sentinel frames for each of the one or more groups using the sentinel features.

Gao et al. teach selecting, by the computing device, sentinel features from the frame features for frames in the sequence of frames based on the pattern (“Thus, a candidate key frame whose dominant feature or features have higher weight values in the visual theme model will be ranked higher than another candidate key frame whose dominant features have lower 

Petersohn and Gao et al. are in the same art of analyzing key frames (Petersohn, abstract; Gao et al., abstract). The combination of Gao et al. with Petersohn will enable the selecting, by the 

To the extent that Petersohn and Gao et al. do not make totally explicit outputting, by the computing device, a set of sentinel frames for each of the one or more groups using the sentinel features, an additional reference is provided to teach this feature.

Kasutani teaches clustering, by the computing device, a set of candidate frames into one or more groups based on the frame features for the set of candidate frames (S101, S103, Fig. 3, [0100]) and generating a sentinel frame for each of the one or more groups, using the sentinel features for the frames in each of the one or more groups, and outputting, by the computing device, a set of sentinel frames for each of the one or more groups using the sentinel features (display representative images, Fig 3, a representative image selection program and a representative image group selection program, [0019], plurality of key frames is extracted from one video as a combination of representative images, [0052], Fig. 4, identifies a group of key frames of the videos corresponding to the video identifiers output from the video selector 21, from the key frame storage unit, [0100], extracting a plurality of key frames from one video as one representative image combination, [0107], “The representative image combination extractor 62 identifies a group of key frames of the respective videos belonging to the specific th  video is cp (1≤ p ≤ n), the representative image group combination extractor 82 extracts d key frames from each video, i.e., n x d key frames in all from one representative image group combination, [0150], if the evaluation values are calculated using color features of the respective key frames, the representative image group selector 84 selects the representative image group combination in which colors of key frames selected from within the same video are similar and, at the same time, a color difference from the other videos is most emphasized, [0176] [colors/color distance interpreted as sentinel features], see also Fig. 4, key frame group of video picture a - key frame group of video picture d for a set of sentinel frames [key frame group interpreted as set of sentinel frames; videos a-d interpreted as “for each of the one or more groups”]).

Petersohn and Gao et al. and Kasutani are in the same art of analyzing key frames (Petersohn, abstract; Gao et al., abstract; Kasutani, Fig. 4, [0052]). The combination of Kasutani with Petersohn and Gao et al. will enable the use of a set of key frames. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the set described by Kasutani 

To the extent the transition in the video from a first content type to a second content type is not made entirely clear by the previous references, another reference is provided herein.

Deng teaches clustering, by the computing device, a set of candidate frames into one or more groups based on the frame features for the set of candidate frames (image grouping for scene detection, [0036], cluster scenes into scene clusters, [0042]), generating a sentinel frame for each of the one or more groups, using the sentinel features for the frames in each of the one or more groups (representative key image formed or selected from the key frames of the scenes included in the cluster, [0042]), and outputting, by the computing device, a set of sentinel frames for each of the one or more groups using the sentinel features, the set of sentinel frames identifying a transition in the video from a first content type to a second content type, wherein the sentinel frame includes the sentinel features (determines key frame(s) and an image signature of the key frame(s) that are to be representative of the scene, [0030], scene change detector 225 of FIG. 3 employs image signatures and image signature comparison to segment the image frames stored in the image store 220 into a sequence of scenes corresponding to detected changes in the audience area, [0035], scene change detector 225 then segments the image frames 405 into an example sequence of scenes 410 represented by respective example key frames, [0041], compare the key image signatures for the key frames of the scenes determined by the scene change detector, [0042], key frames representing the 

Petersohn and Gao et al. and Kasutani and Deng are in the same art of key frames (Petersohn, abstract; Gao et al., abstract; Kasutani, Fig. 4, [0052]; Deng, [0042]). The combination of Deng with Petersohn and Gao et al. and Kasutani will enable the use of transition detection. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the transition detection of Deng with the invention of Petersohn and Gao et al. and Kasutani, as this was known at the time of filing, the combination would have predictable results, and, as Deng indicates, “Scene-based people metering, as disclosed herein, employs passive people metering, which can improve measurement accuracy and reduce reliance on audience compliance relative to prior active people metering approaches. However, unlike prior passive people metering approaches, scene-based people metering as disclosed herein further focuses audience recognition processing on a subset of the captured image frames (e.g., the key frames described below) corresponding to changes in the audience environment being metered, then backtracks or otherwise propagates the results of the recognition processing performed on these particular frames to the other captured image frames. Accordingly, in at least some examples, scene-based people metering as disclosed herein does not incur the costs associated with prior passive metering techniques that require facial recognition to be performed on each captured image of the audience” ([0016]), and this process can be performed offline ([0031]), thus saving processing resources.

Regarding claim 2, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn further indicates identifying the pattern comprises: selecting a transitional 

Regarding claim 3, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 2. Petersohn further indicates selecting the sentinel features comprises using a frame in the sequence of frames on both sides of the transitional frame in the sequence of frames to select the sentinel features (video frames directly before and after the transition, that is, outside of the transition, may be utilized for the similarity analysis, or may be postulated that the video frames to be chosen should lie a number x, e.g. x=5, of video frames from the frame sequence boundary [0072]).
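
For illustration only, the frame selection described in the cited passage (frames outside the transition, a number x of frames from the boundary on each side) can be sketched as follows. This is a minimal sketch and not the implementation of Petersohn or of the applicant; the function name, the inclusive index representation of the transition, and the clamping behavior at the ends of the video are assumptions made for this example.

```python
def frames_around_transition(num_frames, transition_start, transition_end, offset=5):
    """Pick one frame index on each side of a transition.

    Hypothetical helper: the transition occupies the inclusive index range
    [transition_start, transition_end]; `offset` plays the role of the
    number x of frames from the boundary (x=5 in the cited example).
    Returns (before, after); a side is None when no frame exists outside
    the transition on that side.
    """
    if not (0 <= transition_start <= transition_end < num_frames):
        raise ValueError("transition must lie inside the video")
    if offset < 1:
        raise ValueError("offset must be at least 1")

    # Frame before the transition, clamped to the first frame of the video.
    before = max(0, transition_start - offset) if transition_start > 0 else None
    # Frame after the transition, clamped to the last frame of the video.
    last = num_frames - 1
    after = min(last, transition_end + offset) if transition_end < last else None
    return before, after
```

With offset=1 the sketch returns the frames directly before and after the transition; larger offsets move the chosen frames further from the boundary.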

Regarding claim 4, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 2. Petersohn and Gao et al. further indicate identifying the pattern comprises: comparing a first frame on a first side of the transitional frame to a second frame on a second side of the transitional frame to determine if the first frame and the second frame are similar within a threshold; and when the first frame and the second frame are similar within the threshold, identifying the sequence of frames as including the pattern (Petersohn, if there is great similarity (in relation to a predetermined threshold value), this implies that both the shots belong to a common scene, [0011], scene boundaries are inserted for all coherence values below a fixed threshold value, [0012], only the coherence value for the DISSOLVE type 

Regarding claim 5, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 2. Petersohn and Gao et al. further indicate grouping sequential transitional frames in the sequence of frames into a group, wherein the at least one frame in the sequence of frames is near the group within a threshold (Petersohn, if there is great similarity (in relation to a predetermined threshold value), this implies that both the shots belong to a common scene, [0011], scene boundaries are inserted for all coherence values below a fixed threshold value, [0012], only the coherence value for the DISSOLVE type transition between the both frame sequences C and D is still below the threshold value and leads to the setting of a scene boundary, which corresponds to the correct relations, [0101], uniting may involve the use of a threshold value in order to identify frame sequence transitions as scene boundaries, [0144]; Gao et al., iteratively merge color clusters in Ds with color distances smaller than a threshold T1, until all the remaining color clusters in Ds are mutually distant from each other according to the threshold, [0037]).
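
As a hedged illustration of the threshold-based merging quoted from Gao et al. ([0037]), the sketch below repeatedly merges the closest pair of color clusters whose distance is smaller than a threshold until all remaining clusters are mutually distant. The data layout (an RGB centroid with a positive weight), the Euclidean distance, the closest-pair-first order, and the weighted-mean merge rule are assumptions for this example; the reference is not quoted as specifying them.

```python
import math


def merge_color_clusters(clusters, threshold):
    """Iteratively merge color clusters closer than `threshold`.

    `clusters` is a list of ((r, g, b), weight) pairs with positive weights
    (a hypothetical representation). The closest pair under the threshold is
    merged into its weighted mean, and the loop repeats until no pair of
    remaining clusters is closer than the threshold.
    """
    clusters = [(tuple(float(v) for v in c), float(w)) for c, w in clusters]
    while True:
        best = None  # (distance, i, j) of the closest pair under the threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][0], clusters[j][0])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        total = wi + wj
        merged = tuple((a * wi + b * wj) / total for a, b in zip(ci, cj))
        # Drop the two merged clusters and append their combination.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, total))
```

Each merge reduces the number of clusters by one, so the loop always terminates.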

Regarding claim 6, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn further indicates the transitional frame is a solid color frame (transition frame is a black frame, [0005]).
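
By way of example only, a solid color (e.g., black) frame of the kind referred to above could be tested as sketched below. This is an assumption-laden sketch, not a method taken from any cited reference: the frame layout (rows of 8-bit RGB tuples), the function names, and the tolerance and level values are hypothetical.

```python
def is_solid_color(frame, tolerance=8):
    """Return True when every channel varies by at most `tolerance`.

    `frame` is a list of rows, each row a list of (r, g, b) tuples
    (hypothetical layout chosen for this sketch).
    """
    pixels = [px for row in frame for px in row]
    if not pixels:
        return False
    for channel in range(3):
        values = [px[channel] for px in pixels]
        if max(values) - min(values) > tolerance:
            return False
    return True


def is_black_frame(frame, max_level=16, tolerance=8):
    """Return True for a solid color frame whose channel values are all dark."""
    if not is_solid_color(frame, tolerance):
        return False
    return all(v <= max_level for row in frame for px in row for v in px)
```

A small tolerance, rather than exact equality, allows for compression noise in decoded frames.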

Regarding claim 7, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn further indicates the pattern comprises a first sequence of frames and a transitional frame (frame sequences with transitions, abstract, [0005], [0016], [0023]-[0025]).

Regarding claim 8, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 7. Petersohn and Gao et al. further indicate locating a second sequence of frames in addition to the first sequence of frames and the transitional frame, wherein: a first frame in the first sequence of frames matches a second frame in the second sequence of frames within a threshold, and the sentinel features are selected from the first frame or the second frame pattern (Petersohn, if there is great similarity (in relation to a predetermined threshold value), this implies that both the shots belong to a common scene, [0011], scene boundaries are inserted for all coherence values below a fixed threshold value, [0012], only the coherence value for the DISSOLVE type transition between the both frame sequences C and D is still below the threshold value and leads to the setting of a scene boundary, which corresponds to the correct relations, [0101], uniting may involve the use of a threshold value in order to identify frame sequence transitions as scene boundaries, [0144]; Gao et al., candidate key frame that is predominantly black, orange, and perhaps brown would be ranked higher than a candidate key frame that is predominantly black with little or no orange and brown, [0023], iteratively merge color clusters in Ds with color distances smaller than a threshold T1, until all the remaining color clusters in Ds are mutually distant from each other according to the threshold, [0037], feature type having the dominant feature would be the dominant feature type and would weigh most in ranking the key frames, [0044]).

Regarding claim 10, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn and Gao et al. further indicate receiving an input identifying the pattern (Petersohn, computation of brightness or color histograms for the key-frames, [0134], Gao et al., examples of feature types include color, texture, shapes, and faces among many others, [0021], extracts the common principle color components Dpc for a given a set of images, [0035], in addition to a color feature type, the visual theme model may encompass a texture feature type that includes features such as smooth, rough, [0041], once the visual theme model is generated, each of a set of candidate key frames taken from the video file is distinguished according to similarities shared between that candidate key frame and the visual theme model, [0042], each key frame would be ranked according to similarities shared between that key frame and the weighted feature values of the color feature type, [0043]).

Regarding claim 11, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Gao et al. further indicate the set of sentinel frames are not previously known before selecting the sentinel features (“Thus, a candidate key frame whose dominant feature or features have higher weight values in the visual theme model will be ranked higher than another candidate key frame whose dominant features have lower weight values. Continuing with the Halloween example from above, a candidate key frame that is predominantly black would be ranked higher than a candidate key frame that is predominantly blue. A candidate key frame that is predominantly black, orange, and perhaps brown would be ranked higher than a candidate key frame that is predominantly black with little or no orange and brown or one that is predominantly orange with little or no black”, [0023], particular candidate key frame selected may be selected manually or automatically, if automatic, the candidate key frame from the set having the highest rank may be selected, [0024]).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Petersohn (US 20080316307 A1) and Gao et al. (US 20110064318 A1) and Kasutani (US 20090066838 A1) and Deng (US 20130259323 A1) as applied to claim 1 above, further in view of Dimitrova et al. (US 6100941 A).

Regarding claim 9, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn and Gao et al. and Kasutani and Deng do not explicitly disclose the sentinel features comprise features determined from decoding the frames of the video.

Dimitrova et al. teach sentinel features comprise features determined from decoding the frames of the video (Compressed signal Xn is decompressed by a decompression circuit 70, and then decoded by an entropy decoder, col. 5, lines 1-10, filters keyframes for similarity or unicolor, col. 5, lines 45-55, col. 5, line 65 – col. 6, line 10, if input starts off as a compressed signal, it must be decompressed, col. 8, lines 20-40, filter thread 84 uses the frame list buffer and composes a frame key list which lists only the frames which have "key" or important characteristics, col. 14, lines 25-40, signatures of key frames of known commercials are extracted and stored in a database, col. 17, lines 50-60, decompression, col. 20, lines 10-25).

Petersohn and Gao et al. and Dimitrova et al. are in the same art of analyzing key frames (Petersohn, abstract; Gao et al., abstract; Dimitrova et al., col. 14, lines 25-40). The combination of Dimitrova et al. with Petersohn and Gao et al. and Kasutani and Deng will enable the frame decoding. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the decoding described by Dimitrova et al. with the invention of .

Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Petersohn (US 20080316307 A1) and Gao et al. (US 20110064318 A1) and Kasutani (US 20090066838 A1) and Deng (US 20130259323 A1) as applied to claim 1 above, further in view of Prasad et al. (US 20160337691 A1).

Regarding claim 21, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn and Gao et al. further indicate the frame features extracted include one or more of: a color layout descriptor, an edge histogram, on-screen textual markings, on-screen logos, and ticker information (Petersohn, computation of brightness or color histograms for the key-frames, [0134], Gao et al., examples of feature types include color, texture, shapes, and faces among many others, [0021], extracts the common principle color components Dpc for a given a set of images, [0035], each key frame would be ranked according to similarities shared between that key frame and the weighted feature values of the color feature type, [0043]), however, to make this more explicit, another reference is provided herein.

Prasad et al. teach frame features extracted include one or more of: a color layout descriptor, an edge histogram, on-screen textual markings, on-screen logos, and ticker information (“In one embodiment, the video feature detecting module identifies advertisement breaks that occur in a broadcast content in a near real-time by analyzing video frames of the broadcast content 

Regarding claim 22, Petersohn and Gao et al. and Kasutani and Deng disclose the method of claim 1. Petersohn and Gao et al. and Kasutani and Deng do not explicitly disclose the first content type is program content and the second content type is advertising content.

Prasad et al. teach the first content type is program content and the second content type is advertising content (“In one embodiment, the video feature detecting module identifies advertisement breaks that occur in a broadcast content in a near real-time by analyzing video frames of the broadcast content for a presence of a sequence of black frames, a scene cut, fades in scenes, advertisement start and end animation frames, a presence or an absence of a channel icon, a shift in a position or a change in a size of the channel icon, a presence of black bands on a top and a bottom, and/or a left and a right of a video frame, size of the black bands, a presence or an absence of text in commercial breaks, a presence or an absence of tickers in commercial breaks, a shift in a position of tickers in advertisements, and/or an advisory. In another embodiment, the video feature detecting module identifies a transition from program content to an advertisement break by analyzing a video frame for black bands on a top and a bottom of the video frame. In yet another embodiment, the video feature detecting module identifies a transition from program content to an advertisement break by analyzing a .

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084.  The examiner can normally be reached on 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661