DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s response, filed 16 June 2021, to the last office action has been entered and made of record. 
The amendments to the specification and claims are acknowledged; they are supported by the original disclosure, and no new matter has been added.
Amendments to the independent claims 1, 9, and 17 have necessitated a new ground of rejection over the applied prior art. Please see below for the updated interpretations and rejections.
New claim 26 is acknowledged and made of record.

Response to Arguments
Applicant’s arguments with respect to claims 1, 9, and 17 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
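In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.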
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 5-7, 22-23, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Sang et al. (“Rolling and Non-Rolling Subtitle Detection with Temporal and Spatial Analysis for News Video”), herein Sang, in view of Hirayama et al. (US 2010/0328529), herein Hirayama, Yusufu et al. (“A Video Text Detection and Tracking System”), herein Yusufu, Agnihotri et al. (US 6,731,788), herein Agnihotri, Sun et al. (“Robust Text Detection in Natural Scene Images by Generalized Color enhanced Contrasting Extremal Region and Neural Networks”), herein Sun, and Sun et al. (“A robust approach for text detection from natural scene images”, published 16 April 2015), herein Sun2015. 
Regarding claim 1, Sang discloses a subtitle extraction method, comprising: 
obtaining video frames (see Sang sect. IV. The Proposed Algorithm, where a news video frame is obtained); 
performing adjacency operation on pixels in the video frames to obtain adjacency regions in the video frames (see Sang sect. IV. C. Location of Non-Rolling Subtitle Region in Single Video Frame and Fig. 3(e), where the edges of the subtitle regions are connected into a whole region).
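For illustration only, the adjacency operation relied upon above can be sketched as 4-connected component labeling of a binary edge mask. The function name label_adjacency_regions, the list-of-lists mask format, and the choice of 4-connectivity are assumptions of this sketch; it is not asserted to reproduce the procedure of Sang or of the claimed invention.

```python
from collections import deque


def label_adjacency_regions(mask):
    """Group nonzero pixels of a binary mask into 4-connected regions.

    Hypothetical helper: returns a list of regions, each a list of
    (row, col) coordinates, in raster-scan order of their first pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                # Only the four non-diagonal neighbors count as adjacent.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


edges = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
print([len(region) for region in label_adjacency_regions(edges)])  # [3, 1]
```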
Sang does not explicitly disclose that the video frames are obtained by decoding a video; determining certain video frames including a same subtitle based on the adjacency regions in the video frames, wherein the step of determining includes obtaining a difference value between pixels of the adjacency regions; and upon determining the difference value between the pixels of the adjacency regions is less than a difference threshold, determining the certain video frames include the same subtitle.
Hirayama teaches in a comparable subtitle detection method (see Hirayama Abstract) that the video frames are obtained from a decoded video data stream (see Hirayama [0027]), and that previous and current frames are checked to see whether a subtitle part has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames in a small pixel block (see Hirayama [0039]), where, if the sum of luminance level differences is smaller than a predetermined threshold, no change in the subtitle is determined, suggesting that the luminance levels of the corresponding pixels are the same / match (see Hirayama [0039]). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Hirayama’s teachings to the subtitle detection algorithm of Sang, such that the video frames are obtained by decoding a video data stream, and the subtitle regions determined as taught by Sang in previous and current frames are checked to see whether the subtitle part has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames in a small pixel block of the subtitle regions. This modification is rationalized as use of a known technique to improve similar methods in the same way. In this instance, Sang teaches a base method for subtitle detection; Hirayama teaches, in a comparable subtitle detection method, known techniques of obtaining video frames by decoding video data streams and of determining whether the subtitle of the image frames has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames; and one of ordinary skill in the art could have applied Hirayama’s technique to the teachings of Sang in the same way, predictably resulting in obtaining video frames by decoding video data streams and checking whether the subtitle region detected by Sang has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames.
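The block-wise comparison attributed to Hirayama above can be sketched as follows, assuming that the “sum of luminance level difference” is a sum of absolute differences and that the subtitle is treated as unchanged only when every block stays below the threshold. The function subtitle_unchanged, the block size of 4, and the threshold of 40 are hypothetical values chosen for illustration, not values taken from Hirayama [0039].

```python
def subtitle_unchanged(prev, curr, region, block=4, threshold=40):
    """Return True when no small pixel block inside `region` shows a large
    luminance change between the previous and current frame.

    prev, curr: 2-D lists of luminance values with identical size.
    region: (top, left, bottom, right), with bottom/right exclusive.
    block, threshold: hypothetical block size and per-block threshold.
    """
    top, left, bottom, right = region
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            # Sum of absolute luminance differences over one small block.
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, bottom))
                for x in range(bx, min(bx + block, right))
            )
            if diff >= threshold:
                return False  # this block changed, so the subtitle changed
    return True


prev = [[100] * 8 for _ in range(8)]
curr = [row[:] for row in prev]
curr[2][3] += 10  # small noise stays under the threshold
print(subtitle_unchanged(prev, curr, (0, 0, 8, 8)))  # True
curr[5][5] += 60  # a large change in one block exceeds it
print(subtitle_unchanged(prev, curr, (0, 0, 8, 8)))  # False
```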
Sang and Hirayama do not explicitly disclose extracting feature points from the adjacency regions via a scale-invariant feature transform (SIFT) algorithm; and that upon determining the feature 
Yusufu teaches in a related and pertinent video text detection and tracking system (see Yusufu Abstract) that SIFT features are suggested to be detected in detected text regions of digital video to perform video text tracking (see Yusufu sect. III. Video Text Tracking and sect. III. A. Feature Selection; see also Yusufu sect. IV. B. Performance Evaluation of Text Tracking), and that, for tracking static text, text-disappearing frames are determined based on the change in the number of feature points in the text regions: when the number of feature points in corresponding regions of neighboring frames changes drastically, the current frame can be determined to be a text-disappearing frame (see Yusufu sect. III. B. Text Tracking; see also Yusufu sect. IV. B. Performance Evaluation of Text Tracking).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Yusufu’s teachings to the subtitle detection algorithm of Sang and Hirayama, such that the detected subtitle regions are also tracked using SIFT features to determine text-disappearing frames, and the subtitle regions in previous and current frames are checked to see whether the subtitle part has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames in a small pixel block of the subtitle regions and by comparing whether the numbers of SIFT features match. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang and Hirayama teach a base method for subtitle detection and for determining whether the subtitle of the image frames has changed by taking the sum of luminance level differences of corresponding pixels between the two neighboring frames. Yusufu teaches a known technique for text tracking in video images, in which SIFT features are suggested to be detected in detected text regions, and text-disappearing frames are determined based on the change in the number of feature points in the text regions: when the number of feature points in corresponding regions of neighboring frames changes drastically, the current frame can be determined to be a text-disappearing frame. One of 
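The feature-count criterion attributed to Yusufu can be illustrated with the minimal sketch below. It assumes the per-frame number of keypoints in a tracked text region has already been obtained (for example, from a SIFT detector such as OpenCV’s cv2.SIFT_create()); the function disappearing_frames and the relative-change threshold of 0.5 are hypothetical stand-ins for Yusufu’s notion of a drastic change.

```python
def disappearing_frames(feature_counts, change_ratio=0.5):
    """Return indices of frames at which the number of feature points in a
    tracked text region changes drastically relative to the previous frame.

    feature_counts: per-frame number of keypoints (e.g., SIFT) found in the
    same text region. change_ratio is a hypothetical threshold.
    """
    frames = []
    for i in range(1, len(feature_counts)):
        prev, curr = feature_counts[i - 1], feature_counts[i]
        if prev == 0:
            continue  # no features to lose; not treated as disappearance here
        # Relative change in the feature count between neighboring frames.
        if abs(curr - prev) / prev > change_ratio:
            frames.append(i)
    return frames


print(disappearing_frames([120, 118, 121, 30, 28]))  # [3]
```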
Sang, Hirayama, and Yusufu do not explicitly disclose after determining the certain video frames include the same subtitle, superimposing subtitle regions of the certain video frames including the same subtitle and averaging the subtitle regions as superimposed to form a new subtitle region, wherein the subtitle regions are averaged to obtain a mean value as the new subtitle region. 
Agnihotri teaches in a related and pertinent method for classifying symbols such as text in video streams (see Agnihotri Abstract), where, in a text extraction and recognition operation, multiple subsequent frames containing the same text regions can be integrated together and averaged to make the text regions clearer and to better set the text off against the background, thereby reducing the complexity of the background in a moving image (see Agnihotri col. 6, ln. 55 – col. 7, ln. 20).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Agnihotri’s teachings to the subtitle text detection algorithm of Sang, Hirayama, and Yusufu, such that the detected subtitle regions that are determined not to be changing, which is equivalent to determining that the subtitle regions are the same in subsequent frames, are integrated and averaged to make the text regions clearer and to reduce the complexity of the background of the moving images. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, and Yusufu teach a base method for detecting a change in extracted subtitle regions. Agnihotri teaches a known technique for a signal averaging procedure in which subsequent frames containing the same text regions can be integrated together and averaged to make the text regions clearer and to better set the text off against the background. One of ordinary skill in the art would have recognized that by applying Agnihotri’s 
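The multi-frame averaging attributed to Agnihotri amounts to a pixel-wise mean over aligned regions that contain the same text. The sketch below assumes the regions are already aligned and equally sized; average_regions is a hypothetical name. Stable text pixels keep their values while fluctuating background pixels are pulled toward a mean, which is the clarifying effect described above.

```python
def average_regions(regions):
    """Pixel-wise mean of equally sized, aligned subtitle regions (2-D lists)."""
    n = len(regions)
    rows, cols = len(regions[0]), len(regions[0][0])
    return [
        [sum(region[y][x] for region in regions) / n for x in range(cols)]
        for y in range(rows)
    ]


# Text pixels (right column) are stable; background pixels (left column) vary.
frame_a = [[0, 200], [100, 200]]
frame_b = [[20, 200], [60, 200]]
print(average_regions([frame_a, frame_b]))  # [[10.0, 200.0], [80.0, 200.0]]
```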
Although Sang describes performing connected component analysis (see Sang sect. IV. C. Location of Non-Rolling Subtitle Region in Single Video Frame), Sang, Hirayama, Yusufu, and Agnihotri do not explicitly disclose constructing a component tree for at least two channels of the new subtitle region in the certain video frames, and using the constructed component tree to extract contrasting extremal regions respectively corresponding to the at least two channels; performing color enhancement processing on the contrasting extremal regions of the at least two channels, to form color-enhanced contrasting extremal regions; and extracting the same subtitle by merging the color-enhanced contrasting extremal regions of the at least two channels.
Sun teaches in a related and pertinent method of text detection from natural scenes using color-enhanced contrasting extremal regions (see Sun Abstract), where component trees are built for hue and saturation channel images in the PII color space (see Sun sect. II. A. Overview), where the component trees are pruned and the contrasting extremal region (CER) criterion is used to extract CERs from the remaining extremal regions on each component tree (see Sun sect. II. A. Overview), where the (see Sun sect. II. C. Generalized Color-enhanced CER and Fig. 5), and where candidate text lines are formed and the results from all six component trees are combined (see Sun sect. F. Candidate Text Line Formation – sect. H. Post-processing). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Sun’s teachings to the subtitle detection algorithm of Sang, Hirayama, Yusufu, and Agnihotri, such that the text in the extracted averaged subtitle regions is detected using Sun’s generalized color-enhanced CER based text detection technique. This modification is rationalized as an application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, and Agnihotri teach a base method for subtitle detection. Sun teaches a known technique of performing robust text detection in images based on generalized color-enhanced CERs. One of ordinary skill in the art would have recognized that applying Sun’s technique to the teachings of Sang, Hirayama, Yusufu, and Agnihotri would have predictably resulted in performing the text detection technique based on generalized color-enhanced CERs upon the detected subtitle regions of Sang, Hirayama, Yusufu, and Agnihotri.
Although Sun teaches that component trees are built for hue and saturation channel images in the PII color space (see Sun sect. II. A. Overview), Sang, Hirayama, Yusufu, Agnihotri, and Sun do not explicitly disclose wherein the component tree includes a series of nodes nested in order from bottom to top, and an area-change-rate RΔS between a node (N, i) and an ancestor node (n, i+Δ) is represented by the formula $R_{\Delta S}(n_i, n_{i+\Delta}) = \frac{S_{n_{i+\Delta}} - S_{n_i}}{S_{n_i}}$, where $S_{n_i}$ represents an area of the node (N, i), $S_{n_{i+\Delta}}$ represents an area of the ancestor node (n, i+Δ), and the contrasting extremal regions are extracted according to the area-change-rate RΔS.
Sun2015 teaches in a related and pertinent method of text detection based on color-enhanced contrasting extremal regions and neural network (see Sun2015 Abstract), where a component tree is (see Sun2015 sect. 3.3.1. Algorithm overview). Sun2015 further teaches that the max-tree representation of an exemplified image (see Sun2015 Fig. 7) is composed of a sequence of nested ER nodes, corresponding to a character “s” circled in the image shown in Fig. 7a, which are denoted as $n_i, n_{i+1}, \ldots, n_{i+\Delta}$; that the areas of the nested ER nodes are represented as $S_{n_i}, S_{n_{i+1}}, \ldots, S_{n_{i+\Delta}}$; and that the area variation between node $n_i$ and its ancestor $n_{i+\Delta}$ can be calculated as $R_{\Delta S}(n_i, n_{i+\Delta}) = \frac{S_{n_{i+\Delta}} - S_{n_i}}{S_{n_i}}$, where $R_{\Delta S}(n_i, n_{i+\Delta})$ measures the contrast of $n_i$ against its surrounding background in the image, and where, if $R_{\Delta S}(n_i, n_{i+\Delta}) < T_{R_{\Delta S}}$, with $T_{R_{\Delta S}}$ being a constant, the node $n_i$ is considered to be a CER (see Sun2015 sect. 3.3.1. Algorithm overview). 
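As a worked illustration of the area-variation criterion quoted from Sun2015, the sketch below evaluates RΔS along one chain of nested nodes. The chain of areas, Δ = 2, and TRΔS = 0.2 are hypothetical; the comparison direction follows the criterion as quoted above, and building the max-tree itself is outside the scope of this sketch.

```python
def area_change_rate(s_node, s_ancestor):
    """R_dS(n_i, n_{i+delta}) = (S_{n_{i+delta}} - S_{n_i}) / S_{n_i}."""
    return (s_ancestor - s_node) / s_node


def contrasting_extremal_nodes(areas, delta, t_rds):
    """areas: areas S_{n_0}, S_{n_1}, ... of one chain of nested ER nodes,
    ordered from the bottom (smallest) node upward.

    Returns the indices i that have an ancestor n_{i+delta} in the chain and
    satisfy R_dS(n_i, n_{i+delta}) < t_rds, following the quoted criterion.
    """
    return [
        i for i in range(len(areas) - delta)
        if area_change_rate(areas[i], areas[i + delta]) < t_rds
    ]


# Hypothetical areas: the character grows slowly, then merges with background.
areas = [40, 42, 44, 46, 400, 900]
print(contrasting_extremal_nodes(areas, delta=2, t_rds=0.2))  # [0, 1]
```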
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Sun2015 in building the component trees for extracting the CERs as suggested in the combined teachings of Sang, Hirayama, Yusufu, Agnihotri, and Sun, which suggest performing text detection based on generalized color-enhanced CERs upon the detected subtitle regions. This modification is rationalized as use of a known technique to improve similar methods in the same way. In this instance, Sang, Hirayama, Yusufu, Agnihotri, and Sun teach a base method for performing text detection based on generalized color-enhanced CERs upon the detected subtitle regions, in which component trees are built for hue and saturation channel images in the PII color space. Sun2015 teaches, in a comparable method of text detection based on color-enhanced contrasting extremal regions, that component trees are built as max-trees, which are composed of sequences of nested extremal region nodes, and that the area variation between a node $n_i$ and its ancestor $n_{i+\Delta}$ is calculated as $R_{\Delta S}(n_i, n_{i+\Delta}) = \frac{S_{n_{i+\Delta}} - S_{n_i}}{S_{n_i}}$, which is compared to a threshold constant, $T_{R_{\Delta S}}$, for determining whether the node $n_i$ is considered to be a contrasting extremal region. One of ordinary skill in the art could have applied 

Regarding claim 5, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein determining the adjacency regions in the certain video frames including the same subtitle includes: 
in each of the video frames including the same subtitle, determining occurrence times of different distribution positions of an edge region of an adjacency region, respectively, and determining that a region formed by a distribution position having the most occurrence times as a subtitle region (see Sang sect. IV. C. Location of Non-Rolling Subtitle Region in Single Video Frame and Fig. 3(d) and 3(e), where the edges remaining after suppressing non-subtitle edge pixels are connected as a subtitle region; see Hirayama [0039], where the differences between the determined subtitle regions are used to determine whether the subtitle has changed).
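The claim 5 step of selecting the most frequent distribution position can be sketched as a vote over frames. This sketch assumes each edge region is summarized by an exact bounding box (top, left, bottom, right); a practical implementation would likely tolerate small positional jitter. The name most_frequent_position is hypothetical.

```python
from collections import Counter


def most_frequent_position(boxes_per_frame):
    """boxes_per_frame: for each frame, the (top, left, bottom, right) boxes
    of its edge regions. Returns the box that appears in the most frames,
    taken here as the subtitle region."""
    counts = Counter()
    for boxes in boxes_per_frame:
        counts.update(set(boxes))  # count each position at most once per frame
    position, _ = counts.most_common(1)[0]
    return position


frames = [
    [(400, 50, 440, 600), (10, 10, 30, 80)],
    [(400, 50, 440, 600)],
    [(400, 50, 440, 600), (200, 100, 220, 180)],
]
print(most_frequent_position(frames))  # (400, 50, 440, 600)
```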

Regarding claim 6, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein the constructing the component tree for the at least two channels of the new subtitle region, and using the constructed component tree to extract the contrasting extremal region corresponding to the at least two channels includes: 
for a subtitle region of each of the video frames, constructing the component tree formed by nested nodes from a perception-based Illumination Invariant PII tone channel and a PII saturation (see Sun sect. II. A. Overview, where component trees are built for hue and saturation channel images in the PII color space), wherein nodes of the component tree correspond to characters of the subtitle region (see the above combination of Sang, Hirayama, and Sun, where Sun’s text detection technique is applied to the detected subtitle region of Sang and Hirayama); and upon determining an area-change-rate of a node relative to a neighboring node is less than an area-change-rate threshold, determining that the node belongs to the contrasting extremal regions of the at least two channels (see Sun sect. E. Pruning Repeating Components, where nodes are pruned when the ratio between the area of a node and that of its child is larger than a threshold).

Regarding claim 7, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein performing color enhancement processing on the contrasting extremal regions of the at least two channels, to form the color-enhanced contrasting extremal region includes: 
determining a main color of each of the contrasting extremal regions of the at least two channels (see Sun sect. II. C. Generalized Color-enhanced CER and Fig. 5, where the dominant colors of remaining pixels in the CER are estimated and used to compose a generalized color-enhanced CER); and 
from the contrast degree extremal regions of the at least two channels, extracting pixels whose similarity degree with the main color satisfy preset conditions, and forming the color-enhanced contrasting extremal regions of the at least two channels based on the extracted pixels (see Sun sect. II. C. Generalized Color-enhanced CER and Fig. 5, where pixels with similar color to the dominant color are extracted to compose the enhanced CER).

(see Yusufu sect. III. A. Feature Selection, where SIFT features are described as invariant to image transformation or distortion and as showing the best performance under rotation, scale changes, and affine transformations).

Regarding claim 23, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein extracting feature points from the adjacency regions via the scale-invariant feature transform (SIFT) algorithm includes: extracting feature points from the adjacency regions, the feature points being independent of illumination or affine transformation or noise (see Yusufu sect. III. A. Feature Selection, where SIFT features are described as invariant to image transformation or distortion and as showing the best performance under rotation, scale changes, and affine transformations).

Regarding claim 26, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein the color enhancement is performed by:
sorting pixels included in the contrasting extremal regions according to corresponding grayscale values (see Sun sect. II. A. Overview, where component trees are built from grayscale channel images and their inverted images; see Sun sect. II. C. Generalized Color-enhanced CER, where a generalized color-enhanced CER is proposed, which involves estimating the dominant color in a CER by sorting the pixels in descending order according to their pixel values in the corresponding color channel);
(see Sun sect. II. C. Generalized Color-enhanced CER, where a generalized color-enhanced CER is proposed, which involves extracting pixels with a color similar to a dominant color to compose an enhanced CER, in which the top 50% of the sorted pixels according to their pixel values are used to calculate a color average that determines the dominant color, and, if the color distance between a pixel and the dominant color is less than a threshold, the pixel is considered to have a color similar to the dominant color); and
forming the color-enhanced contrasting extremal regions according to the set of sorted pixels (see Sun sect. II. C. Generalized Color-enhanced CER, where generalized color-enhanced CER is proposed which involves extracting pixels with similar color to a dominant color to compose an enhanced CER).
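The dominant-color estimation and pixel extraction described above for Sun can be sketched as follows. The sketch assumes three-component pixel colors and a Euclidean color distance; Sun operates on PII color channels, and the function color_enhanced_cer and the distance threshold are hypothetical. The top-50% averaging follows the description of Sun sect. II. C. given above.

```python
import math


def color_enhanced_cer(pixels, channel, dist_threshold):
    """pixels: (c0, c1, c2) colors of the pixels in one CER.
    channel: index of the channel from which the CER was extracted.
    Returns the pixels whose color is close to the CER's dominant color."""
    # Sort pixels in descending order of their value in that channel.
    ranked = sorted(pixels, key=lambda p: p[channel], reverse=True)
    top = ranked[: max(1, len(ranked) // 2)]
    # Dominant color: average color of the top 50% of the sorted pixels.
    dominant = tuple(sum(p[k] for p in top) / len(top) for k in range(3))
    # Keep pixels whose Euclidean color distance to it is below the threshold.
    return [p for p in pixels if math.dist(p, dominant) < dist_threshold]


cer = [(250, 250, 250), (240, 245, 250), (245, 240, 248),
       (20, 30, 40), (25, 20, 35), (30, 35, 30)]
print(color_enhanced_cer(cer, channel=0, dist_threshold=30))
# [(250, 250, 250), (240, 245, 250), (245, 240, 248)]
```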

Claims 2, 9-10, 13-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 as applied to claim 1 above, and further in view of Zhou et al. (US 2006/0045346), herein Zhou.
Regarding claim 2, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, wherein the performing adjacency operation on the pixels in the video frames to obtain the adjacency regions in the video frames includes: 
extracting the video frames at different time points according to a duration of the video (see Sang sect. IV. B. Detection of Frame with Non-Rolling Subtitles, where frames with non-rolling subtitles are detected); 
Although Sang discloses performing edge detection and connecting edges of the subtitle regions (see Sang sect. IV. C. Location of Non-Rolling Subtitle Region in Single Video Frame and Fig. 3(e)), Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 do not explicitly disclose performing eroding and dilation treatment on the video 
Zhou teaches in a related and pertinent method for extracting captions (see Zhou Abstract), where morphological operations, including dilations and erosions, are performed on detected edges to form candidate caption-containing regions, and the pixels of a candidate caption-containing region are evaluated to determine whether a predetermined number of their four non-diagonal adjacent pixels have the same value (see Zhou [0055]). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Zhou’s teachings to the subtitle detection algorithm of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015, such that the detected edges used to extract subtitle regions are combined using morphological operations of dilation and erosion and four non-diagonal directional adjacency operations are performed. This modification is rationalized as application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 teach a base method for subtitle detection where subtitle regions are extracted based on detected edges of the image. Zhou teaches a known technique of performing dilation and erosion to determine candidate caption regions. One of ordinary skill in the art would have recognized that applying Zhou’s technique to the teachings of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 would have predictably resulted in performing the morphological operations and adjacency operations to connect the edges of the subtitle region.
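As a simplified illustration of the operations described for Zhou and the proposed combination, the sketch below closes small gaps in a binary edge map by dilation followed by erosion, then applies a four non-diagonal neighbor test. It is not Zhou's implementation: the 3x3 square structuring element, the border handling (out-of-bounds positions are skipped), and the names `close_edges` and `four_neighbor_filter` are assumptions for illustration.

```python
from typing import Callable, List

Grid = List[List[int]]

# Offsets of the four non-diagonal (4-connected) neighbors.
NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _morph(grid: Grid, combine: Callable[[List[int]], int]) -> Grid:
    """Apply max (dilation) or min (erosion) over an assumed 3x3 square window.

    Out-of-bounds positions are skipped (an assumed border rule).
    """
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [grid[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(window)
    return out


def close_edges(edge_map: Grid) -> Grid:
    """Dilation followed by erosion (morphological closing) bridges small gaps."""
    return _morph(_morph(edge_map, max), min)


def four_neighbor_filter(grid: Grid, min_same: int) -> Grid:
    """Keep a foreground pixel only if at least `min_same` of its four
    non-diagonal neighbors share its value."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                continue
            same = sum(1 for dy, dx in NEIGHBORS_4
                       if 0 <= y + dy < h and 0 <= x + dx < w
                       and grid[y + dy][x + dx] == grid[y][x])
            out[y][x] = 1 if same >= min_same else 0
    return out
```

For example, two edge pixels separated by a one-pixel gap on the same row of a 5x7 map are joined into a three-pixel run by `close_edges`; requiring two matching neighbors then keeps only the middle pixel of that run.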

Regarding claim 9, it recites a device performing the method of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 suggest a system performing the method of claim 1. Please see above for detailed claim analysis, with the exception of the following further limitations: a memory storing 
Please see the above rejection for claim 1, as the rationale to combine the teachings of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 is similar, mutatis mutandis.
Zhou teaches the use of data storage devices, such as ROM and RAM, to store computer readable program code to be executed by a processing unit, such as a computer, to implement the disclosed methods and systems (see Zhou [0049]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Zhou’s teachings of using computer readable memory to store computer readable program code to be executed by a processing unit, such as a computer, to implement the disclosed methods and systems to the suggested subtitle text detecting system of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015. This modification is rationalized as use of a known technique to improve similar methods in the same way. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 teach a base system and method for subtitle text detection. Zhou teaches a known technique of using computer readable memory to store computer readable program code to be executed by a processing unit, such as a computer, to implement the disclosed methods and systems. One of ordinary skill in the art could have applied Zhou’s technique of using computer readable memory to store computer readable program code to be executed by a processing unit, such as a computer, to implement the disclosed methods and systems similarly to the disclosed subtitle text detection processing of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015.

Regarding claim 10, see above rejection for claim 9. It is a device claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 10 are similarly rejected.



Regarding claim 14, see above rejection for claim 9. It is a device claim reciting similar subject matter as claim 6. Please see above claim 6 for detailed claim analysis as the limitations of claim 14 are similarly rejected.

Regarding claim 15, see above rejection for claim 9. It is a device claim reciting similar subject matter as claim 7. Please see above claim 7 for detailed claim analysis as the limitations of claim 15 are similarly rejected.

Regarding claim 17, it recites a non-transitory computer-readable storage medium storing computer program instructions executable by at least one processor to perform the method of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Zhou suggest a non-transitory computer-readable storage medium storing instructions for performing the method of claim 1 (see Zhou [0049]). Please see above for detailed claim analysis. 
Please see the above rejection for claim 9, as the rationale to combine the teachings of Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Zhou is similar, mutatis mutandis.

Regarding claim 18, see above rejection for claim 17. It is a non-transitory computer-readable storage medium claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 18 are similarly rejected.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 as applied to claim 1 above, and further in view of Corey et al. (US 5,703,655), herein Corey.
Regarding claim 8, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 disclose the method according to claim 1, further comprising:
performing text recognition on the color-enhanced contrasting extremal regions as merged to generate recognized texts (see Sun sect. F. Candidate Text Line Formation – sect. H. Post-processing, where candidate text lines are formed and the results from all six component trees are combined);
Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 do not explicitly disclose that based on the recognized texts, performing at least one of video searching, video recommendation, video identifier classification, and subtitle sharing.
Corey teaches in a related and pertinent video segment retrieving system and method using extracted closed captions (see Corey Abstract), where detected closed caption data is used to index the corresponding video segments and partition the closed caption text into “meaningful” groups to describe the corresponding video segment (see Corey col. 5, ln. 30 - col. 6, ln. 50).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Corey’s teachings to the subtitle text detection algorithm of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015, such that the detected text from the extracted subtitle regions is used to index the corresponding video segments. This modification is rationalized as application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 teach a base method for subtitle detection in which subtitle text is detected from extracted subtitle regions. Corey teaches a known technique of indexing video segments with the corresponding detected closed caption text. One of ordinary skill in the art would have recognized that by applying Corey’s technique to the teachings of Sang, Hirayama, Yusufu, .
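As a hedged illustration of indexing video segments by caption text, as described for Corey, the following sketch builds a simple inverted index from recognized subtitle words to the time segments in which they appear and supports a basic search. The data layout (segments as start/end seconds) and the function names are hypothetical, and the sketch does not reproduce Corey's partitioning of caption text into “meaningful” groups.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), assumed layout

PUNCTUATION = ".,!?;:\"'"


def build_caption_index(captions: List[Tuple[Segment, str]]) -> Dict[str, Set[Segment]]:
    """Map each lower-cased caption word to the segments whose text contains it."""
    index: Dict[str, Set[Segment]] = defaultdict(set)
    for segment, text in captions:
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:
                index[word].add(segment)
    return dict(index)


def search_segments(index: Dict[str, Set[Segment]], query: str) -> List[Segment]:
    """Return, in time order, the segments containing every query word."""
    terms = [t.strip(PUNCTUATION) for t in query.lower().split()]
    terms = [t for t in terms if t]
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)
```

For example, indexing the captions “The storm is coming.” (0 to 4 s) and “Storm warnings issued” (4 to 8 s) and searching for “storm warnings” returns only the second segment.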

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Zhou as applied to claim 9 above, and further in view of Corey et al. (US 5,703,655), herein Corey.
Regarding claim 16, see above rejection for claim 9. It is a device claim reciting similar subject matter as claim 8. Please see above claim 8 for detailed claim analysis as the limitations of claim 16 are similarly rejected.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015, as applied to claim 1 above, and further in view of Zafarifar et al. (US 2012/0206567), herein Zafarifar.
Regarding claim 20, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 do not explicitly disclose the method according to claim 1, wherein obtaining the difference value between the pixels of the adjacency regions is performed according to horizontal projection and vertical projection, the horizontal projection being a number of non-zero pixel values per text line, and the vertical projection being a number of non-zero pixel values.
Zafarifar teaches in a related and pertinent subtitle detection system and method (see Zafarifar Abstract), where the subtitle area is encompassed by a subtitle bounding box and the bounding box computation is performed based on an iterative bi-projection of a temporally filtered version of a pruned static region map (see Zafarifar [0067], [0070]-[0082]), where the bi-projection computes a horizontal projection which calculates the number of pruned static pixels on each line to select horizontally (see Zafarifar [0075] and Fig. 7) and a vertical projection on the selected areas (see Zafarifar [0078] and Fig. 7), and results in the bounding boxes for the subtitle lines (see Zafarifar [0081]-[0082] and Fig. 8). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Zafarifar’s teachings to the subtitle text detection algorithm of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015, such that the subtitle regions used to check if a change in the subtitle has occurred is determined by a horizontal and vertical projection to determine the bounding box of the subtitle region. This modification is rationalized as application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 teach a base method for detecting a change in extracted subtitle regions by taking the sum of luminance level differences of corresponding pixels between two neighboring frames. Zafarifar teaches a known technique of determining the bounding box of a subtitle region based on a horizontal and vertical projection. One of ordinary skill in the art would have recognized that by applying Zafarifar’s technique to the teachings of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 would have predictably resulted in an improved system for determining the subtitle regions for determining a subtitle change, where bounding boxes for the subtitle region are determined based on horizontal and vertical projections.
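For illustration, a single-pass version of the bi-projection described for Zafarifar, combined with a sum of absolute luminance differences inside the resulting box (the change measure attributed to the base combination above), can be sketched as follows. The row and column thresholds, the inclusive box format, the use of absolute differences, and the function names are assumptions. Zafarifar describes an iterative procedure on a filtered, pruned static-region map, which this sketch does not reproduce.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def bi_projection_box(mask: Grid, min_row: int, min_col: int) -> Optional[Box]:
    """Single-pass bi-projection on a binary mask (assumed simplification)."""
    # Horizontal projection: number of non-zero pixels on each line.
    h_proj = [sum(1 for v in row if v) for row in mask]
    rows = [y for y, n in enumerate(h_proj) if n >= min_row]
    if not rows:
        return None
    top, bottom = min(rows), max(rows)
    # Vertical projection: non-zero count per column, restricted to the selected lines.
    width = len(mask[0])
    v_proj = [sum(1 for y in range(top, bottom + 1) if mask[y][x]) for x in range(width)]
    cols = [x for x, n in enumerate(v_proj) if n >= min_col]
    if not cols:
        return None
    return top, bottom, min(cols), max(cols)


def luminance_change(prev: Grid, curr: Grid, box: Box) -> int:
    """Sum of absolute luminance differences of corresponding pixels inside the box."""
    top, bottom, left, right = box
    return sum(abs(curr[y][x] - prev[y][x])
               for y in range(top, bottom + 1)
               for x in range(left, right + 1))
```

In this sketch, a change outside the bounding box does not contribute to the difference value, so only changes within the detected subtitle region are measured.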

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Zhou as applied to claim 1 above, and further in view of Thomsen (US 2002/0067428) and Cole et al. (US 2009/0013265), herein Cole.
Regarding claim 25, please see the above rejection of claim 1. Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 do not explicitly disclose the method according to claim 1, further comprising:

Thomsen teaches in a related and pertinent system and method for selecting symbols related to closed caption information on a television display (see Thomsen Abstract), where a caption module allows a viewer to use a cursor to point and select closed captions displayed on a screen and identify the specific words related to the selected onscreen words and copies the selected words to a second buffer to be transmitted to an external device (see Thomsen [0031]); where a cursor is suggested to allow a viewer to move and highlight symbols on display to select the corresponding text (see Thomsen [0041]-[0042]); where the highlighted text are copied to a find buffer and the contents of the find buffer are sent to a data warehouse over a number of possible interfaces including wireless and cellular (see Thomsen [0048]-[0050]); and furthermore, teaches that the caption selection may be performed on a personal computer (PC), allowing a customer to select desired words or elements and pass the symbols to an internet browser on the PC (see Thomsen [0058]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Thomsen to the subtitle text detection algorithm of Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015, such that a user may select on-screen symbols corresponding to closed caption text and copy corresponding closed caption text to be transmitted to an external device for additional use, such as performing a search query. This modification is rationalized as application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, and Sun2015 teach a base method for performing text detection based on generalized color-enhanced CERs upon the detected subtitled regions of video image frames. Thomsen teaches a known technique allowing a viewer to use a cursor to highlight and select on-screen closed captions, identify the corresponding closed caption text data of the selected on-screen symbols, and copy the closed caption text data to be transmitted to an external device via a number of possible interfaces.  
Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Thomsen do not explicitly disclose that the same subtitle is filled in a dialogue frame for sharing via instant messages. 
Cole teaches in a related and pertinent method for providing instant messaging communication between a first user and at least one other user via a communication network (see Cole Abstract), where in a user interface for an instant message conversation, a message composition region allows a user to type or compose messages to be sent and shared with another user (see Cole [0045]), and a user may copy and paste text data, e.g. a URL, into the message composition region to be sent and shared as an instant message (see Cole [0061]-[0063]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Cole to the interactive subtitle text detection algorithm of Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Thomsen such that a user may copy and paste closed caption text corresponding to selected on-screen symbols into a message composition region of an instant messaging conversation user interface and send the copied closed caption text as an instant message. This modification is rationalized as application of a known technique to a known method ready for improvement to yield predictable results. In this instance, Sang, Hirayama, Yusufu, Agnihotri, Sun, Sun2015, and Thomsen teach a base method for performing text detection based on generalized color-enhanced CERs upon the detected subtitled regions of video image frames and allow a user to select on-screen symbols of the detected subtitle region and copy the corresponding detected text to be transmitted to an external device for further use. Cole teaches a known technique allowing a user of an . 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI whose telephone number is (571)270-3814.  The examiner can normally be reached on 9:00 AM to 5:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH can be reached on (571) 272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TIMOTHY CHOI/Examiner, Art Unit 2661                                                                                                                                                                                                        

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661