DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/28/202 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 8-9 and 12-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yin, Xu-Cheng, et al., "Text detection, tracking and recognition in video: a comprehensive survey" (hereinafter Yin).

Regarding claims 1, 13 and 14, Yin discloses a method comprising using at least one hardware processor (Abstract: “within this framework, a variety of methods, systems, and evaluation protocols of video text extraction are summarized, compared, and analyzed.” This shows that the system has a processor, as a computer, to execute the instructions or software.) to:
[Claim 13: A system comprising: at least one hardware processor; and one or more software modules that, when executed by the at least one hardware processor, (Abstract: “within this framework, a variety of methods, systems, and evaluation protocols of video text extraction are summarized, compared, and analyzed.” This shows that the system has a processor, as a computer, to execute the instructions or software.)]
[Claim 14: A non-transitory computer-readable medium having instructions stored therein, wherein the instructions, when executed by a processor, cause the processor to: (Abstract: “within this framework, a variety of methods, systems, and evaluation protocols of video text extraction are summarized, compared, and analyzed.” This shows that the system has a processor, as a computer, to execute the instructions or software stored in the memory of the computer.)]
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, (II. A UNIFIED FRAMEWORK FOR VIDEO TEXT DETECTION, TRACKING AND RECOGNITION: “Detection is first performed first in each frame independently; then, the detected results in sequential frames can be integrated and enhanced based on the Tracking results (Tracking-based-Detection).”)
receive the image frame, (C. Tracking based Recognition: “text recognition mainly focus on recognizing text in a single image (frame).”)
generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, (4-Tracking in the Compressed Domain: “The general text tracking strategy in the compressed domain is to use motion vectors in compressed streams to assess the motion similarity of the text macroblocks to track detected text.”)
combine the text-recognition result with an accumulated text-recognition result, (3-Tracking With Tracking-by-Detection: “A linear classifier is applied to determine whether the current text block matches with a similar block in previous frames. Finally, both the word scores and the text block matches are linearly interpolated to recover missing detection results.”)
estimate a distance between the accumulated text-recognition result (3-Tracking With Tracking-by-Detection: the “linear classifier” has the accumulated text-recognition result) and a next accumulated text-recognition result (3-Tracking With Tracking-by-Detection: the “linear classifier” has the next accumulated text-recognition result) based on an approximate model (3-Tracking With Tracking-by-Detection: “linear classifier”) of the next accumulated text-recognition result (3-Tracking With Tracking-by-Detection: “next linear classifier”); (A-Text tracking: “only one related work [89] exists in which the edit distance between the a recognized word in the current frame and the candidate word in the next frame is regarded as one feature for text matching. For tracking with detection, objects or detected text positions are used to track text across consecutive frames.”), and
determine whether or not to stop the processing based on the estimated distance; (II. A UNIFIED FRAMEWORK FOR VIDEO TEXT DETECTION, TRACKING AND RECOGNITION: “Detection is first performed first in each frame independently; then, the detected results in sequential frames can be integrated and enhanced based on the Tracking results (Tracking-based-Detection).” Here, “the detected results” are interpreted as “determine whether or not to stop”.) and,
after stopping the processing, output a character string based on the accumulated text-recognition result. (3-Tracking With Tracking-by-Detection: “Finally, both the word scores and the text block matches are linearly interpolated to recover missing detection results.”)
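
For illustration, the sequence of steps recited in claim 1 can be sketched as the loop below. This is a minimal sketch under assumptions that come neither from the application nor from Yin: every per-frame result is a list of class-score vectors of the same length, results are combined by plain averaging, and the names combine, decode, recognize_stream, estimate_distance, threshold and alphabet are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]   # class estimations for one character
Result = List[Vector]  # one vector per character position


def combine(results: Sequence[Result]) -> Result:
    """Average equal-length results position by position (assumed combination rule)."""
    n = len(results)
    return [
        [sum(r[j][k] for r in results) / n for k in range(len(results[0][j]))]
        for j in range(len(results[0]))
    ]


def decode(result: Result, alphabet: str) -> str:
    """Pick the highest-scoring class at each character position."""
    return "".join(alphabet[max(range(len(v)), key=v.__getitem__)] for v in result)


def recognize_stream(
    frame_results: Iterable[Result],
    estimate_distance: Callable[[Result, List[Result]], float],
    threshold: float,
    alphabet: str,
) -> str:
    """Accumulate per-frame results until the estimated distance is small enough."""
    history: List[Result] = []
    accumulated: Result = []
    for result in frame_results:            # text-recognition result of one frame
        history.append(result)
        accumulated = combine(history)      # accumulated text-recognition result
        # Hypothetical stopping rule: stop once the estimate drops to the threshold.
        if estimate_distance(accumulated, history) <= threshold:
            break
    return decode(accumulated, alphabet)    # character string output after stopping
```

The distance estimator is passed in as a callable so that any approximate model of the next accumulated result can be substituted without changing the loop.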

Regarding claim 2, Yin discloses wherein estimating the distance between the accumulated text-recognition result and a next accumulated text-recognition result comprises modeling the next accumulated text-recognition result by using previous text-recognition results as candidates for a next text-recognition result. (3-Tracking With Tracking-by-Detection: “three features are taken into account in the tracking-by-detection process, namely, the overlap ratio, the temporal distance measured by the number of frames between the current text block and the candidate in the prior frames, and the edit distance between the current word and the candidate word.”)

Regarding claim 8, Yin discloses wherein estimating the distance between the accumulated text-recognition result and a next accumulated text-recognition result further comprises, for each of the previous text-recognition results, calculating a distance between the accumulated text-recognition result and a combination of the accumulated text-recognition result with the previous text-recognition result. (2-Recognition Result Fusion: “Recognition results fusion simply combines the text recognition results of different frames into one final character/text, which can generally improve the overall recognition performance.”)
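
The computation recited in claims 2 and 8 (treating each previous result as a candidate for the next one, and measuring how far the accumulated result would move if that candidate were combined in) can be sketched as follows. This is only a sketch under stated assumptions: the accumulated result is a running average, the distance is half the L1 difference per character averaged over positions, and the per-candidate distances are averaged with equal weight. None of these choices, nor the names distance, combine_with and estimate_next_distance, comes from the application or from Yin.

```python
from typing import List, Sequence

Vector = List[float]   # class estimations for one character
Result = List[Vector]  # one vector per character position


def distance(a: Result, b: Result) -> float:
    """Half the L1 difference per character, averaged over positions (assumed metric)."""
    per_char = [sum(abs(p - q) for p, q in zip(u, v)) / 2 for u, v in zip(a, b)]
    return sum(per_char) / len(per_char)


def combine_with(accumulated: Result, n: int, candidate: Result) -> Result:
    """Fold one more result into an accumulated average of n results."""
    return [
        [(n * p + q) / (n + 1) for p, q in zip(u, v)]
        for u, v in zip(accumulated, candidate)
    ]


def estimate_next_distance(accumulated: Result, previous: Sequence[Result]) -> float:
    """Average the distance from the accumulated result to each hypothetical next one."""
    if not previous:
        raise ValueError("at least one previous result is required")
    n = len(previous)
    distances = [
        distance(accumulated, combine_with(accumulated, n, candidate))
        for candidate in previous   # each previous result models the next result
    ]
    return sum(distances) / n
```

When all previous results agree with the accumulated result, every hypothetical combination leaves it unchanged and the estimate is zero, which is the situation in which stopping is reasonable.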

Regarding claim 9, Yin discloses wherein calculating a distance between the accumulated text-recognition result and a combination of the accumulated text-recognition result with the previous text-recognition result (2-Recognition Result Fusion: “Recognition results fusion simply combines the text recognition results of different frames into one final character/text, which can generally improve the overall recognition performance.”) comprises aligning the accumulated text-recognition result with the previous text-recognition result based on a previous alignment of the accumulated text-recognition result with the previous text-recognition result. (1-Distorted Text Detection and Recognition: “for skewed and unaligned text, we can simply use the common character/word classifiers after text de-skewing and realigning ... use segmentation-based word recognition methods, which construct strong character classifiers and search the target word by using optimization techniques with some priors (e.g., a lexicon) [192].”)

Regarding claim 12, Yin discloses wherein the at least one hardware processor is comprised in a mobile device, (Real-time translation: “A mobile augmented reality (AR) translator on a mobile phone using a smart-phone camera and touchscreen is described in [20] and [108].”) and wherein the image frames are received in real time or near-real time as the image frames are captured by a camera of the mobile device. (Feature Representation: “used a projection profiles-based method for multi-frame verification, which can effectively track text when the camera is zooming in or out.”)

Allowable Subject Matter
Claims 3-7 and 10-11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Regarding claim 3, Yin does not disclose wherein the distance between the accumulated text-recognition result and a next accumulated text-recognition result is estimated as:
[Equation: media_image1.png]

wherein An is the estimated distance,
wherein n is a current number of image frames for which text-recognition results have been combined with the accumulated text-recognition result,
Attorney Docket No. 125405-0007UT01
wherein δ is an external parameter,
wherein Sn is a number of vectors of class estimations in the accumulated text-recognition result,
wherein K is a number of classes represented in each vector of class estimations in the accumulated text-recognition result, and
wherein Aijk is a contribution to the estimated distance by a class estimation for a k-th class to a j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from an i-th image frame.
Claim 3 contains allowable subject matter because of the limitation reciting the specific equation for estimating the distance between the accumulated text-recognition result and a next accumulated text-recognition result.

Regarding claim 4, Yin does not disclose wherein Aijk is calculated as:
[Equation: media_image2.png]
wherein Ajk is a weighted sum of class estimations for the k-th class corresponding to the j-th component of the accumulated text-recognition result,
wherein yijk is a class estimation for the k-th class that was merged into the j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from the i-th image frame,
wherein wi is a weight associated with the text-recognition result generated from the i-th image frame, and
wherein W is a sum of all weights wi.
Claim 4 would be allowable because it is dependent on claim 3.
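
The quantities defined for claim 4 (the weighted sum of class estimations per component and class, and the total weight W) can be illustrated with a short computation. This sketch assumes that the j-th vector of every frame's result is merged into the j-th component of the accumulated result; the function name weighted_accumulator and the nested-list layout are hypothetical, and the claimed expression for the per-frame contribution itself is not reproduced here.

```python
from typing import List, Sequence, Tuple


def weighted_accumulator(
    y: Sequence[Sequence[Sequence[float]]],  # y[i][j][k]: frame i, component j, class k
    w: Sequence[float],                      # w[i]: weight of the result from frame i
) -> Tuple[List[List[float]], float]:
    """Return the weighted sums A[j][k] = sum_i w[i] * y[i][j][k] and W = sum_i w[i]."""
    n_components = len(y[0])
    n_classes = len(y[0][0])
    A = [
        [sum(w[i] * y[i][j][k] for i in range(len(y))) for k in range(n_classes)]
        for j in range(n_components)
    ]
    W = sum(w)
    return A, W


# Two frames, one character position, two classes; the first frame counts double.
A, W = weighted_accumulator([[[1.0, 0.0]], [[0.0, 1.0]]], [2.0, 1.0])
```

Dividing each entry of A by W would give the normalized accumulated estimate for that component and class under these assumptions.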

Regarding claim 5, Yin does not disclose wherein Aijk is calculated as:
[Equation: media_image3.png]
wherein Ajk is a sum of class estimations for the k-th class corresponding to the j-th component of the accumulated text-recognition result, and
wherein yijk is a class estimation for the k-th class that was merged into the j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from the i-th image frame.
Claim 5 would be allowable because it is dependent on claim 3.

Regarding claim 6, Yin does not disclose wherein Σi Aijk is calculated as:
[Equation: media_image4.png]
Ljk ⊂ {1, 2, ..., n}, such that ∀i ∈ Ljk : n · yijk < Ajk,
[Equation: media_image5.png]
wherein Ajk is a sum of class estimations for the k-th class corresponding to the j-th component of the accumulated text-recognition result, and
wherein yijk is a class estimation for the k-th class that was merged into the j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from the i-th image frame.
Claim 6 would be allowable because it is dependent on claim 3.

Regarding claim 7, Yin does not disclose wherein values of yijk are stored in one or more balanced binary search trees.
Claim 7 would be allowable because it is dependent on claim 6.

Regarding claim 10, Yin does not disclose wherein the distance between the accumulated text-recognition result and the next accumulated text-recognition result is estimated as:
[Equation: media_image6.png]
wherein An is the estimated distance,
wherein n is a current number of image frames for which text-recognition results have been combined with the accumulated text-recognition result,
wherein δ is an external parameter,
wherein p is a distance metric function,
wherein Rn is the accumulated text-recognition result, and
wherein R(X1, ..., Xn, Xi) is a combination of all text-recognition results that have been previously combined to form the accumulated text-recognition result, with a text-recognition result from an i-th image frame.
Claim 10 contains allowable subject matter because of the limitation reciting the specific equation for estimating the distance between the accumulated text-recognition result and a next accumulated text-recognition result.

Regarding claim 11, Yin does not disclose wherein the distance between the accumulated text-recognition result and a next accumulated text-recognition result is calculated using:
$$\frac{2G_n}{G_n + 2S_n},$$
wherein Gn is a sum of generalized Levenshtein distances between the accumulated text-recognition result and combinations of the accumulated text-recognition result with the previous text-recognition results, and
wherein Sn is a number of vectors of class estimations in the accumulated text-recognition result.
Claim 11 contains allowable subject matter because of the limitation reciting the specific equation for calculating the distance between the accumulated text-recognition result and a next accumulated text-recognition result.
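
Assuming the claim 11 expression reads 2·Gn / (Gn + 2·Sn), a short worked computation shows how it behaves. The function name normalized_distance is hypothetical, and returning zero when both inputs are zero is an assumption made only to keep the sketch total; the generalized Levenshtein distances that make up Gn are taken as given inputs rather than computed.

```python
def normalized_distance(g_n: float, s_n: int) -> float:
    """Evaluate 2*Gn / (Gn + 2*Sn) for a non-negative Gn and a non-negative count Sn."""
    if g_n < 0 or s_n < 0:
        raise ValueError("Gn and Sn must be non-negative")
    denominator = g_n + 2 * s_n
    if denominator == 0:
        return 0.0  # assumed convention for an empty accumulated result
    return 2 * g_n / denominator


# With Gn = 2 and Sn = 3 the expression is 4 / 8 = 0.5.
example = normalized_distance(2.0, 3)
```

Under this reading the value is zero when Gn is zero and grows toward, but does not reach, 2 as Gn grows for a fixed Sn.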

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Prasad et al. (U.S. 20100246961 A1), “MULTI-FRAME VIDEOTEXT RECOGNITION”, teaches mitigating challenges posed by varying characteristics of videotext across frame instances to improve OCR of text in video streams, for example, as measured by a word error rate (WER).
Isaev (U.S. 20170116494 A1), “VIDEO CAPTURE IN DATA CAPTURE SCENARIO”, teaches computer systems and, more particularly, facilitating data capture in video streams. It also teaches a lower-cost alternative platform for capturing data from physical documents using mobile devices (e.g., smart phones, tablet computers, etc.). Data may be captured from data fields on physical documents (forms, questionnaires, financial documents, etc.) using mobile devices with built-in cameras, processed using OCR, and either stored locally or sent to remote databases, all within an application executing on the mobile device.
Gokturk et al (U.S. 20060251339 A1), “System And Method For Enabling The Use Of Captured Images Through Recognition”, teaches about a system and method for enabling the use of captured images. It also teaches about the programmatic of digitally captured images using, among other advancements, image recognition. Image files for data and information that enables, among other features, the indexing of the contents of images based on analysis of the images. Additionally, images may be made searchable based on recognition information of objects contained in the images.
Wang et al. (U.S. 20010012400 A1), “PAGE ANALYSIS SYSTEM”, teaches a page analysis system for analyzing image data of a document page by utilizing a block selection technique, and particularly a system in which blocks of image data are classified based on characteristics of the image data. For example, blocks of image data may be classified as text data, titles, half-tone image data, line drawings, tables, vertical lines or horizontal lines.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward F Urban can be reached on (571)-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/Examiner, Art Unit 2665                 


/BOBBAK SAFAIPOUR/Primary Examiner, Art Unit 2665