DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/15/2021 has been entered.
 
Response to Arguments
Applicant's arguments filed 12/16/2020 have been fully considered but they are not persuasive.
Applicant argues:
Without conceding to the merits of the rejection, Applicant has amended independent claim 1 as follows:
"and wherein performing the character recognition operation on the region comprises referencing a standard dictionary." (Emphasis added).
Applicant respectfully asserts that nothing in the cited references can be read as disclosing, teaching, or suggesting character recognition that is based upon a reference to a standard dictionary. The claimed method improves the accuracy of recognized text in a region that is likely to include text based upon a reference to a standard dictionary. According to the claims, the region is identified based upon the identification of recognizable text, and nothing in the cited art teaches an acknowledgement of recognizable text via a reference to a standard dictionary.

Examiner respectfully disagrees.  As discussed in the advisory action dated 1/19/2021, the amended language recites “wherein performing the character recognition operation on the region comprises referencing a standard dictionary”.  Koo (US 20130177203) Para 0052 teaches “The maximum-likelihood estimator may be configured to generate proposed text data via optical character recognition (OCR) and to access a dictionary to verify the proposed text data. For example, the maximum-likelihood estimator may access one or more dictionaries stored in the memory 108, such as a representative dictionary 140”.  Para 0083 of Koo teaches “The maximum-likelihood estimator 634 may be configured to select a text candidate corresponding to an entry of the dictionary 140 according to a confidence value associated with the text candidate”.  Para 0094 of Applicant’s specification describes the standard dictionary as including a listing of expected words and/or phrases against which the recognized textual data can be compared to determine if the recognized textual data is reasonable or valid.  Examiner interprets Koo’s selection of “text candidates” according to confidence values, for verification of proposed text data, as teaching Applicant’s feature of comparing a listing of “expected words” against the recognized textual data.  Therefore, under the broadest reasonable interpretation, Examiner maintains that the character recognition operation of Koo would teach referencing a standard dictionary.
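For illustration only, the dictionary check described in the cited Koo passages (Para 0052, 0083) can be sketched as follows. This is a minimal sketch under stated assumptions, not Koo's implementation or Applicant's: the function name `select_candidate`, the dictionary contents, and the sample candidates and confidence values are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical listing of expected words (illustrative only; not from Koo
# or from Applicant's specification).
STANDARD_DICTIONARY: Set[str] = {"news", "weather", "sports"}


def select_candidate(
    candidates: List[Tuple[str, float]], dictionary: Set[str]
) -> Optional[str]:
    """Return the highest-confidence OCR candidate that matches a dictionary
    entry, or None when no candidate matches."""
    # Keep only the candidates that correspond to an entry of the dictionary.
    valid = [(text, conf) for text, conf in candidates if text.lower() in dictionary]
    if not valid:
        return None
    # Among the verified candidates, select by confidence value.
    return max(valid, key=lambda item: item[1])[0]


# Hypothetical OCR output: (proposed text, confidence value).
candidates = [("nevvs", 0.91), ("news", 0.88), ("newt", 0.40)]
print(select_candidate(candidates, STANDARD_DICTIONARY))
```

In this sketch the candidate with the highest raw confidence ("nevvs") is rejected because it is not a dictionary entry, which is one way a listing of expected words can be used to judge whether recognized text is reasonable or valid.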

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 7-13, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli et al. (“Verrilli” US 20140082647) and further in view of Koo et al. (“Koo” US 20130177203), Pickering et al. (“Pickering” US 7489334), Swan (“Swan” US 20100259676), and Oztaskent et al. (“Oztaskent” US 20140282660).

Regarding claim 1, Verrilli teaches a method comprising:
receiving, by a computer system, video data comprising a plurality of frames arranged in an order [Verrilli - Fig. 1a: suggests any device may receive video data and digital video data from the broadcast system or content provider];
providing, in a frame buffer of the computer system, temporary storage of the video data [Verrilli - Para 0057-0058: discloses a client device obtaining screen capture data from the video signal.  Para 0071, Fig. 8: discloses a client device (item 102-1) with a data module (item 420) included in the memory (item 406) that stores display data in a display data cache (item 844)]; and
for a frame in the plurality of frames temporarily stored in the frame buffer: [Verrilli – Para 0075: discloses the display data cache 844 is used to store images and other data frequently downloaded by the client device 102-1.]
by the computer system, based on an analysis of the video data in the frame buffer [Verrilli - Para 0006: discloses evaluating the display data to determine whether or not the display data includes a text overlay], 
performing, by the computer system, a character recognition operation [i.e. optical character recognition] on the region to generate recognized characters [Verrilli - Para 0014: discloses applying an optical character recognition process to extract the text], 
generating, by the computer system, textual data [i.e. OCR data] based on the recognized characters [Verrilli - Para 0055, Fig. 2: discloses OCR data that it receives and stores in memory]; and
generating, by the computer system, a graphical user interface element definition [i.e. included in OCR data] corresponding to the region based on the textual data [Verrilli - Para 0055: discloses OCR data that includes data about all that was captured],
Verrilli does not explicitly teach identifying, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the computer system identifies the location within the frame corresponding to a region containing text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score;
wherein performing the character recognition operation on the region comprises performing the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary;
generating, by the computer system, a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Koo teaches identifying, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the computer system identifies the location within the frame corresponding to a region containing text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score; [Koo - Para 0073: discloses determining a location of the text in each of the plurality of frames as the text moves relative to the image capture device 102 over a period of time, or as the image capture device 102 moves relative to the text 153 in each of the plurality of frames over a period of time.  Fig. 7: step 750 and 760 suggests estimating motion of object between a particular frame and a previous frame.  Para 0038: discloses to improve precision, a particular text box is shown only when the particular text box is detected in at least m times in recent n frames.  Assuming that the detection probability of a text box is p, this technique may improve precision of text box detection. The improved precision may be expressed as: $f(p, n, m) = \sum_{k=m}^{n} \binom{n}{k} p^{k} (1 - p)^{n-k}$.  Therefore, the probability that a frame will contain a text box will be based on the calculation according to the previous frames];
wherein performing the character recognition operation on the region comprises performing the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary [Koo - Para 0073: discloses generating proposed text data (e.g., via optical character recognition (OCR)) representing the text in each of the plurality of frames.  Para 0052, 0070, 0083: teaches accessing one or more dictionaries stored in the memory to verify the proposed text data];
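For reference, the expression quoted from Koo Para 0038 is the probability of at least m detections in n frames when each frame detects the text box with probability p. The following is a minimal sketch of that calculation; the function name `detection_precision` is hypothetical, and treating the per-frame detections as independent is an assumption of this sketch rather than a statement taken from Koo.

```python
from math import comb


def detection_precision(p: float, n: int, m: int) -> float:
    """Probability that a text box is detected at least m times in the most
    recent n frames, given per-frame detection probability p.

    Assumes independent per-frame detections (an assumption of this sketch).
    """
    # Sum the binomial terms for k = m, m + 1, ..., n detections.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


# Example: per-frame probability 0.5, at least 1 detection in 2 frames.
print(detection_precision(0.5, 2, 1))
```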
Verrilli and Koo are analogous in the art because they are from the same field of frame processing [abstract].  It would have been obvious to one of ordinary skill in the 
Verrilli and Koo do not explicitly teach identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer;
generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Pickering teaches identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer; [Pickering – Col. 4, Lines 12-19: discloses as images are captured and passed to the central processing device, they are stored in an image buffer as shown at step 130 so that they can be compared with each other as detailed below by the image processing modules 140 to 160 where, at module 140, object priority and sensitivity is established; at module 150, frame to frame changes (i.e., comparing the Nth frame with the N-1th frame within a datastream) are checked; and, at module 160, motion is identified and/or predicted]
Verrilli, Koo, and Pickering are analogous in the art because they are from the same field of image analysis [Col. 1, Lines 7-16].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verrilli and Koo in view of Pickering to use data from previous frames for the reasons of improving accuracy by comparing frames when determining regions of interest.
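For illustration only, the frame-to-frame comparison that the cited Pickering passage describes (buffering images so that the Nth frame can be compared with the N-1th frame) can be sketched as follows. This is a minimal sketch under stated assumptions, not Pickering's implementation: the names `Frame`, `changed_region`, and `frame_buffer`, the grayscale pixel representation, and the threshold value are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Frame = List[List[int]]  # hypothetical representation: rows of grayscale pixels


def changed_region(
    prev: Frame, curr: Frame, threshold: int = 10
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) bounding the pixels that differ by
    more than threshold between two equally sized frames, or None if no
    pixel changed."""
    rows: List[int] = []
    cols: List[int] = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (a, b) in enumerate(zip(prev_row, curr_row)):
            if abs(a - b) > threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# A two-slot buffer holds frame N-1 and frame N.
frame_buffer: Deque[Frame] = deque(maxlen=2)
frame_buffer.append([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
frame_buffer.append([[0, 0, 0], [0, 0, 255], [0, 0, 0]])
print(changed_region(frame_buffer[0], frame_buffer[1]))
```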
Verrilli, Koo, and Pickering do not explicitly teach generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Swan teaches generate a graphical user interface element definition comprising a boundary box [i.e. output text region] corresponding to the region based on the textual data [Swan - Para 0043-0044: discloses generating an output text region that includes input from multiple input text regions]; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters [Swan - Para 0045-0047, Fig. 12: discloses applying character recognition technology to a detected text region, modifying the output video data to include data based on the character values detected in the region.  The text output region may include a rendering of the text].
Verrilli, Koo, Pickering, and Swan are analogous in the art because they are from the same field of character recognition in video signals [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verrilli, Koo, and Pickering in view of Swan to generate GUI elements for the reasons of displaying focus on the recognized characters.
Verrilli, Koo, Pickering, and Swan do not explicitly teach wherein the graphical user interface element is user-selectable, and wherein a user selection of the graphical user interface element initiates a search using the textual data as a search query.

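However, Oztaskent teaches wherein the graphical user interface element is user-selectable, and wherein a user selection of the graphical user interface element initiates a search using the textual data as a search query. [Oztaskent – Para 0063, 0072, Fig. 4: teaches the client application can enter a result display mode that transmits the user selections, which can include a selected image, a selected region of interest, a selected face, a selected object, and/or any other suitable portion of an image, to the search server. The client application can then receive and present one or more search results associated with the selected image and/or region of interest to the user.]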
Verrilli, Koo, Pickering, Swan, and Oztaskent are analogous in the art because they are from the same field of providing information related to media content [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verrilli, Koo, Pickering, and Swan in view of Oztaskent to provide selectable elements for the reasons of improving the watching experience by providing additional information when the user selects to search for the identified data.

Regarding claim 3, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 1 further comprising accessing, by the computer system, a dictionary [i.e. database] comprising expected textual data, and wherein generating the textual data comprises comparing the recognized characters with the expected textual data [Verrilli - Para 0061: discloses cross referencing with a database to ensure validity of the information].

Regarding claim 4, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 1 further comprising transmitting the video data and the graphical user interface element definition from the computer system to a remote client computing device [i.e. laptops, tablets, phones] for display on the client computing device [Verrilli - Para 0039: discloses that video data may be received by any number of display devices, including computers, laptop computers, tablet computers, smart phones and the like].

Regarding claim 5, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 1 further comprising storing, by the computer system, the video data and the graphical user interface element definition in one or more data stores accessible to a plurality of client computing devices [Verrilli - Para 0051: discloses memory may optionally include one or more storage devices remotely located in/from the CPUs].

Regarding claim 7, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 1 further comprising:
generating, by the computer system, a graphical user interface element [i.e. on screen button] based on the graphical user interface element definition [Verrilli - Para 0077: discloses an “INFO” button on the application interface displayed]; and
associating, by the computer system, an operation [i.e. the initiation of the overlay] to be performed in response to a user input received through the user interface element [Verrilli - Para 0078: discloses the user input will initiate the display of the program information overlay].

Regarding claim 8, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 7 wherein the user interface element comprises a visual representation [i.e. on screen results] of at least a portion of the region or the text [Verrilli - Para 0078, Fig. 7a, 7b: discloses the character recognition can be used to do a search query with the results displayed on screen].

Regarding claim 9, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 7 further comprising generating, by the computer system, a graphical user interface [i.e. information box] comprising the graphical user interface element, wherein the graphical user interface is superimposed on the frame and one or more other frames in the plurality of frames [Verrilli - Fig. 7a: suggests that the information box will be an overlay on playing television program].

Regarding claim 10, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 7 further comprising executing, by the computer system, the operation, wherein the operation uses the textual data as input [Verrilli - Para 0067: discloses performing an internet search based on at least some of the extracted text by submitting a search query to the search server system].

Regarding claim 11, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 10, wherein the operation comprises generating a request [i.e. query] for data comprising the textual data, the method further comprising:
sending the request for data from the computer system to an external data source [i.e. search server system] [Verrilli - Para 0067, Fig. 6: discloses the search queries are submitted to a search server system];
receiving, in response to the request for data, additional data [i.e. associated content] related to the textual data [Verrilli - Para 0067: discloses the search server system responds to a received search query by providing information and/or access to information.  Para 0074: discloses an associated content search module to produce one or more search queries transmitting to the search server system]; and
generating, by the computer system, another graphical user interface comprising information based on the additional data [Verrilli - Para 0075: discloses that the search results will be displayed in information box].

Regarding claim 12, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 1 further comprising determining, by the computer system, metadata associated with the video data and comprising information about the content of the video data, and wherein generating the textual data is further based on the metadata [Verrilli - Para 0048: discloses metadata associated with content files.  Para 0045: discloses the metadata being displayed on screen in a text overlay.  Para 0054: discloses the OCR data obtained from information on screen].

Regarding claim 13, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 12 wherein determining the metadata comprises receiving electronic program guide data comprising descriptions of content of the video data [Verrilli - Para 0050: discloses metadata is associated with the content received from the broadcast system].

Regarding claim 17, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 12 wherein determining the metadata comprises receiving a custom dictionary [i.e. content database] of expected textual data associated with the metadata or a user, and wherein generating the textual data comprises comparing the recognized characters with the custom dictionary [Verrilli - Para 0059: discloses the client device may communicate with the media server in order to check the validity of the extracted information using a content database].

Regarding claim 18, Verrilli, Koo, Pickering, Swan, and Oztaskent teach the method of claim 12 wherein the metadata further comprises predetermined coordinates [i.e. position of expected text overlay] for the region in the frame and an area, and wherein determining the region is based on the metadata [Verrilli - Para 0045, Fig. 1b: discloses the expected text overlay that includes program channel, title, and information about actors, characters, synopses. Para 0050: discloses the application program interface instructions are included with the signal from the broadcasting system, so the metadata that is received will determine how it is displayed based on the instructions].

Regarding claim 19, Verrilli teaches a method comprising:
receiving, by a computer system, video data comprising a plurality of frames arranged in an order [Verrilli - Fig. 1a: suggests any device or system may receive video data and digital video data from the broadcast system or content provider];
providing, in a frame buffer of the computer system, temporary storage of the video data [Verrilli - Para 0057-0058: discloses a client device obtaining screen capture data from the video signal.  Fig. 1B: suggests a client device with memory]; and
for a frame in the plurality of frames temporarily stored in the frame buffer:
determining, by the computer system, contextual data associated with the video data based on an analysis of the video data in the frame buffer [Verrilli - Para 0058: discloses indicators that identify what is being shown with the content];
by the computer system, based on the contextual data a region [i.e. text overlay] containing text [Verrilli - Para 0006: discloses evaluating the display data to determine whether or not the display data includes a text overlay];
performing, by the computer system, a character recognition operation on the region to generate recognized characters [Verrilli - Para 0014: discloses applying an optical character recognition process to extract the text],
generating, by the computer system, textual data based on the recognized characters [Verrilli - Para 0055, Fig. 2: discloses OCR data that it receives and stores in memory],
Verrilli does not explicitly teach identifying, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the computer system identifies the location within the frame corresponding to a region containing text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score;
wherein performing the character recognition operation on the region comprises performing the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary;
generating, by the computer system, a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generating a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Koo teaches identifying, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the computer system identifies the location within the frame corresponding to a region containing text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score [Koo - Para 0073: discloses determining a location of the text in each of the plurality of frames as the text moves relative to the image capture device 102 over a period of time, or as the image capture device 102 moves relative to the text 153 in each of the plurality of frames over a period of time.  Fig. 7: step 750 and 760 suggests estimating motion of object between a particular frame and a previous frame.  Para 0038: discloses to improve precision, a particular text box is shown only when the particular text box is detected in at least m times in recent n frames.  Assuming that the detection probability of a text box is p, this technique may improve precision of text box detection. The improved precision may be expressed as: $f(p, n, m) = \sum_{k=m}^{n} \binom{n}{k} p^{k} (1 - p)^{n-k}$.  Therefore, the probability that a frame will contain a text box will be based on the calculation according to the previous frames];
wherein performing the character recognition operation on the region comprises performing the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary [Koo - Para 0073: discloses generating proposed text data (e.g., via optical character recognition (OCR)) representing the text in each of the plurality of frames.  Para 0052, 0070, 0083: teaches accessing one or more dictionaries stored in the memory to verify the proposed text data];
wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer [Koo - Para 0073: discloses determining a location of the text in each of the plurality of frames as the text moves relative to the image capture device 102 over a period of time, or as the image capture device 102 moves relative to the text 153 in each of the plurality of frames over a period of time.  Fig. 7: step 750 and 760 suggests estimating motion of object between a particular frame and a previous frame].
In addition, the rationale of claim 1 regarding Koo is used for claim 19.
Verrilli and Koo do not explicitly teach identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer;
generating, by the computer system, a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generating a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Pickering teaches identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer; [Pickering – Col. 4, Lines 12-19: discloses as images are captured and passed to the central processing device, they are stored in an image buffer as shown at step 130 so that they can be compared with each other as detailed below by the image processing modules 140 to 160 where, at module 140, object priority and sensitivity is established; at module 150, frame to frame changes (i.e., comparing the Nth frame with the N-1th frame within a datastream) are checked; and, at module 160, motion is identified and/or predicted]
In addition, the rationale of claim 1 regarding Pickering is used for claim 19.
Verrilli, Koo, and Pickering do not explicitly teach generating, by the computer system, a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generating a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Swan teaches generating, by the computer system, a graphical user interface element definition comprising a boundary box [i.e. output text region] corresponding to the region based on the textual data [Swan - Para 0043-0044: discloses generating an output text region that includes input from multiple input text regions]; and
generating a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters [Swan - Para 0045-0047, Fig. 12: discloses applying character recognition technology to a detected text region, modifying the output video data to include data based on the character values detected in the region.  The text output region may include a rendering of the text];
In addition, the rationale of claim 1 regarding Swan is used for claim 19.
Verrilli, Koo, Pickering, and Swan do not explicitly teach wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Oztaskent teaches wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query. [Oztaskent – Para 0063, 0072, Fig. 4: teaches the client application can enter a result display mode that transmits the user selections, which can include a selected image, a selected region of interest, a selected face, a selected object, and/or any other suitable portion of an image, to the search server. The client application can then receive and present one or more search results associated with the selected image and/or region of interest to the user.]
In addition, the rationale of claim 1 regarding Oztaskent is used for claim 19.

Regarding claim 20, Verrilli teaches a computing system comprising:
one or more processors [Verrilli - Para 0042: discloses any device to include one or more processors that is able to connect to the communication network]; and
a memory comprising instructions that, when executed by the processors, configure the one or more processors to be configured to [Verrilli - Para 0042: discloses any device to include one or more processors and memory]:
receive video data comprising a plurality of frames arranged in an order [Verrilli - Fig. 1a: suggests any device or system may receive video data and digital video data from the broadcast system or content provider];
temporarily store the video data in a frame buffer of the computing system [Verrilli - Para 0057-0058: discloses a client device obtaining screen capture data from the video signal.  Fig. 1B: suggests a client device with memory]; and
for a frame in the plurality of frames temporarily stored in the frame buffer:
based on an analysis of the video data in the frame buffer, a region containing text [Verrilli - Para 0006: discloses evaluating the display data to determine whether or not the display data includes a text overlay],
perform a character recognition operation on the region to generate recognized characters [Verrilli - Para 0014: discloses applying an optical character recognition process to extract the text], 
generate textual data based on the recognized characters [Verrilli - Para 0055, Fig. 2: discloses OCR data that it receives and stores in memory];
Verrilli does not explicitly teach identify, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the location within the frame corresponding to a region containing text is identified based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score;
wherein to perform the character recognition operation on the region comprises to perform the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary;
generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Koo teaches identify, a location within the frame corresponding to a region likely containing text, wherein identifying the location further comprises identifying the location based upon stored data from a character recognition operation, and wherein the stored data comprises a score that is associated with a high probability of the region of the previous frame containing recognizable text, wherein the location within the frame corresponding to a region containing text is identified based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score [Koo - Para 0073: discloses determining a location of the text in each of the plurality of frames as the text moves relative to the image capture device 102 over a period of time, or as the image capture device 102 moves relative to the text 153 in each of the plurality of frames over a period of time.  Fig. 7: steps 750 and 760 suggest estimating motion of an object between a particular frame and a previous frame.  Para 0038: discloses that, to improve precision, a particular text box is shown only when the particular text box is detected at least m times in recent n frames.  Assuming that the detection probability of a text box is p, this technique may improve precision of text box detection. The improved precision may be expressed as: $f(p, n, m) = \sum_{k=m}^{n} \binom{n}{k} p^{k} (1-p)^{n-k}$.  Therefore, the probability that a frame will contain a text box will be based on the calculation according to the previous frames];
wherein to perform the character recognition operation on the region comprises to perform the character recognition operation on corresponding regions containing the text in one or more other frames in the plurality of frames, and wherein performing the character recognition operation on the region comprises referencing a standard dictionary [Koo - Para 0073: discloses generating proposed text data (e.g., via optical character recognition (OCR)) representing the text in each of the plurality of frames.  Para 0052, 0070, 0083: teaches accessing one or more dictionaries stored in the memory to verify the proposed text data];
wherein to identify the location further comprises identifying the location based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer [Koo - Para 0073: .
In addition, the rationale of claim 1 regarding Koo is used for claim 20. 
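For illustration only, the expression quoted from Koo Para 0038 can be evaluated numerically. The following is a minimal sketch, not taken from Koo: the function name `m_of_n_probability` is hypothetical, and the sketch assumes that detections in different frames are independent events with the same per-frame probability p, which is the assumption under which the binomial sum applies.

```python
from math import comb


def m_of_n_probability(p: float, n: int, m: int) -> float:
    """Evaluate f(p, n, m) = sum_{k=m}^{n} C(n, k) * p^k * (1 - p)^(n - k).

    Under the independence assumption stated above, this is the probability
    that an event with per-frame probability p occurs at least m times in
    n frames. The function name is hypothetical and used for illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    # Sum the binomial probabilities for k = m, m + 1, ..., n detections.
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


if __name__ == "__main__":
    # Example: per-frame probability 0.5, at least 1 detection in 2 frames.
    print(m_of_n_probability(0.5, 2, 1))
```

With p = 0.5, n = 2, and m = 1, the sum is 2(0.5)(0.5) + (0.5)(0.5) = 0.75.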
Verrilli and Koo do not explicitly teach identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer;
generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Pickering teaches identifying based upon stored data from a character recognition operation previously performed on a region of a previous frame in the frame buffer; [Pickering – Col. 4, Line12-19: discloses as images are 
In addition, the rationale of claim 1 regarding Pickering is used for claim 20.
Verrilli, Koo, and Pickering do not explicitly teach generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters, wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Swan teaches generate a graphical user interface element definition comprising a boundary box corresponding to the region based on the textual data [Swan - Para 0043-0044: discloses generating an output text region [i.e. output text region] that includes input from multiple input text regions]; and
generate a graphical user interface element based on the graphical user interface element definition, wherein the graphical user interface element is superimposed over the boundary box for the frame and for one or more other frames in the plurality of frames, and the graphical user interface element comprises a textual representation of at least a portion of the textual data based on the recognized characters [Swan - Para 0045-0047, Fig. 12: discloses applying character recognition technology to a detected text region, modifying the output video data to include data based on the character values detected in the region.  The text output region may include a rendering of the text];
In addition, the rationale of claim 1 regarding Swan is used for claim 20. 
Verrilli, Koo, Pickering, and Swan do not explicitly teach wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query.

However, Oztaskent teaches wherein the graphical user interface element is user-selectable, and wherein a user-selection of the graphical user interface element initiates a search using the textual data as a search query [Oztaskent – Para 0063, 0072, Fig. 4: teaches the client application can enter a result display mode that transmits the user selections, which can include a selected image, a selected region of interest, a selected face, a selected object, and/or any other suitable portion of an image, to the search server. The client application can then receive and present one 
In addition, the rationale of claim 1 regarding Oztaskent is used for claim 20. 

Claims 2, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 1 above, and further in view of Cummins et al. ("Cummins" US 20150169971).

Regarding claim 2, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 2.  However, Cummins teaches the method of claim 1, wherein the stored data comprises one or more of an estimate of successful recognition and a score describing a likelihood of accurate text recognition [Cummins - Para 0029: discloses an OCR engine confidence score that indicates a confidence level that the associated term has been correctly recognized].
Verrilli, Koo, Pickering, Swan, Oztaskent, and Cummins are analogous in the art because they are from the same field of character recognition [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verrilli, Koo, Pickering, Swan, and Oztaskent in view of Cummins to include confidence levels for the purpose of improving the accuracy of the determined text.
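For illustration only, the combination discussed above (verifying recognized text against a dictionary, and attaching a confidence score to each recognized term) can be sketched as follows. This is a hypothetical sketch and not the implementation of Koo or Cummins: the function name `select_verified_candidate`, the `min_confidence` parameter, the case-insensitive comparison, and the sample data are all assumptions introduced here.

```python
from typing import Iterable, Optional, Set, Tuple

Candidate = Tuple[str, float]  # (recognized text, confidence in [0, 1])


def select_verified_candidate(
    candidates: Iterable[Candidate],
    dictionary: Set[str],
    min_confidence: float = 0.0,
) -> Optional[Candidate]:
    """Return the highest-confidence candidate that appears in the dictionary.

    `dictionary` is assumed to hold lowercase expected words. Candidates that
    are not dictionary entries, or whose confidence is below `min_confidence`,
    are rejected. Returns None when no candidate passes both checks.
    """
    best: Optional[Candidate] = None
    for text, confidence in candidates:
        if text.lower() not in dictionary:
            continue  # not an expected word, so treat it as unverified
        if confidence < min_confidence:
            continue  # recognized, but with too little confidence
        if best is None or confidence > best[1]:
            best = (text, confidence)
    return best


if __name__ == "__main__":
    # Hypothetical OCR output for one text region.
    ocr_candidates = [("EX1T", 0.91), ("EXIT", 0.62), ("EDIT", 0.40)]
    expected_words = {"exit", "edit"}
    print(select_verified_candidate(ocr_candidates, expected_words))
```

In this sample, "EX1T" has the highest confidence but is not a dictionary entry, so the sketch selects ("EXIT", 0.62).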

Regarding claim 21, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 21.  However, Cummins teaches the method of claim 19, wherein the stored data comprises one or more of an estimate of successful recognition and a score describing a likelihood of accurate text recognition [Cummins - Para 0029: discloses an OCR engine confidence score that indicates a confidence level that the associated term has been correctly recognized].
In addition, the rationale of claim 2 is used for claim 21.

Regarding claim 22, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 22.  However, Cummins teaches the method of claim 20, wherein the stored data comprises one or more of an estimate of successful recognition and a score describing a likelihood of accurate text recognition [Cummins - Para 0029: discloses an OCR engine confidence score that indicates a confidence level that the associated term has been correctly recognized].
In addition, the rationale of claim 2 is used for claim 22.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 1 above, and further in view of Ray ("Ray" US 20050105803).

Regarding claim 6, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 6.  However, Ray teaches the method of claim 1 further comprising associating, by the computer system, the graphical user interface element definition with the frame and one or more other frames in the plurality of frames contiguous with the frame according to the order [Ray - Para 0053, Fig. 8: discloses .
Verrilli, Koo, Pickering, Swan, Oztaskent, and Ray are analogous in the art because they are from the same field of performing image scans [Para 0013].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verrilli, Koo, Pickering, Swan, and Oztaskent in view of Ray to include content scanning for the purpose of performing a scan on different displayed images.

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 12 above, and further in view of Bachman ("Bachman" US 8745650).

Regarding claim 14, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 14.  However, Bachman teaches the method of claim 12 wherein determining the metadata comprises analyzing the video data to detect one or more segments of the video data [Bachman - Col 2, Line 38 – Col 3, Line 8: discloses using metadata to identify content segments, but when no metadata is available, it records and analyzes the video segment for comparison with a backend system to determine its metadata].
Verrilli, Koo, Pickering, Swan, Oztaskent, and Bachman are analogous in the art because they are from the same field of video segment analysis [Col 2, Line 38].  It would have been obvious to one of ordinary skill in the art before the effective 

Regarding claim 15, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 15.  However, Bachman teaches the method of claim 14 wherein the segments of the video data are defined by continuity of audio data [Bachman - Col 2, Line 38 – Col 3, Line 8: discloses segments can be determined using time shifted analysis.  When there is some kind of interruption, depending on how long, it can separate and analyze the segments].
In addition, the rationale for claim 14 can be used for claim 15.

Regarding claim 16, Verrilli, Koo, Pickering, Swan, and Oztaskent do not explicitly teach claim 16.  However, Bachman teaches the method of claim 14 wherein the segments of the video data are defined by continuity of visual data [Bachman - Col 2, Line 38 – Col 3, Line 8: discloses segments can be determined using time shifted analysis.  When there is some kind of interruption, depending on how long, it can separate and analyze the segments].
In addition, the rationale for claim 14 can be used for claim 16.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 4512032 A	Namba; Hiromi – Buffer character recognition with standard patterns
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAYCEE IMPERIAL whose telephone number is (571)270-0604.  The examiner can normally be reached on 8-6 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi can be reached on 571.272.4195.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/JAYCEE IMPERIAL/           Examiner, Art Unit 2426 



/NASSER M GOODARZI/           Supervisory Patent Examiner, Art Unit 2426