United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 14/266,172
Filing Date: 30 Apr 2014
Appellant(s): ARRIS Enterprises LLC



__________________
Sean M. Douglass
For Appellant


EXAMINER’S ANSWER




This is in response to the appeal brief filed 5/3/2022.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 10/25/2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
The following ground(s) of rejection are applicable to the appealed claims.

Claims 1, 3-5, 7-13, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli et al. (“Verrilli” US 20140082647) and further in view of Koo et al. (“Koo” US 20130177203), Pickering et al. (“Pickering” US 7489334), Swan (“Swan” US 20100259676), and Oztaskent et al. (“Oztaskent” US 20140282660).

Claims 2, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 1 above, and further in view of Cummins et al. ("Cummins" US 20150169971).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 1 above, and further in view of Ray ("Ray" US 20050105803).

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Verrilli, Koo, Pickering, Swan, and Oztaskent as applied to claim 12 above, and further in view of Bachman ("Bachman" US 8745650).

(2) Response to Argument
Regarding claim 1 and prior art Verrilli, Appellant argues:
In the Final Office Action, it is alleged that paragraphs [0035], [0045], and [0058] of Verilli disclose "receiving, by a computer system, video data comprising a plurality of frames arranged in an order, the video data including text data and/or embedded text data." In particular, the Final Office Action cites the "text overlay" of Verilli as allegedly disclosing "the video data including text data and/or embedded text data." However, the disclosure of Verilli focuses on the detection of text in a text overlay by performing a screenshot of the text overlay. See Verilli at least paragraphs [0058], [0061], and [0063]. This is in contrast to the present independent claims where text is received as part of the video data and not from a screenshot of a television display as in Verilli. Further, in addition to the text being included in the actual video data, the text may be "text data and/or embedded text data," but is not limited to the embedded-type textual data, e.g., text overlays, that is detected by Verilli. Indeed, Appellant's specification clearly differentiates between these two types of textual data. See at least paragraphs [0004] ("Any such text embedded in the image component of the video data is referred to herein as "on-screen text. " On-screen text is differentiated from text rendered from textual data included in the video data in that it is not associated with computer readable data and exists only as an image"), and [0022] ("In one embodiment, the server can analyze the video data to detect text depicted in the visual video content. ... Some video sources generate and embed additional text that can also be included in the visual video content. For example, a news broadcast may include overlays of graphics and/or text that emphasize some aspect of a news story."). 
Thus, from the disclosure of Verrilli, it is readily apparent that the text data is received from a screenshot of a television display and that the text being detected is limited to text located in such overlays. The disclosure of Verilli does not teach or suggest receiving text data that included in the video data as recited in the context of Appellant's independent claims.

Examiner respectfully disagrees. The limitation at issue recites, “… the video data including text data and/or embedded text data.” Examiner interprets Paras 0003, 0004, and 0030 of Appellant’s PGPUB as describing this feature. The PGPUB states:
[0003] textual data is included or associated with the video content. For example, program information that describes a particular asset (e.g., title, actors, running time, etc.) can be embedded as textual data into the video signal or video data used to transmit or store the video content (emphasis added).
[0030] In addition to the text 105 and overlays 107, the visual video content 100 may also include text rendered from computer readable textual data, such as closed captioning text 109 or electronic program guide information (not shown) (emphasis added).
[0004] In addition to the text defined by the textual data, text can also be embedded or included in the images of the video content. For instance, text in a particular scene can be captured in some of the images in the video. Images of text in signs, text in written documents, and other forms of text can be imaged and included in the visual component of the video content. In other scenarios, the producer of the video content can embed text data into the images of the video content. Such text can be rendered as an overlay to portray certain information in addition to or in parallel to the other information being portrayed in the images or audio of the video content. For example, television programs often overlay text to present supplemental information concurrently with the information in the visual and audio components of the video content (e.g., upcoming episode information, advertisements, etc.). News broadcasts use text embedded in the visual component of the video content to display information about additional news stories or critical updates (e.g., top headlines, story updates, time, temperature, etc.). Financial programs often include a scrolling bar or ticker-tape type display under the image of a newscaster to provide timely stock quotes. Documentaries, and other television shows and movies, label images with identifying information such as the names of people, places, and events. Television stations also superimpose station identification and advertisements for other programs onto the visual component of the video content. Any such text embedded in the image component of the video data is referred to herein as “on-screen text.”
As emphasized above in Para 0004 of Appellant’s PGPUB, text can be embedded or included in the images of the video content. The PGPUB goes on to describe examples, such as text captured in a particular scene (e.g., images of text in signs and documents) and other forms of text included in the visual component of the video content. Additionally, text can be rendered as an overlay to portray information, as when television programs such as news broadcasts overlay supplemental information concurrently with the other video content.
As cited in the previous office action, Para 0045 of Verrilli teaches “while the video is being displayed, the set-top box 103 may provide a text overlay 119 that includes channel and title information. For example, the text overlay 119 typically includes the channel the media program is being presented on, as well as the title of the media program. The text overlay 119 also often includes information about actors, characters, and/or a synopsis of the media program presented as user readable text” (emphasis added).
Further, Para 0035 of Verrilli teaches, “one aspect of the disclosure is a method of identifying and presenting content associated with a media program by capturing display data associated with the media program, extracting text from the display data in response to determining that the display data includes a text overlay, wherein the extracted text is associated with the media program” (emphasis added).
Examiner interprets Verrilli’s text overlay, which includes information about the channel, title, actors, and characters, as teaching Appellant’s described textual data, which may include program information that describes a particular asset, such as title and actors, and which may be rendered in the video content, such as closed captioning text or EPG information.

Appellant further argues:
Further, in the Final Office Action, it is alleged that paragraphs [0057]-[0058] of Verilli disclose ''providing, in a frame buffer of the computing system, temporary storage of the video data." In particular, it is alleged that the "display data cache 844" of Verilli discloses the frame buffer of the present independent claims. However, the "display data cache 844" of Verilli is not analogous to a frame buffer. Verilli discloses that the "display data cache 844 is used to store images and other data frequently downloaded by the client device 102-1." In contrast, a frame buffer is temporary storage of data representing all the pixels in a complete video frame and is not merely a data store of downloaded images as in Verilli. Thus, it is readily apparent that the disclosure of Verrilli does not disclose the use of a frame buffer as recited in the context of Appellant's independent claims.

Examiner respectfully disagrees. Under the broadest reasonable interpretation, and based on Appellant’s argument, the function of the frame buffer is merely to temporarily store video data. Para 0071 of Verrilli teaches that the data module 420 included in the memory 406 further includes a display data cache 844. This is further taught in Para 0075, wherein the display data cache 844 is used to store images and other data frequently downloaded by the client device 102-1. Additionally, it is well known to one of ordinary skill in the art that a cache is defined as a block of memory for temporary storage of data likely to be used again. Therefore, Examiner maintains that Verrilli’s data module, which is included in the memory and stores display data in a display data cache, teaches Appellant’s feature of a frame buffer used for temporary storage of video data.

Regarding claim 1 and prior art Koo, Appellant argues:
Koo discloses a method of "tracking an object in each of a plurality of frames of video data to generate a tracking result." See Koo, Abstract. In the Final Office Action, it is alleged that paragraph [0073] of Koo "discloses determining a location of the text in each of the plurality of frames as the text moves relative to the image capture device 102 over a period of time." (Office Action, page 10). However, Koo requires the use of a separate image capture device (i.e., a device having a lens and capable of capturing and processing an image) that tracks the movement of an object across a plurality of frames. See Koo paragraph [0043]. The image capture device 102 of Koo is separate from the "image processing device 104," as illustrated in Figure 1 of Koo and as described in at least paragraphs [0042] and [0044] of Koo. Thus, like the disclosure of Verilli, a screen capture of a display is required. Further, the Final Office Action alleges that Koo discloses "wherein the computer system identifies the location within the frame corresponding to a region containing text and/or embedded text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score" as recited in Appellant's independent claims. (Emphasis added). However, Koo fails to disclose the use of frame buffers and indeed, the Final Office Action omits the phrase "based on an analysis of the video data in the frame buffer" from the recitation of the claim limitation on page 9 of the Final Office Action, instead relying on Verilli as disclosing this feature. However, Verilli also fails to disclose this feature as discussed above. Thus, it is readily apparent that the disclosure of Koo requires multiple devices in order to track an object in video data in contrast to the computing system as recited in the context of independent claims 1, 19, and 20.

Examiner respectfully disagrees. Appellant misconstrues Examiner’s position. As stated in the previous office action, Verrilli is relied upon to teach Appellant’s feature of “based on an analysis of the video data in the frame buffer.” Also, as mentioned in the discussion above, Verrilli teaches a display data cache, wherein the display data cache is used to store images and other data. Further, Verrilli is already relied upon for performing an analysis (e.g., optical character recognition) on regions of the frame (see pages 7-8 of the most recent Final Rejection).
As previously cited, Para 0073 of Koo teaches the temporal filter 134, which includes a Kalman filter and a maximum-likelihood estimator, wherein the Kalman filter may be configured to determine the location of the text 153 in each of the plurality of frames as the text moves relative to the image capture device over a period of time, or as the image capture device moves relative to the text in each of the plurality of frames over a period of time. It is Examiner’s interpretation that, for the method to determine the location of the text over a period of time and across a plurality of frames, there must inherently be a storage or buffer that stores the previous frames. Because the previous frames are stored, it is possible for Koo to carry out the method of determining the location of the text in each frame by tracking and/or comparing the plurality of frames over the period of time. This is further taught in Paras 0045 and 0049 of Koo, wherein the temporal filter 134 is relied upon to compensate for the motion between a current frame and a previous frame based on historical motion information (i.e., motion history), wherein every frame of the plurality of frames of the video data 160 is taken into consideration. Examiner interprets the historical motion information (i.e., motion history) of previous frames of video data to be information stored in a memory/storage, which would therefore further teach a relationship of text tracking throughout each of the plurality of previous frames from the video data in storage.
It would be obvious to one of ordinary skill in the art that Verrilli’s analysis of character recognition on a region of the frame could also include an analysis of the location within the frame where the text is contained. Therefore, the combined teachings of Verrilli’s feature of performing an optical character recognition analysis on frames and Koo’s feature of identifying a location of the text in each of the plurality of frames, by comparing a plurality of current and stored frames over a period of time based on historical motion information, would teach Appellant’s feature of identifying the location within the frame corresponding to a region containing text and/or embedded text based upon an identified association between frame context data of the video data in the frame buffer and frame context data that is associated with the score.

Regarding Claim 3, Appellant argues:
In particular, claim 3 is submitted to be independently allowable for the below reasons in addition to those discussed above with reference to the independent claims 1, 19, and 20. Verilli does not disclose or suggest "a dictionary comprising expected textual data" as alleged in the Final Office Action. Rather, Verilli discloses "the media server 130 may then check the extracted information against a content database (e.g., the content database 133, FIG. 1) to ensure that the extracted data is correct." See Verilli paragraph [0061]. The content database of Verilli is can include "advertisements, videos, images, music, web pages, email messages, SMS messages, content feeds, advertisements, coupons, playlists, XML documents, and ratings associated with various media content or any combination thereof." See Verilli paragraph [0049].  In contrast the dictionary comprising expected textual data of claim 3 is a dictionary selected based on the context of the frame from which the text was extracted. For example, if the frame context data indicates that the text is part of a television program about race car driving, then a dictionary manager can select a custom dictionary that includes vocabulary and phrases specific to the sport of race car driving. See Appellant's PGPub application paragraph [0096]. The disclosures of Koo, Pickering, Swan, and Oztaskent fail to cure the above-noted deficiency of Verilli. Therefore, the disclosures of Verilli, Koo, Pickering, Swan, and Oztaskent, individually or in combination, fail to disclose or suggest all the features of dependent claim 3.

Examiner respectfully disagrees. Appellant misconstrues Examiner’s position. Claim 3 is directed towards the comparison of recognized characters to expected characters to improve the accuracy of the character recognition operations, as described in Para 0026 of Appellant’s PGPUB. As cited in the previous office action, Para 0061 of Verrilli teaches that, having extracted the title and/or program information, the method includes cross-referencing the extracted information with a local and/or remote database 133 to ensure the validity of the information. Additionally, Paras 0052 and 0070 of Koo may also be relied upon to teach that a maximum-likelihood estimator may be configured to generate proposed text data via optical character recognition (OCR) and to access a dictionary to verify the proposed text data.
Therefore, Examiner maintains that Verrilli, Koo, Pickering, Swan, and Oztaskent’s features of cross-referencing extracted information with a local and/or remote database to determine validity, or of comparing proposed text with a dictionary to verify the proposed text, teach Appellant’s feature of comparing the recognized characters with the expected textual data.

Regarding claim 6, Appellant argues:
Further, claim 6 is submitted to be independently allowable for the below reasons in addition to those discussed above with reference to the independent claims 1, 19, and 20. In the Final Office Action, it is alleged that paragraph [0053] of Ray discloses "associating, by the computer system, the graphical user interface element definition with the frame and one or more other frames in the plurality of frames contiguous with the frame according to the order." However, paragraph [0053] of Ray discloses analyzing a plurality of images for readable text in order to select an "emphasis image," which can be used as a representative image of the plurality of images, e.g., a cover image for an album. See Ray paragraph [0053]-[0054]. The cited portion of Ray, or indeed any portion of Ray, fails to disclose a graphical user interface element, e.g., a button, hyperlink, control, etc., which is then associated with a plurality of frames continuous with a frame currently being analyzed. The disclosures of Verilli, Koo, Pickering, Swan, and Oztaskent fail to cure the above-noted deficiency of Ray. Therefore, the disclosures of Verilli, Koo, Pickering, Swan, Oztaskent, and Ray, individually or in combination, fail to disclose or suggest all the features of dependent claim 6.

Examiner respectfully disagrees. Appellant misconstrues Examiner’s position. As previously cited with respect to claim 1, Verrilli is relied upon to teach generating a graphical user interface element definition corresponding to the region based on the textual data. Para 0055 of Verrilli teaches OCR data that is stored and determined based on recognized characters. Para 0055 goes on to teach that the OCR data includes recognized images 161-2 (fig. 2). Further, Swan is relied upon in claim 1 for teaching the graphical user interface element definition comprising a boundary box corresponding to the region. Paras 0043-0044 of Swan teach generating an output text region that includes input from multiple input text regions. As previously cited, Para 0053 of Ray is relied upon for scanning an image for text to store as metadata; once this is complete for the current image, the algorithm returns via path 618 to process subsequent images.
Additionally, Examiner interprets a graphical user interface element definition as merely an element that is shown on the graphical user interface. It is well known to one of ordinary skill in the art that graphical user interface elements are elements used by graphical user interfaces to offer a consistent visual language to represent information stored in computers. Examiner notes that the claim language does not recite any interaction between the user and the graphical user interface element definition and therefore does not require an element such as a button, link, or control.
Therefore, Examiner maintains that the combined teachings of Verrilli, Koo, Pickering, Swan, Oztaskent, and Ray, namely an optical character recognition process to extract text, wherein the OCR data includes images, and the scanning of subsequent images for text, teach Appellant’s feature of associating the graphical user interface element definition with the frame and one or more other frames.

Regarding combination of prior arts Verrilli and Koo, Appellant argues:
The Office alleges that one of ordinary skill in the art would have modified Verilli in view of Koo to frame analysis for the reasons of doing optical character recognition. Appellant disagrees and respectfully submits that one of ordinary skill in the art would not have combined the cited art as alleged.
The disclosure of Verilli discloses detecting text located within a text overlay in a media program, while the disclosure of Koo discloses using a separate imaging device to track an object in video data being displayed on a separate display device. One of ordinary skill in the art would not have modified Verilli to include the features of Koo as alleged because while they both generally address detecting features in video data, they do so in completely different ways. Specifically, Verilli relies on an electronic device receiving the media program to detect the text overlays while Koo utilizes a secondary device to simply image a video display. Thus, the imaging device of Koo could not determine or detect what kind of text data is being displayed, e.g., text overly data, which is what is required in Verilli. For at least this reason, one of ordinary skill in the art would not seek to combine the disclosures of Verilli and Koo at all, let alone to arrive at Appellant's claims.

Examiner respectfully disagrees. Verrilli is relied upon to teach applying an optical character recognition process to extract text in the frame. Koo is relied upon merely for determining a location of the text in each of the plurality of frames as the text moves. Koo would provide the advantage of text recognition in multiple frames over a period of time to improve user experience and to improve performance of the object tracking and detection system. Both Verrilli and Koo address text recognition and are, thus, analogous to each other.

For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,
/JAYCEE IMPERIAL/Examiner, Art Unit 2426
Conferees:
/KYU CHAE/Primary Examiner, Art Unit 2426
/NASSER M GOODARZI/Supervisory Patent Examiner, Art Unit 2426
Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.