Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1, 15 and 18 are objected to because of the following informalities: line 15 of each of claims 1, 15 and 18 reads “text content including a incremented time value.” It should read “text content including an incremented time value.” Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 15 and 18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The amended claims recite “a set of video segments identified by analyzing a clock present in frames of the video using the OCR model, the clock including text content, the text content including an incremented time value.” However, nothing in the specification suggests an OCR model that analyzes a clock present in frames of the video.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. 

Allowable Subject Matter
Claims 5-9, 17 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 10-16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wason et al (US 2021/0141867 A1) in view of Patwardhan et al (US 2011/0099195 A1) and further in view of Moskowitz et al (US 2020/0294365 A1), hereinafter referred to as Wason, Patwardhan and Moskowitz.
Regarding claims 1, 15 and 18, Wason teaches a method, computing device and non-transitory computer-readable storage medium comprising: identifying, by a device, a video (this disclosure describes embodiments that can generate contextual identifiers indicating context for frames of a video and utilize those contextual identifiers to generate translations of text corresponding to such video frames (page 1 paragraph (0002))); analyzing, by the device, the video, and based on the analysis, identifying information related to a set of recognized actions from content of the video (analyzing a digital video file, the disclosed system identifies video frames corresponding to a scene and a term sequence corresponding to a subset of the video frames (page 1 paragraph (0002))); further analyzing, by the device, the video, and determining metadata related to the video (by analyzing image features or metadata, the contextual translation system identifies a set of frames corresponding to each scene from the digital video including video frames corresponding to a particular scene. After identifying the video frames, the contextual translation system inputs the video frames into the contextual neural network (page 5 paragraph (0058))); determining, by the device executing a trained optical character recognition (OCR) model (the contextual translation system extracts the term sequence in a source language from metadata associated with the digital video by applying optical character recognition (OCR) (page 6 paragraph (0060))), a set of video segments, the matching further comprising aligning the play-by-play text to each of the segments (contextual translation system identifies a set of video frames corresponding to a scene. In some embodiments, for instance, the contextual translation system identifies a set of frames corresponding to a scene based on the similarity of image features between contiguous frames within a video (page 2 paragraph (0019))).
However, Wason is silent as to each of the set of segments being at least a predetermined number of seconds long and in continuous order, such that an entirety of an action is captured during the respective segment; and providing, by the device, access to the set of video segments. Patwardhan teaches (page 3 paragraph (0032)) that the segmentation server obtains the videos from a source, which may either be the live video stream or the archived video stream. The segmentation server then identifies logical segments in the obtained video. The logical segments may be identified on the basis of time or nature of play and so forth. For example, the video segments may be of 1 minute each. Based on the identified logical segments in the video, the segmentation server creates video segments from the obtained video stream. The segmentation server sends the video segments to the annotation module, which then creates metadata for the video segments. The metadata as assigned by the annotation module comprises textual data such as descriptive text, entity names, events types, etc. Patwardhan further teaches identifying, by the device, play-by-play text of the video, the play-by-play text providing an official time-aligned captioning of segments of the video (page 2 paragraph (0016)).
Therefore, it would have been obvious, before the effective filing date of the claimed invention, to modify Wason to include the teachings of Patwardhan regarding each of the set of segments being at least a predetermined number of seconds long and in continuous order, such that an entirety of an action is captured during the respective segment; and providing, by the device, access to the set of video segments. A useful combination is found in Patwardhan (page 1 paragraph (0002)): the present invention relates to video content. More specifically, it relates to the processing, search, delivery and consumption of sports video content over the internet.
However, Wason and Patwardhan are silent as to a set of video segments identified by analyzing a clock present in frames of the video using the OCR model, the clock including text content, the text content including an incremented time value. Moskowitz teaches (page 5 paragraph (0063)) that OCR techniques may be implemented to identify the clock time, which is also typically graphically displayed on the original video content of an original sporting event. For example, event segments in a basketball game may be indicated by a predetermined clock time interval, which may be determined based on OCR identification of the displayed clock time for the particular game. In some embodiments, OCR techniques may be implemented to recognize text of closed caption data of the original sporting event. For example, dialogue and commentary by sports broadcasters regarding the original sporting event may be displayed as closed caption text on the video of the original content.
Therefore, it would have been obvious, before the effective filing date of the claimed invention, to modify Wason and Patwardhan to include the teachings of Moskowitz regarding a set of video segments identified by analyzing a clock present in frames of the video using the OCR model, the clock including text content, the text content including an incremented time value. A useful combination is found in Moskowitz (page 1 paragraph (0002)): the present disclosure relates to a system and associated methods of audio and video content processing. In one example, the present disclosure relates to generation of virtual sporting events.

Regarding claim 2, Wason, Patwardhan and Moskowitz teach the method of claim 1. Wason teaches further comprising: identifying a set of training videos; determining, for each training video (page 7 paragraphs (0075)-(0076)), a presence of other contextual objects within each training video frame (page 3 paragraph (0031)); determining, for each training video, surface form consistencies of objects in frames of the training videos (page 3 paragraph (0037)); and determining, for each video, temporal consistencies across the frames of each training video (page 7 paragraph (0070)). 
Regarding claim 3, Wason, Patwardhan and Moskowitz teach the method of claim 2. Wason teaches determining knowledge constraints based on the determinations from the training videos; applying the knowledge constraints to recognized text from the training videos utilizing pretrained models (by training and applying both such neural networks, the contextual translation system can better translate a variety of terms or phrases, such as by accurately translating homonyms, idiomatic expressions, or slang based on contextual identifiers. Because the contextual translation system further trains and applies the translation neural network to generate affinity scores, the contextual translation system likewise generate translations of terms with better affinity to the image features in corresponding video frames. [….] By comparing affinity scores for translated terms corresponding to contextual identifiers across multiple iterations, the contextual translation system adjusts weight for such contextual identifiers and improves the accuracy of contextual translations based on the adjusted weight (page 3 paragraphs (0032)-(0033))); and training text detection and recognition models based on clean text determined by application of the knowledge constraints (page 6 paragraph (0060)).
Regarding claims 16 and 19, Wason, Patwardhan and Moskowitz teach the computing device and non-transitory computer-readable storage medium of claims 15 and 18. Wason teaches identifying a set of training videos; determining, for each training video (page 7 paragraphs (0075)-(0076)), a presence of other contextual objects within each training video frame (page 3 paragraph (0031)); determining, for each training video, surface form consistencies of objects in frames of the training videos (page 3 paragraph (0037)); and determining, for each video, temporal consistencies across the frames of each training video (page 7 paragraph (0070)); determining knowledge constraints based on the determinations from the training videos; applying the knowledge constraints to recognized text from the training videos utilizing pretrained models (by training and applying both such neural networks, the contextual translation system can better translate a variety of terms or phrases, such as by accurately translating homonyms, idiomatic expressions, or slang based on contextual identifiers. Because the contextual translation system further trains and applies the translation neural network to generate affinity scores, the contextual translation system likewise generate translations of terms with better affinity to the image features in corresponding video frames. [….] By comparing affinity scores for translated terms corresponding to contextual identifiers across multiple iterations, the contextual translation system adjusts weight for such contextual identifiers and improves the accuracy of contextual translations based on the adjusted weight (page 3 paragraphs (0032)-(0033))); and training text detection and recognition models based on clean text determined by application of the knowledge constraints (page 6 paragraph (0060)).
Regarding claim 10, Wason, Patwardhan and Moskowitz teach the method of claim 1. Wason teaches analyzing the play-by-play text, and based on the analysis (page 5 paragraph (0058)). Patwardhan teaches identifying information related to a time and a specific portion of a game (page 2 paragraph (0024)); and mapping, based on a composite key defined by data related to the identified time and game portion information, the video segments to the portions of the play-by-play text (page 3 paragraph (0033)). 
Regarding claim 11, Wason, Patwardhan and Moskowitz teach the method of claim 1. Patwardhan teaches when the video is for a portion of a game (page 2 paragraph (0024)), the play-by-play text includes a corresponding portion of the official time-aligned captioning (page 2 paragraph (0016)). 
Regarding claim 13, Wason, Patwardhan and Moskowitz teach the method of claim 1. Patwardhan teaches the video segment comprises information associated with at least one of a type of action, a particular player, a particular time period, or a particular team (page 2 paragraph (0024)). 
Regarding claim 14, Wason, Patwardhan and Moskowitz teach the method of claim 1. Wason teaches requesting, over the network, third party digital content based at least on information related to the video segment; receiving, over the network, the third party digital content (page 4 paragraphs (0046)-(0047)); and communicating, over the network, the third party digital content for display along with the video segments (page 4 paragraph (0048)). 

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wason et al (US 2021/0141867 A1) in view of Patwardhan et al (US 2011/0099195 A1) in view of Moskowitz et al (US 2020/0294365 A1) and further in view of Taylor et al (US 9,986,394 B1), hereinafter referred to as Wason, Patwardhan, Moskowitz and Taylor.
Regarding claim 4, Wason, Patwardhan and Moskowitz teach the method of claim 3. However, Wason, Patwardhan and Moskowitz are silent as to the training of the text detection and recognition models being performed for different domains. Taylor teaches (column 12 lines 43-63) that multiple domains may operate substantially in parallel with different domain specific components. That is, domain B for video may have its own recognizer, including NER component and IC component B. Taylor further teaches (column 8 lines 15-18) that the NLU component interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text data that allow a device to complete that action.
Therefore, it would have been obvious, before the effective filing date of the claimed invention, to modify Wason, Patwardhan and Moskowitz to include the teachings of Taylor regarding the training of the text detection and recognition models being performed for different domains. A useful combination is found in Taylor (column 2 lines 30-38): the present disclosure expands the aforementioned messaging capabilities by enabling a system to create a multimedia messaging service from spoken message to make the audio of the spoken message accessible to the message recipient.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wason et al (US 2021/0141867 A1) in view of Patwardhan et al (US 2011/0099195 A1) in view of Moskowitz et al (US 2020/0294365 A1) and further in view of Bourgoyne et al (US 10,834,158 B1), hereinafter referred to as Wason, Patwardhan, Moskowitz and Bourgoyne.
Regarding claim 12, Wason, Patwardhan and Moskowitz teach the method of claim 1. Patwardhan teaches the play-by-play text is identified based on an identifier (ID) of a game (page 2 paragraph (0024)), wherein the game ID is identified based on at least one of metadata related to the game (page 3 paragraph (0033)).
However, Wason, Patwardhan and Moskowitz are silent as to a portion of a uniform resource locator (URL) of the video. Bourgoyne teaches (column 2 lines 56-60) that, based on an identifier for a user such as login information, the media server uses master manifest data to generate customized manifest data that only includes the URL for the first version which has version information.
Therefore, it would have been obvious, before the effective filing date of the claimed invention, to modify Wason, Patwardhan and Moskowitz to include the teachings of Bourgoyne regarding a portion of a uniform resource locator (URL) of the video. A useful combination is found in Bourgoyne (column 1 lines 48-56): this disclosure describes techniques for encoding information in manifest data in a way that enables identification of a version of media content. In particular, these techniques involve encoding an identifier into customized manifest data by selecting certain playback options for selected durations of media content.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANKLIN S ANDRAMUNO whose telephone number is (571)270-3004. The examiner can normally be reached Mon - Fri, 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jefferey Harold, can be reached at (571) 272-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANKLIN S ANDRAMUNO/Examiner, Art Unit 2424                
/JEFFEREY F HAROLD/Supervisory Patent Examiner, Art Unit 2424