DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed 06/30/2022 has been entered. Claims 1-20 remain pending.

Response to Arguments
Applicant’s arguments, see Remarks filed 06/30/2022, regarding the rejection of Claims 1, 2, and 12, particularly in view of Paluri (US 2018/0189570), have been fully considered but are not persuasive. 
Applicant argues that Paluri ([0043]) discloses accessing feature vectors, but not generating feature vectors, as required by the claims.
Examiner respectfully disagrees. While Paluri at ([0043]) simply discloses accessing the feature vectors, ([0034-0035]) discuss the creation of the feature vectors by the video recognition system in more detail, particularly the use of text, audio and video recognition modules to generate the feature vectors that are later accessed. Therefore, Paluri explicitly discloses generation of the feature vectors that are later accessed by the system.   
Applicant argues that Paluri ([0027, 0043]) makes no mention whatsoever of the title of the subject video in regard to any of the disclosed feature vectors. To the extent “text” is discussed in ([0043]), Applicant argues that the generic and ambiguous reference to text cannot be used to anticipate the specific requirement of a “title” in the claims. Applicant concludes that Paluri therefore cannot anticipate “generating, by the computing device, respective feature vectors of a title, a thumbnail, a description, and a content of the subject video” as required by the recited claim limitations.
While Examiner agrees that a generic and ambiguous mention of text cannot anticipate a title, Paluri ([0027]) does not disclose the text ambiguously, as argued by Applicant. Instead, Paluri discloses making a prediction about a video-content object based on “text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.).” This disclosure does not just generally mention text in conjunction with the video, but rather gives very specific examples of the types of text that are expected to be analyzed due to their association with the content file. It is unclear why the specific types of data disclosed in Paluri would not also include the title of the video file. Therefore, the rejection is maintained.
Applicant argues that Paluri ([0039]) does not disclose both a “video corpus” and a “misleading video corpus” as required by the recited claim limitations, that Paluri makes no mention of a video corpus in general, and that it is unclear from the explanation of the rejection how Paluri is being interpreted as reading on these elements.
Paluri ([0039]) discloses determining that the context of a video-content object is inappropriate by comparing, through a fusion module, the feature vectors of the current video-content object with those of a second video-content object, and removing said video as inappropriate content when the comparison indicates that the two video-content objects have similar feature vectors. This is interpreted as indicative of the video corpus: one video-content object has already been classified, and the results of that classification are now being used in a comparative manner to determine similarity with another video object being analyzed, implying collection or storage of prior classifications. Furthermore, since similarity between these objects indicates that the content in question should be removed as inappropriate, it is interpreted that the system tracks not only the classification of analyzed video-content objects, but also the particular actions to take based on those classifications. There is no indication that only a single previous classification is used to perform subsequent analysis, and Paluri ([0039]) discloses many different types of content that may be deemed inappropriate, suggesting that at least that many previous videos are available for comparison. Therefore, Paluri discloses the claimed video corpus and continues to anticipate the recited claim limitations.
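For illustration only, the comparison described above (a subject video's feature vector checked against feature vectors of previously classified videos) can be sketched as follows. This is a minimal sketch under stated assumptions, not Paluri's implementation: the function names, the list-of-pairs corpus layout, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_in_corpus(subject_vector, corpus):
    """Return (label, similarity) for the closest entry.

    `corpus` is a hypothetical list of (label, feature_vector) pairs for
    videos that were classified earlier.
    """
    best_label, best_score = None, -1.0
    for label, vector in corpus:
        score = cosine_similarity(subject_vector, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


def should_remove(subject_vector, corpus, threshold=0.9):
    """Flag the subject video when it is close enough to any stored video.

    The threshold value is an assumption chosen for the example.
    """
    _, score = most_similar_in_corpus(subject_vector, corpus)
    return score >= threshold
```

With more than one stored entry, the lookup behaves like a comparison against a collection of prior classifications rather than against a single earlier video.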

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 12-13, 15, 17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) & 102(a)(2) as being anticipated by Paluri et al (US 2018/0189570).
Regarding Claim 1, Paluri teaches a method (Figs. 4-5), comprising: training, by a computing device, a model using a video corpus ([0027-0028], a video understanding platform may be trained by machine learning to make a prediction about a video-content object based on one or more of: frames of the video-content object, audio of the video-content object, and text associated with the video-content object, Fig. 4, video understanding engine 400 may comprise a video-recognition module 410, a text-recognition module 420, and an audio-recognition module 430, video-recognition module 410 may be trained by machine learning to receive a feature vector representing a video-content object based on one or more frames of the video-content object and output a prediction about the video-content object, text-recognition module 420 may be trained by machine learning to receive a feature vector representing a video-content object based on text associated with the video-content object and output a prediction about the video-content object, audio-recognition module 430 may be trained by machine learning to receive a feature vector representing a video-content object based on one or more portions of audio of the video-content object and output a prediction about the video-content object, this disclosure contemplates any suitable video understanding engine); 
obtaining, by the computing device, a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object); 
generating, by the computing device, respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), [0036], feature vector based on one or more frames of the video-content object may be based on recognizing objects depicted in the frames, feature vector based on text associated with the video-content object, such as posts on an online social network that include the text or the title for the video-content object, a feature vector representing the video-content object based on a combination of the inputted feature vectors);
determining, by the computing device, first semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object);
determining, by the computing device, a second semantic similarity between the title of subject video and titles of videos in a misleading video corpus in a same domain as the subject video ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law, social-networking system 160 may remove a second video-content object based on determining that the video-content object and the second video-content object are similar based on the feature vector for the video-content object and a feature vector for the second video-content object); 
determining, by the computing device, a third semantic similarity between comments of the subject video and comments of videos in the misleading video corpus in the same domain as the subject video ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information); 
classifying, by the computing device, the subject video using the model and based on the first semantic similarities, the second semantic similarity, and the third semantic similarity ([0043], Fig. 5, step 540 social-networking system 160 may determine a fourth feature vector representing the video-content object, wherein the fourth feature vector is based on a combination of the first, second, and third feature vectors, at step 550 social-networking system 160 may determine a context of the video-content object based on the fourth feature vector and social-graph information based at least in part on one or more nodes or edges connected to the node corresponding to the video-content object, [0036], fusion module 440 may be trained by machine learning to determine the context based on a feature vector representing the video-content object, a feature vector based on at least some of the text associated with the video-content object, and a feature vector based on one or more portions of audio of the video-content object); and
outputting, by the computing device, the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
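For illustration only, the cosine similarity metric cited from Paluri ([0042]) can be applied pairwise to a set of named feature vectors, as in the claimed "semantic similarities between ones of the feature vectors". This is a minimal sketch under stated assumptions: the dictionary keyed by feature name and the helper names are hypothetical, and neither the application nor Paluri is asserted to compute similarities this way.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(feature_vectors):
    """Map each unordered pair of feature names to its cosine similarity.

    `feature_vectors` is a hypothetical dict such as
    {"title": [...], "thumbnail": [...], "description": [...], "content": [...]}.
    Pair keys are ordered alphabetically so lookups are deterministic.
    """
    return {
        (name_a, name_b): cosine_similarity(
            feature_vectors[name_a], feature_vectors[name_b]
        )
        for name_a, name_b in combinations(sorted(feature_vectors), 2)
    }
```

Four feature vectors yield six pairwise values, for example one keyed ("thumbnail", "title").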
Regarding Claim 2, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri further teaches analyzing a user profile of a user that posted the subject video, wherein the classifying the subject video is based on the analyzing user profile ([0037-0038], a video-content object may be posted on a user's page on an online social network, the user may have posted the video on her birthday, as determined by the user profile of the user, the context may be that the video-content object depicts the user's birthday party, as determined by a feature vector representing the video-content object and the social-graph information, social-networking system 160 may generate a recommendation for a second video-content object based on the feature vector of the video-content object and a user profile for the user).
Regarding Claim 12, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri further teaches wherein the classifying the subject video comprises classifying the subject video as one of: misleading; potentially misleading; and non-misleading ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 13, Paluri teaches a computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media ([0044-0053], Fig. 6, computer system 600), the program instructions executable (Figs. 4-5) to: obtain a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object);
generate respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), [0036], feature vector based on one or more frames of the video-content object may be based on recognizing objects depicted in the frames, feature vector based on text associated with the video-content object, such as posts on an online social network that include the text or the title for the video-content object, a feature vector representing the video-content object based on a combination of the inputted feature vectors);
determine semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object);
classify the subject video based on a weighted sum of the semantic similarities ([0043], Fig. 5, step 540 social-networking system 160 may determine a fourth feature vector representing the video-content object, wherein the fourth feature vector is based on a combination of the first, second, and third feature vectors, at step 550 social-networking system 160 may determine a context of the video-content object based on the fourth feature vector and social-graph information based at least in part on one or more nodes or edges connected to the node corresponding to the video-content object, [0036-0037], fusion module 440 may be trained by machine learning to determine the context based on a feature vector representing the video-content object, a feature vector based on at least some of the text associated with the video-content object, and a feature vector based on one or more portions of audio of the video-content object); and
output the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
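For illustration only, classification "based on a weighted sum of the semantic similarities" can be sketched as below, using the three class labels recited in Claim 12. This is a minimal sketch under stated assumptions: the weights, the two thresholds, the similarity names, and the convention that a larger score indicates stronger evidence of misleading content are all hypothetical and are not drawn from the application or from Paluri.

```python
def weighted_sum(similarities, weights):
    """Sum of weight * similarity over every named similarity.

    Both arguments are hypothetical dicts keyed by similarity name.
    """
    missing = set(similarities) - set(weights)
    if missing:
        raise KeyError(f"no weight supplied for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in similarities.items())


def classify(score, potentially_threshold=0.4, misleading_threshold=0.7):
    """Map a score to one of the three labels recited in Claim 12.

    Assumes each input similarity was oriented so that a larger score means
    stronger evidence of misleading content; threshold values are made up.
    """
    if score >= misleading_threshold:
        return "misleading"
    if score >= potentially_threshold:
        return "potentially misleading"
    return "non-misleading"
```

A call such as classify(weighted_sum(similarities, weights)) then yields a single label that could be output to a user.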
Regarding Claim 15, Paluri teaches all aspects of the claimed invention as disclosed in Claim 13 above. Paluri further teaches wherein: the program instructions are executable to determine a title semantic similarity between the title of subject video and titles of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the title semantic similarity ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law, social-networking system 160 may remove a second video-content object based on determining that the video-content object and the second video-content object are similar based on the feature vector for the video-content object and a feature vector for the second video-content object).
Regarding Claim 17, Paluri teaches a system comprising: a processor, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media ([0044-0053], Fig. 6, computer system 600), the program instructions executable (Figs. 4-5) to: obtain a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object); 
generate respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), [0036], feature vector based on one or more frames of the video-content object may be based on recognizing objects depicted in the frames, feature vector based on text associated with the video-content object, such as posts on an online social network that include the text or the title for the video-content object, a feature vector representing the video-content object based on a combination of the inputted feature vectors);
determine semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object);
classify the subject video based on a weighted sum of the semantic similarities ([0043], Fig. 5, step 540 social-networking system 160 may determine a fourth feature vector representing the video-content object, wherein the fourth feature vector is based on a combination of the first, second, and third feature vectors, at step 550 social-networking system 160 may determine a context of the video-content object based on the fourth feature vector and social-graph information based at least in part on one or more nodes or edges connected to the node corresponding to the video-content object, [0036-0037], fusion module 440 may be trained by machine learning to determine the context based on a feature vector representing the video-content object, a feature vector based on at least some of the text associated with the video-content object, and a feature vector based on one or more portions of audio of the video-content object); and
output the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 19, Paluri teaches all aspects of the claimed invention as disclosed in Claim 17 above. Paluri further teaches wherein: the program instructions are executable to determine a title semantic similarity between the title of subject video and titles of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the title semantic similarity ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law, social-networking system 160 may remove a second video-content object based on determining that the video-content object and the second video-content object are similar based on the feature vector for the video-content object and a feature vector for the second video-content object).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Paluri et al (US 2018/0189570), in view of Kapoor et al (US 2018/0293278).
Regarding Claim 3, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. While Paluri teaches classification of video-content ([0039, 0043]), Paluri fails to teach determining an average watch time of the subject content, wherein the classifying the subject content is based on the average watch time.
In the same field of endeavor, Kapoor teaches determining an average watch time of the subject content, wherein the classifying the subject content is based on the average watch time ([0056], length of time the comment has been viewed may comprise an average time over a specified period that users spend viewing the comment (e.g., an average of 30 seconds), the comment relevance system 216 may interpret the length of time the comment has been viewed as having a direct relationship to the level of relevancy of the comment).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include consideration of an average watch time when classifying the video, as taught in Kapoor, in order to enhance usability and electronic resource efficiency by filtering content according to relevance and appropriateness. (See Kapoor [0013, 0060])

Claims 7-8, 10-11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Paluri et al (US 2018/0189570), in view of Zhang et al (US 2019/0347355).
Regarding Claim 7, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri further teaches where generating the respective feature vectors comprises: extracting entities from each of the title, the thumbnail, the description, and the content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), [0036], feature vector based on one or more frames of the video-content object may be based on recognizing objects depicted in the frames, feature vector based on text associated with the video-content object, such as posts on an online social network that include the text or the title for the video-content object, a feature vector representing the video-content object based on a combination of the inputted feature vectors).
Paluri fails to teach wherein: the video corpus comprises videos classified using predefined classes; and mapping the extracted entities to one or more of the predefined classes.
In the same field of endeavor, Zhang teaches wherein: the video corpus comprises videos classified using predefined classes; and mapping the extracted entities to one or more of the predefined classes ([0030-0031], content item classification module 102 can classify a content item and determine whether the content item is of a particular type of content item using a “multi-stage classification process”, content item classification module 102 can make an initial determination of whether the content item falls within the classification, examples of such types of content items can include engagement bait, click bait, etc., [0040], initial classification module 204 can train a machine learning model to determine whether a content item is a particular type of content item, the particular type of content item can be engagement bait or click bait).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include mapping the determined context into predefined classification types based on the detected content, as taught in Zhang, in order to determine a more accurate classification and take one or more actions based on the classification of a content item. (See Zhang [0032])
Regarding Claim 8, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri fails to teach wherein the model is one of plural different models that each have different combinations of inputs, and further comprising selecting the model from the plural different models based on inputs available for the subject video.
In the same field of endeavor, Zhang teaches wherein the model is one of plural different models that each have different combinations of inputs, and further comprising selecting the model from the plural different models based on inputs available for the subject video ([0030], content item classification module 102 can make an initial determination of whether the content item falls within the classification, the initial determination can be based on non-social signals associated with the content item, if it is uncertain whether the content item falls within the classification based on the initial determination, the content item classification module 102 can monitor the content item and make a subsequent determination of whether the content item falls within the classification, the subsequent determination can be based on social signals associated with the content item and/or non-social signals associated with the content item, a process of making an initial determination and one or more subsequent determinations of whether a content item falls within a classification can be referred to as a “multi-stage classification process”, [0040], initial classification module 204 can train a machine learning model to determine whether a content item is a particular type of content item, the particular type of content item can be engagement bait or click bait, training data can include various features that relate to non-social signals, such as content attributes associated with content items including text, an image, a video, an audio, a type of media (e.g., an image, a video, an audio, text, etc.), a duration of a content item (e.g., time length of a video), a subject matter, one or more objects represented in a content item, [0028], social signals not available when content item first created or posted).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include applying a multi-stage classification process using different models at different times, based on different input data, when determining the classification of the content, as taught in Zhang, in order to determine a more accurate classification for a content item based on different data available over time and take one or more actions based on the classification. (See Zhang [0032])
Regarding Claim 10, Paluri, as modified by Zhang, teaches all aspects of the claimed invention as disclosed in Claim 8 above. Zhang further teaches re-classifying the subject video at a later time using a different one of the plural different models ([0030], content item classification module 102 can make an initial determination of whether the content item falls within the classification, the initial determination can be based on non-social signals associated with the content item, if it is uncertain whether the content item falls within the classification based on the initial determination, the content item classification module 102 can monitor the content item and make a subsequent determination of whether the content item falls within the classification, the subsequent determination can be based on social signals associated with the content item and/or non-social signals associated with the content item, a process of making an initial determination and one or more subsequent determinations of whether a content item falls within a classification can be referred to as a “multi-stage classification process”, [0047-0048], monitoring determination module 206 can have determined that the content item should be monitored, the triggering module 224 can trigger determining the subsequent classification for the content item based on specified criteria, subsequent classification module 226 can determine a subsequent classification for a monitored content item based on social signals associated with the content item and/or non-social signals associated with the content item, [0050-0052], subsequent classification module 226 can train a separate machine learning model for each type of content item).
Regarding Claim 11, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri fails to teach updating the video corpus to include the subject video and the classification of the subject video; and re-training the model using the updated video corpus.
In the same field of endeavor, Zhang teaches updating the video corpus to include the subject video and the classification of the subject video; and re-training the model using the updated video corpus ([0040, 0043, 0050], machine learning models are initially trained with labeled data relating to content items, the classification modules can retrain the machine learning models based on new or updated training data).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include retraining the trained models with new and updated classification data, as taught in Zhang, in order to determine a more accurate classification and take one or more actions based on the classification of a content item. (See Zhang [0032])
Regarding Claim 16, Paluri teaches all aspects of the claimed invention as disclosed in Claim 13 above. Paluri further teaches wherein: the program instructions are executable to determine a comments semantic similarity between comments of subject video and comments of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the comments semantic similarity ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information).
Paluri fails to teach wherein the misleading video corpus is a subset of videos of a video corpus, wherein each video in the video corpus is tagged with one or more predefined classes, one or more predefined domains, and one or more predefined audio/visual features, and wherein each video in the misleading video corpus is additionally tagged as known misleading.
In the same field of endeavor, Zhang teaches wherein the misleading video corpus is a subset of videos of a video corpus, wherein each video in the video corpus is tagged with one or more predefined classes, one or more predefined domains, and one or more predefined audio/visual features, and wherein each video in the misleading video corpus is additionally tagged as known misleading ([0030-0031], content item classification module 102 can classify a content item and determine whether the content item is of a particular type of content item using a “multi-stage classification process”, content item classification module 102 can make an initial determination of whether the content item falls within the classification, examples of such types of content items can include engagement bait, click bait, etc., [0040], initial classification module 204 can train a machine learning model to determine whether a content item is a particular type of content item, the particular type of content item can be engagement bait or click bait).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include mapping the determined context into predefined classification types based on the detected content, as taught in Zhang, in order to determine a more accurate classification and take one or more actions based on the classification of a content item. (See Zhang [0032])

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Paluri et al (US 2018/0189570), in view of Rose et al (US 2020/0099755).
Regarding Claim 20, Paluri teaches all aspects of the claimed invention as disclosed in Claim 17 above. Paluri further teaches wherein: the program instructions are executable to determine a comments semantic similarity between comments of subject video and comments of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the comments semantic similarity ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information).
Paluri fails to teach the thumbnail of the video is selectable by a user to play the subject video in a user interface.
In the same field of endeavor, Rose teaches the thumbnail of the video is selectable by a user to play the subject video in a user interface ([0247], main user interface or ‘home page’ of the interactive media player system when accessed by a device, for example by means of a web browser, the user interface is a web page comprising a plurality of page elements arranged on the (single) page including elements described as ‘drawers’ and others described as ‘components,’ the user interacts with the various page elements of the user interface in order to access the functionality of the interactive media player system and to consume media content, links to media content items are presented in the form of thumbnail images representative of the media item).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include providing thumbnails as links to the videos, as taught in Rose, in order to provide convenient and more efficient access to user content. (See Rose [0247-0248])

Allowable Subject Matter
Claims 4-6, 9, 14, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARGARET G MASTRODONATO whose telephone number is (571)270-7803. The examiner can normally be reached M-F 9:00 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached at (571) 272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARGARET G MASTRODONATO/
Primary Examiner, Art Unit 2641