DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in the Islamic Republic of Pakistan on 06/22/2020. Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 12-13, 15-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Paluri et al (US 2018/0189570).
Regarding Claim 1, Paluri teaches a method (Figs. 4-5), comprising: training, by a computing device, a model using a video corpus ([0027-0028], a video understanding platform may be trained by machine learning to make a prediction about a video-content object based on one or more of: frames of the video-content object, audio of the video-content object, and text associated with the video-content object, Fig. 4, video understanding engine 400 may comprise a video-recognition module 410, a text-recognition module 420, and an audio-recognition module 430, video-recognition module 410 may be trained by machine learning to receive a feature vector representing a video-content object based on one or more frames of the video-content object and output a prediction about the video-content object, text-recognition module 420 may be trained by machine learning to receive a feature vector representing a video-content object based on text associated with the video-content object and output a prediction about the video-content object, audio-recognition module 430 may be trained by machine learning to receive a feature vector representing a video-content object based on one or more portions of audio of the video-content object and output a prediction about the video-content object, this disclosure contemplates any suitable video understanding engine); 
obtaining, by the computing device, a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object); 
generating, by the computing device, respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video;
determining, by the computing device, first semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object); 
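determining, by the computing device, a second semantic similarity between the title of subject video and titles of videos in a misleading video corpus in a same domain as the subject video ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law);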
determining, by the computing device, a third semantic similarity between comments of the subject video and comments of videos in the misleading video corpus in the same domain as the subject video ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information); 
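classifying, by the computing device, the subject video using the model and based on the first semantic similarities, the second semantic similarity, and the third semantic similarity ([0043], Fig. 5, step 540 social-networking system 160 may determine a fourth feature vector representing the video-content object, wherein the fourth feature vector is based on a combination of the first, second, and third feature vectors, at step 550 social-networking system 160 may determine a context of the video-content object based on the fourth feature vector and social-graph information based at least in part on one or more nodes or edges connected to the node corresponding to the video-content object);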
outputting, by the computing device, the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 2,
Regarding Claim 12, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri further teaches wherein the classifying the subject video comprises classifying the subject video as one of: misleading; potentially misleading; and non-misleading ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 13, Paluri teaches a computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media ([0044-0053], Fig. 6, computer system 600), the program instructions executable (Figs. 4-5) to: obtain a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object);
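generate respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.));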
determine semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object); 
classify the subject video based on a weighted sum of the semantic similarities ([0043], Fig. 5, step 540 social-networking system 160 may determine a fourth feature vector representing the video-content object, wherein the fourth feature vector is based on a combination of the first, second, and third feature vectors, at step 550 social-networking system 160 may determine a context of the video-content object based on the fourth feature vector and social-graph information based at least in part on one or more nodes or edges connected to the node corresponding to the video-content object, [0036-
output the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 15,
Regarding Claim 16, Paluri teaches all aspects of the claimed invention as disclosed in Claim 13 above. Paluri further teaches wherein: the program instructions are executable to determine a comments semantic similarity between comments of subject video and comments of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the comments semantic similarity ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information).
Regarding Claim 17, Paluri teaches a system comprising: a processor, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media ([0044-0053], Fig. 6, computer system 600), the program instructions executable (Figs. 4-5) to: obtain a subject video from a content server ([0043], Fig. 5, determining a context of a video-content object, the social-networking system 160 accessing video-content object); 
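generate respective feature vectors ([0043], Fig. 5, at step 510 social-networking system 160 may access a first feature vector representing a video-content object corresponding to a node in a social graph of a social-networking system, wherein: the video-content object comprises frames and audio and is associated with text, the first feature vector is based on one or more of the frames of the video-content object, step 520, social-networking system 160 may access a second feature vector representing the video-content object, wherein the second feature vector is based on at least some of the text, step 530, social-networking system 160 may access a third feature vector representing the video-content object, wherein the third feature vector is based on one or more portions of the audio) of a title, a thumbnail, a description, and a content of the subject video;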
determine semantic similarities between ones of the feature vectors ([0042], social-networking system 160 may, for each identified object, access a feature vector representing the identified object and map objects to feature vectors by feature extraction, or access a cached feature vector for an object that has been previously mapped, social-networking system 160 may rank each identified object based on a similarity metric between the feature vector representing the video-content object and the feature vector representing the identified object, the similarity metric may be a cosine similarity between the feature vector representing the video-content object and the feature vector representing the identified object); 

output the classification of the subject video to a user ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 19, Paluri teaches all aspects of the claimed invention as disclosed in Claim 17 above. Paluri further teaches wherein: the program instructions are executable to determine a title semantic similarity between the title of subject video and titles of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the title semantic similarity ([0039], determining the context of the video-content object may comprise determining that the video-content object is inappropriate, fusion module 440 may output a prediction that a video-content object depicts nudity or sexual content, violent or graphic content, hateful content (e.g., promotes or condones violence against individuals or groups), fraudulent or misleading content (e.g., a pyramid scheme), harmful or dangerous content (e.g., encourages others to do harmful activities), threatening material, or material that violates copyright law).
Regarding Claim 20, Paluri teaches all aspects of the claimed invention as disclosed in Claim 17 above. Paluri further teaches wherein: the program instructions are executable to determine a comments semantic similarity between comments of subject video and comments of videos in a misleading video corpus in a same domain as the subject video; and the classifying the subject video further based on the comments semantic similarity ([0027], text-recognition module may be trained by machine learning to make a prediction about a video-content object based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), a prediction about a video-content object may comprise a context, a predicted future action, a predicted object, a predicted motion, or any other suitable prediction, a video-content object may be a video that is streamed live and information (e.g., likes, comments, shares, video content, etc.) may be received in an ongoing manner and the computer-vision platform may update a prediction based on this information).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Paluri et al (US 2018/0189570), in view of Kapoor et al (US 2018/0293278).
Regarding Claim 3, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. While Paluri teaches classification of video-content ([0039, 0043]), Paluri fails to teach determining an average watch time of the subject content, wherein the classifying the subject content is based on the average watch time.
In the same field of endeavor, Kapoor teaches determining an average watch time of the subject content, wherein the classifying the subject content is based on the average watch time ([0056], length of time the comment has been viewed may comprise an average time over a specified period that users spend viewing the comment (e.g., an average of 30 seconds), the comment relevance system 216 may interpret the length of time the comment has been viewed as having a direct relationship to the level of relevancy of the comment).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature See Kapoor [0013, 0060])

Claims 7-8 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Paluri et al (US 2018/0189570), in view of Zhang et al (US 2019/0347355).
Regarding Claim 7, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri further teaches where generating the respective feature vectors comprises: extracting entities from each of the title, the thumbnail, the description, and the content of the subject video ([0027], may be trained by machine learning to make a prediction about a video-content object based on an analysis of one or more frames (e.g., a still image) of the video-content object, based on an analysis of part or all of the audio of a video-content object (e.g., speech identification, language identification, sound identification, source separation, etc.), based on text associated with the video-content object (e.g., posts or comments associated with a video-content object posted on an online social network, text metadata associated with the video-content object, topic classification information associated with the video-content object, intent understanding information associated with the video-content object, etc.), [0036], feature vector based on one or more frames of the video-content object may be based on recognizing objects depicted in the frames, feature vector based on text associated with the video-content object, such as posts on an online social network that include the text or the title for the video-content object, a feature vector representing the video-content object based on a combination of the inputted feature vectors).
Paluri fails to teach wherein: the video corpus comprises videos classified using predefined classes; and mapping the extracted entities to one or more of the predefined classes.

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include mapping the determined context into predefined classification types based on the detected content, as taught in Zhang, in order to determine a more accurate classification and take one or more actions based on the classification of a content item. (See Zhang [0032])
Regarding Claim 8, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri fails to teach wherein the model is one of plural different models that each have different combinations of inputs, and further comprising selecting the model from the plural different models based on inputs available for the subject video.
In the same field of endeavor, Zhang teaches wherein the model is one of plural different models that each have different combinations of inputs, and further comprising selecting the model from the plural different models based on inputs available for the subject video ([0030], content item classification module 102 can make an initial determination of whether the content item falls within the classification, the initial determination can be based on non-social signals associated with the content 
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature vectors and other text and social content associated with the video, as taught in Paluri, to further include applying a multi-stage classification process using different models, at different times, based on different input data, when determining the classification of the content, as taught in Zhang, in order to determine a more accurate classification for a content item based on different data available over time and take one or more actions based on the classification. (See Zhang [0032])
Regarding Claim 10, Paluri, as modified by Zhang, teaches all aspects of the claimed invention as disclosed in Claim 8 above. Zhang further teaches re-classifying the subject video at a later time using a different one of the plural different models ([0030], content item classification module 102 can make an initial determination of whether the content item falls within the classification, the initial determination 
Regarding Claim 11, Paluri teaches all aspects of the claimed invention as disclosed in Claim 1 above. Paluri fails to teach updating the video corpus to include the subject video and the classification of the subject video; and re-training the model using the updated video corpus.
In the same field of endeavor, Zhang teaches updating the video corpus to include the subject video and the classification of the subject video; and re-training the model using the updated video corpus ([0040, 0043, 0050], machine learning models are initially trained with labeled data relating to content items, the classification modules can retrain the machine learning models based on new or updated training data).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification of video context based on similarity of feature See Zhang [0032])

Allowable Subject Matter
Claims 4-6, 9, 14, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARGARET G MASTRODONATO whose telephone number is (571)270-7803. The examiner can normally be reached M-F 9:00 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah can be reached on (571) 272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/MARGARET G MASTRODONATO/Primary Examiner, Art Unit 2641