DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
2.	Receipt of Applicant’s Amendment filed on 10/10/2022 is acknowledged.  The amendment includes the amending of claims 1, 3-4, and 19-20, the cancellation of claims 5-6, 8-10, and 12-13, and the addition of claim 21.
Claim Rejections - 35 USC § 112
3.	The rejections raised in the Office Action mailed on 07/13/2022 have been overcome by applicant’s amendment received on 10/10/2022.
Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
7.	Claims 1-4, 7, 11, 15-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (U.S. Patent 8,452,778), and further in view of Jarman et al. (U.S. PGPUB 2007/0186235).
8.	Regarding claims 1, 19, and 20, Song teaches a computer program product, method, and device comprising:
A)  computer executable instructions stored on a non-transitory computer readable medium that when executed by a processor instruct the processor to:  receive, using an artificial intelligence (AI) system, explicit labels associated with a first media (Column 4, lines 44-51, Column 5, lines 29-35, lines 59-63);
B)  train the AI system based on the explicit labels (Column 4, lines 44-51, Column 5, lines 29-35, lines 59-63);
C)  automatically label, using the AI system, a second media with one or more labels (Column 4, lines 44-51, Column 6, lines 43-47, Column 11, lines 17-27).
	The examiner notes that Song teaches “computer executable instructions stored on a non-transitory computer readable medium that when executed by a processor instruct the processor to:  receive, using an artificial intelligence (AI) system, explicit labels associated with a first media” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a labeled content repository 220 containing various content items previously authoritatively labeled by an entity as representing one or more categories from the category set 205. The content items, together with their labels, serve as training data for supervised learning algorithms that learn classifiers for the various categories of the category set 205. In one embodiment, the content items are authoritatively labeled by a human expert trained in the meaning and use of the category set 205” (Column 5, lines 29-35), “The labeled content repository 220 further comprises authoritatively labeled videos 224. In one embodiment, the authoritatively labeled videos 224 are a subset of the video repository 116, with the addition of labels manually added by expert users for the purposes of classifier training” (Column 5, lines 59-63).  The examiner further notes that learned trained classifier (i.e. an “AI system”) clearly receives manually labeled videos 224 (i.e. the claimed explicit labels associated with first media) for training purposes.  
The examiner further notes that Song teaches “train the AI system based on the explicit labels” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a labeled content repository 220 containing various content items previously authoritatively labeled by an entity as representing one or more categories from the category set 205. The content items, together with their labels, serve as training data for supervised learning algorithms that learn classifiers for the various categories of the category set 205. In one embodiment, the content items are authoritatively labeled by a human expert trained in the meaning and use of the category set 205” (Column 5, lines 29-35), “The labeled content repository 220 further comprises authoritatively labeled videos 224. In one embodiment, the authoritatively labeled videos 224 are a subset of the video repository 116, with the addition of labels manually added by expert users for the purposes of classifier training” (Column 5, lines 59-63).  The examiner further notes that learned trained classifier (i.e. an “AI system”) clearly receives manually labeled videos 224 (i.e. the claimed explicit labels associated with first media) for training purposes.  
The examiner further notes that Song teaches “automatically label, using the AI system, a second media with one or more labels” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a learning module 230 that analyzes the content items in the labeled content repository 220 and learns the set of category classifiers 240 that can be used to automatically categorize videos” (Column 6, lines 43-47), and “After the training process is complete, each category of the set of categories 205 has some classifier associated with it. In one embodiment, every category has an adapted classifier 242. In another embodiment, only those categories with a threshold number of videos labeled as representing that category have an adapted classifier, and the other categories have only the initial text-based classifier 241 learned from the authoritatively labeled text documents 222.  These classifiers may then be applied to categorize videos 117 from the video repository 116 that do not already have authoritatively applied category labels” (Column 11, lines 17-27).  The examiner further notes that the learned trained classifiers are then used to categorize (i.e. label) unlabeled videos (i.e. second media). 
	Song does not explicitly teach:
D)  during playback of the second media, filter the second media based on the one or more labels by:  detecting a portion of the second media labeled with a particular label of the one or more labels; and
E)  skipping the playback of the second media forward for a defined period of time to prevent playback of the portion of the second media labeled with the particular label.
	Jarman, however, teaches “during playback of the second media, filter the second media based on the one or more labels by:  detecting a portion of the second media labeled with a particular label of the one or more labels” as “The operations set forth in FIGS. 2A-2B (collectively "FIG. 2") include the same operations 10-110 set forth in FIG. 1. Additionally, FIG. 2 includes operations A-D. First, operation A provides a user with an input mechanism to choose types of content to filter. One example of an input mechanism in the form of an on-screen selection menu is set forth in U.S. patent application Ser. No. 11/104,924 titled "Apparatus, System, and Method for Filtering Objectionable Portions of Multimedia Presentation," filed on Apr. 12, 2005 (the '924 application), which is hereby incorporated by reference herein …the user could access a filter selection menu on a third computing platform, such as a PC. The filter settings are then made available to the STB for operation C (described below). For example, the filter settings could be entered on a computer and sent to the server or STB via a network connection therebetween. In any of the embodiments discussed herein, filter selections may be made before or after the selection of the video-on-demand presentation. Accordingly, in some implementations where all possible filter types may not be available for a certain movie, e.g., a particularly violent movie has all possible violence filter types, but the movie has no sex or nudity therefore there are no sex or nudity filter types, only a subset of filter types may be presented to the user for activation” (Paragraph 20), “through an on-screen menu, the user may activate a strong action violence filter type and a brutal/gory violence filter type. 
Subsequently, during presentation of the video-on-demand movie, audio, video, or both associated with scenes having strong action violence or brutal/gory violence will be suppressed, which may involve skipping the scene entirely, blurring some or all of the movie for period of time of the scene, cropping portions of the movie to remove image portions, and/or muting some or all of the audio” (Paragraph 21), “the server transmits a filter metadata file to the STB (operation B) and the STB receives and stores the metadata (operation C). It is possible to transmit filters to the STB at any time. Moreover, it is also possible to transmit more than one set of filters (i.e., filter files for more than one multimedia presentation). Thus, in some instances, the STB will be configured to determine whether filters for a given movie have already been loaded into the set-top-box. With respect to transmission of the metadata filter files from the content server, the metadata transmission may be done automatically with all VOD requests” (Paragraph 22), “the filter metadata file may include filter information as set forth in the '924 application. Generally, the filter information includes some indicia of a start time and end time of a portion of a multimedia presentation along with the type of content set forth between the start and end time. For example, between 5 minutes and 5 minutes and 20 seconds, a certain film may have strong action violence. The metadata will include information related or associated with the start time (5 minutes), the end time (5 min. 20 sec.) and strong action violence. If the user activates the strong action violence filter, when the playback reaches the 5 minute time of the movie, the next 20 seconds are suppressed. Finally, playback of the VOD movie may be filtered as a function of the user's filter settings and the metadata (operation D)” (Paragraph 23), and “the filtering logic takes place on the STB side. 
This means that the server will send the entire multimedia content, e.g., all of the data to play the movie "Field of Dreams," and specific visual and/or audible portions of the multimedia content will be suppressed or "filtered" from the presentation on the STB side. Another advantage of the method set forth in FIG. 2 is that the entire content may be stored on the STB (e.g. on a hard-drive or a recordable optical disc). This would allow subsequent playback where the user could choose to view the entire multimedia without any filtering, or the user could view the multimedia with different filter settings” (Paragraph 24), and “skipping the playback of the second media forward for a defined period of time to prevent playback of the portion of the second media labeled with the particular label” as “The operations set forth in FIGS. 2A-2B (collectively "FIG. 2") include the same operations 10-110 set forth in FIG. 1. Additionally, FIG. 2 includes operations A-D. First, operation A provides a user with an input mechanism to choose types of content to filter. One example of an input mechanism in the form of an on-screen selection menu is set forth in U.S. patent application Ser. No. 11/104,924 titled "Apparatus, System, and Method for Filtering Objectionable Portions of Multimedia Presentation," filed on Apr. 12, 2005 (the '924 application), which is hereby incorporated by reference herein …the user could access a filter selection menu on a third computing platform, such as a PC. The filter settings are then made available to the STB for operation C (described below). For example, the filter settings could be entered on a computer and sent to the server or STB via a network connection therebetween. In any of the embodiments discussed herein, filter selections may be made before or after the selection of the video-on-demand presentation. 
Accordingly, in some implementations where all possible filter types may not be available for a certain movie, e.g., a particularly violent movie has all possible violence filter types, but the movie has no sex or nudity therefore there are no sex or nudity filter types, only a subset of filter types may be presented to the user for activation” (Paragraph 20), “through an on-screen menu, the user may activate a strong action violence filter type and a brutal/gory violence filter type. Subsequently, during presentation of the video-on-demand movie, audio, video, or both associated with scenes having strong action violence or brutal/gory violence will be suppressed, which may involve skipping the scene entirely, blurring some or all of the movie for period of time of the scene, cropping portions of the movie to remove image portions, and/or muting some or all of the audio” (Paragraph 21), “the server transmits a filter metadata file to the STB (operation B) and the STB receives and stores the metadata (operation C). It is possible to transmit filters to the STB at any time. Moreover, it is also possible to transmit more than one set of filters (i.e., filter files for more than one multimedia presentation). Thus, in some instances, the STB will be configured to determine whether filters for a given movie have already been loaded into the set-top-box. With respect to transmission of the metadata filter files from the content server, the metadata transmission may be done automatically with all VOD requests” (Paragraph 22), “the filter metadata file may include filter information as set forth in the '924 application. Generally, the filter information includes some indicia of a start time and end time of a portion of a multimedia presentation along with the type of content set forth between the start and end time. For example, between 5 minutes and 5 minutes and 20 seconds, a certain film may have strong action violence. 
The metadata will include information related or associated with the start time (5 minutes), the end time (5 min. 20 sec.) and strong action violence. If the user activates the strong action violence filter, when the playback reaches the 5 minute time of the movie, the next 20 seconds are suppressed. Finally, playback of the VOD movie may be filtered as a function of the user's filter settings and the metadata (operation D)” (Paragraph 23), and “the filtering logic takes place on the STB side. This means that the server will send the entire multimedia content, e.g., all of the data to play the movie "Field of Dreams," and specific visual and/or audible portions of the multimedia content will be suppressed or "filtered" from the presentation on the STB side. Another advantage of the method set forth in FIG. 2 is that the entire content may be stored on the STB (e.g. on a hard-drive or a recordable optical disc). This would allow subsequent playback where the user could choose to view the entire multimedia without any filtering, or the user could view the multimedia with different filter settings” (Paragraph 24).
	The examiner further notes that the secondary reference of Jarman teaches the concept of detecting specific metadata (i.e. the claimed one or more labels) associated with movies (i.e. examples of second media), which is used to suppress (which includes skipping) portions of a movie in accordance with user-designated filtering criteria during playback of that movie (see the example of skipping the movie scene from 00:05:00 to 00:05:20, which the metadata identifies (i.e. labels) as strong action violence, after a user has activated the strong action violence filter).  The combination would result in allowing for the skipping of labeled media in Song.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Jarman’s teachings would have allowed Song’s system to provide a method for individualized filtering of movies, as noted by Jarman (Paragraph 24).

	Regarding claim 2, Song further teaches a computer program product comprising:
A)  wherein the explicit labels include filtering criteria (Column 1, lines 19-26, Column 4, lines 44-51, Column 5, lines 29-35, lines 59-63).
	The examiner notes that Song teaches “wherein the explicit labels include filtering criteria” as “Video hosting systems, such as YOUTUBE or GOOGLE VIDEO, have become an increasingly popular way of sharing and viewing digital videos, with users contributing tens of millions of videos each year. Accurate categorization of a video is of great value in such systems, permitting users to search for videos corresponding to given categories, video hosting systems to more accurately match videos with relevant advertising, and the like” (Column 1, lines 19-26), “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a labeled content repository 220 containing various content items previously authoritatively labeled by an entity as representing one or more categories from the category set 205. The content items, together with their labels, serve as training data for supervised learning algorithms that learn classifiers for the various categories of the category set 205. In one embodiment, the content items are authoritatively labeled by a human expert trained in the meaning and use of the category set 205” (Column 5, lines 29-35), “The labeled content repository 220 further comprises authoritatively labeled videos 224. In one embodiment, the authoritatively labeled videos 224 are a subset of the video repository 116, with the addition of labels manually added by expert users for the purposes of classifier training” (Column 5, lines 59-63).  The examiner further notes that categories (i.e. labels) assigned by a human on videos teach the claimed undefined “filtering criteria” in the broadest reasonable interpretation as such categories are used to specifically “filter” which videos are returned to querying users and/or matching ads.

	Regarding claim 3, Song further teaches a computer program product comprising:
A)  wherein automatically labelling the second media includes modifying the second media (Column 11, line 57 through Column 12, line 4).
	The examiner notes that Song teaches “wherein automatically labelling the second media includes modifying the second media” as “The category labels assigned to a video can then be used in various ways by users of the video hosting service 100. For example, the labels of the determined categories can be viewed along with other metadata of a video, e.g., appearing as video tags for the video when viewed in a web-based user interface of a video sharing site. The labels can also be used to browse videos of a particular type, such as showing all videos that were assigned labels of a chosen category, such as "Tennis." Similarly, they can be integrated into more sophisticated search functionality as one input, such as a search for videos of a particular category and having other additional attributes, such as particular text in their titles, a creation date within a given timeframe, or the like. The possible uses of the assigned category label are numerous, and are not limited to these specific examples” (Column 11, line 57 through Column 12, line 4).  The examiner further notes that assigning labels/tags to videos teaches the claimed modification of the second media in the broadest reasonable interpretation.

	Regarding claim 4, Song does not explicitly teach a computer program product comprising:
A)  wherein automatically labelling the second media includes masking at least a portion of the second media.
	Jarman, however, teaches “wherein automatically labelling the second media includes masking at least a portion of the second media” as “The operations set forth in FIGS. 2A-2B (collectively "FIG. 2") include the same operations 10-110 set forth in FIG. 1. Additionally, FIG. 2 includes operations A-D. First, operation A provides a user with an input mechanism to choose types of content to filter. One example of an input mechanism in the form of an on-screen selection menu is set forth in U.S. patent application Ser. No. 11/104,924 titled "Apparatus, System, and Method for Filtering Objectionable Portions of Multimedia Presentation," filed on Apr. 12, 2005 (the '924 application), which is hereby incorporated by reference herein …the user could access a filter selection menu on a third computing platform, such as a PC. The filter settings are then made available to the STB for operation C (described below). For example, the filter settings could be entered on a computer and sent to the server or STB via a network connection therebetween. In any of the embodiments discussed herein, filter selections may be made before or after the selection of the video-on-demand presentation. Accordingly, in some implementations where all possible filter types may not be available for a certain movie, e.g., a particularly violent movie has all possible violence filter types, but the movie has no sex or nudity therefore there are no sex or nudity filter types, only a subset of filter types may be presented to the user for activation” (Paragraph 20), “through an on-screen menu, the user may activate a strong action violence filter type and a brutal/gory violence filter type. 
Subsequently, during presentation of the video-on-demand movie, audio, video, or both associated with scenes having strong action violence or brutal/gory violence will be suppressed, which may involve skipping the scene entirely, blurring some or all of the movie for period of time of the scene, cropping portions of the movie to remove image portions, and/or muting some or all of the audio” (Paragraph 21), “the server transmits a filter metadata file to the STB (operation B) and the STB receives and stores the metadata (operation C). It is possible to transmit filters to the STB at any time. Moreover, it is also possible to transmit more than one set of filters (i.e., filter files for more than one multimedia presentation). Thus, in some instances, the STB will be configured to determine whether filters for a given movie have already been loaded into the set-top-box. With respect to transmission of the metadata filter files from the content server, the metadata transmission may be done automatically with all VOD requests” (Paragraph 22), “the filter metadata file may include filter information as set forth in the '924 application. Generally, the filter information includes some indicia of a start time and end time of a portion of a multimedia presentation along with the type of content set forth between the start and end time. For example, between 5 minutes and 5 minutes and 20 seconds, a certain film may have strong action violence. The metadata will include information related or associated with the start time (5 minutes), the end time (5 min. 20 sec.) and strong action violence. If the user activates the strong action violence filter, when the playback reaches the 5 minute time of the movie, the next 20 seconds are suppressed. Finally, playback of the VOD movie may be filtered as a function of the user's filter settings and the metadata (operation D)” (Paragraph 23), and “the filtering logic takes place on the STB side. 
This means that the server will send the entire multimedia content, e.g., all of the data to play the movie "Field of Dreams," and specific visual and/or audible portions of the multimedia content will be suppressed or "filtered" from the presentation on the STB side. Another advantage of the method set forth in FIG. 2 is that the entire content may be stored on the STB (e.g. on a hard-drive or a recordable optical disc). This would allow subsequent playback where the user could choose to view the entire multimedia without any filtering, or the user could view the multimedia with different filter settings” (Paragraph 24).
	The examiner further notes that the secondary reference of Jarman teaches the concept of suppressing (which includes blurring (i.e. masking)) portions of a movie in accordance with user designated filtering criteria during playback of that movie.  The combination would result in allowing for the masking of labeled media in Song.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Jarman’s teachings would have allowed Song’s system to provide a method for individualized filtering of movies, as noted by Jarman (Paragraph 24).

	Regarding claim 7, Song further teaches a computer program product comprising:
A)  wherein the explicit labels include preferences inputted by a user (Column 4, lines 44-51, Column 5, lines 29-35, lines 59-63).
	The examiner notes that Song teaches “wherein the explicit labels include preferences inputted by a user” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a labeled content repository 220 containing various content items previously authoritatively labeled by an entity as representing one or more categories from the category set 205. The content items, together with their labels, serve as training data for supervised learning algorithms that learn classifiers for the various categories of the category set 205. In one embodiment, the content items are authoritatively labeled by a human expert trained in the meaning and use of the category set 205” (Column 5, lines 29-35), “The labeled content repository 220 further comprises authoritatively labeled videos 224. In one embodiment, the authoritatively labeled videos 224 are a subset of the video repository 116, with the addition of labels manually added by expert users for the purposes of classifier training” (Column 5, lines 59-63).  The examiner further notes that the trained classifier (i.e. an “AI system”) clearly receives manually labeled videos 224.  Such videos with labels teach the claimed “preferences” in the broadest reasonable interpretation because labels manually assigned by a human reflect that human’s preferences (which are undefined in the claim).

	Regarding claim 11, Song does not explicitly teach a computer program product comprising:
A)  wherein the second media includes a modified version of at least a portion of the first media.
	Jarman, however, teaches “wherein the second media includes a modified version of at least a portion of the first media” as “The operations set forth in FIGS. 2A-2B (collectively "FIG. 2") include the same operations 10-110 set forth in FIG. 1. Additionally, FIG. 2 includes operations A-D. First, operation A provides a user with an input mechanism to choose types of content to filter. One example of an input mechanism in the form of an on-screen selection menu is set forth in U.S. patent application Ser. No. 11/104,924 titled "Apparatus, System, and Method for Filtering Objectionable Portions of Multimedia Presentation," filed on Apr. 12, 2005 (the '924 application), which is hereby incorporated by reference herein …the user could access a filter selection menu on a third computing platform, such as a PC. The filter settings are then made available to the STB for operation C (described below). For example, the filter settings could be entered on a computer and sent to the server or STB via a network connection therebetween. In any of the embodiments discussed herein, filter selections may be made before or after the selection of the video-on-demand presentation. Accordingly, in some implementations where all possible filter types may not be available for a certain movie, e.g., a particularly violent movie has all possible violence filter types, but the movie has no sex or nudity therefore there are no sex or nudity filter types, only a subset of filter types may be presented to the user for activation” (Paragraph 20), “through an on-screen menu, the user may activate a strong action violence filter type and a brutal/gory violence filter type. 
Subsequently, during presentation of the video-on-demand movie, audio, video, or both associated with scenes having strong action violence or brutal/gory violence will be suppressed, which may involve skipping the scene entirely, blurring some or all of the movie for period of time of the scene, cropping portions of the movie to remove image portions, and/or muting some or all of the audio” (Paragraph 21), “the server transmits a filter metadata file to the STB (operation B) and the STB receives and stores the metadata (operation C). It is possible to transmit filters to the STB at any time. Moreover, it is also possible to transmit more than one set of filters (i.e., filter files for more than one multimedia presentation). Thus, in some instances, the STB will be configured to determine whether filters for a given movie have already been loaded into the set-top-box. With respect to transmission of the metadata filter files from the content server, the metadata transmission may be done automatically with all VOD requests” (Paragraph 22), “the filter metadata file may include filter information as set forth in the '924 application. Generally, the filter information includes some indicia of a start time and end time of a portion of a multimedia presentation along with the type of content set forth between the start and end time. For example, between 5 minutes and 5 minutes and 20 seconds, a certain film may have strong action violence. The metadata will include information related or associated with the start time (5 minutes), the end time (5 min. 20 sec.) and strong action violence. If the user activates the strong action violence filter, when the playback reaches the 5 minute time of the movie, the next 20 seconds are suppressed. Finally, playback of the VOD movie may be filtered as a function of the user's filter settings and the metadata (operation D)” (Paragraph 23), and “the filtering logic takes place on the STB side. 
This means that the server will send the entire multimedia content, e.g., all of the data to play the movie "Field of Dreams," and specific visual and/or audible portions of the multimedia content will be suppressed or "filtered" from the presentation on the STB side. Another advantage of the method set forth in FIG. 2 is that the entire content may be stored on the STB (e.g. on a hard-drive or a recordable optical disc). This would allow subsequent playback where the user could choose to view the entire multimedia without any filtering, or the user could view the multimedia with different filter settings” (Paragraph 24).
	The examiner further notes that the secondary reference of Jarman teaches the concept of suppressing (which includes blurring (i.e. modifying)) portions of a movie in accordance with user designated filtering criteria during playback of that movie.  Such a modified movie is clearly different from an unmodified movie (See example of an uncensored version of Field of Dreams and a censored version of Field of Dreams that is censored in accordance with user-specified criteria).   
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Jarman’s teachings would have allowed Song’s system to provide a method for individualized filtering of movies, as noted by Jarman (Paragraph 24).
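	For illustration only, the time-based filtering described in the quoted passages of Jarman (Paragraphs 21-24) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Jarman or from the instant application: the names `FilterEntry`, `suppressed_intervals`, and `is_suppressed`, and the string `"strong_action_violence"`, are hypothetical. The single metadata entry mirrors Jarman's example of strong action violence between 5 minutes and 5 minutes 20 seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterEntry:
    """One filter metadata record: a time span plus the type of content in it.

    Hypothetical structure; Jarman only states that the metadata carries
    indicia of a start time, an end time, and a content type.
    """
    start_s: float
    end_s: float
    content_type: str


def suppressed_intervals(metadata, active_filters):
    """Return the (start, end) spans whose content type the user has activated."""
    return [(e.start_s, e.end_s) for e in metadata if e.content_type in active_filters]


def is_suppressed(position_s, metadata, active_filters):
    """True if playback at position_s falls inside an activated filter span."""
    return any(start <= position_s < end
               for start, end in suppressed_intervals(metadata, active_filters))


# Jarman's example: strong action violence from 5:00 to 5:20 (300 s to 320 s).
metadata = [FilterEntry(300.0, 320.0, "strong_action_violence")]
```

	Under these assumptions, the same stored movie plays back unmodified when no filter is activated and with the 20-second span suppressed when the strong action violence filter is activated, which is the distinction between the unmodified and modified versions relied upon above.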

	Regarding claim 15, Song further teaches a computer program product comprising:
A)  wherein the second media includes a single video file (Column 11, line 57 to Column 12, line 4).
	The examiner notes that Song teaches “wherein the second media includes a single video file” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51).  The examiner further notes that a label automatically assigned to a given video (i.e. the claimed second media) via a trained classifier entails that the given video is a single file.

	Regarding claim 16, Song further teaches a computer program product comprising:
A)  wherein the computer program product is configured to create additional data, the additional data being used to further train the AI system (Column 10, lines 26-34).
	The examiner notes that Song teaches “wherein the computer program product is configured to create additional data, the additional data being used to further train the AI system” as “In one embodiment, the classifier training subsystem 119 produces 440, for each video, a hybrid feature vector comprising both the content feature vector extracted from the video and the score vector obtained from the video. In one embodiment, the score vector is appended to the content feature vector to produce the hybrid feature vector.  With the hybrid feature vectors produced for the various authoritatively labeled videos 224, adapted classifiers 242 are then trained 450 for categories of the category set 205” (Column 10, lines 26-34).  The examiner further notes that the generated hybrid feature vectors (i.e. “additional data” in the broadest reasonable interpretation) are used for training purposes.
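	For illustration only, the hybrid feature vector described in the quoted passage of Song (Column 10, lines 26-34) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Song: the function names, the dictionary keys `"content"`, `"scores"`, and `"label"`, and the use of plain Python lists are hypothetical. The only behavior taken from Song is that the score vector is appended to the content feature vector and that the resulting vectors of labeled videos serve as training data.

```python
def hybrid_feature_vector(content_features, score_vector):
    """Append the score vector to the content feature vector."""
    return list(content_features) + list(score_vector)


def build_training_set(labeled_videos):
    """Pair each labeled video's hybrid feature vector with its label.

    Each video is assumed (hypothetically) to be a dict with the keys
    "content", "scores", and "label".
    """
    return [
        (hybrid_feature_vector(v["content"], v["scores"]), v["label"])
        for v in labeled_videos
    ]


example_videos = [
    {"content": [0.1, 0.2], "scores": [0.9], "label": "sports"},
    {"content": [0.4, 0.3], "scores": [0.2], "label": "news"},
]
training_set = build_training_set(example_videos)
```

	In this sketch the hybrid vectors are data that did not exist before the program produced them, and they are the input from which the classifiers are subsequently trained.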

	Regarding claim 17, Song further teaches a computer program product comprising:
A)  wherein the computer program product is configured to receive additional explicit labels associated with a third media, the additional explicit labels used to further train the AI system (Column 4, lines 44-51, Column 5, lines 29-35, lines 59-63).
	The examiner notes that Song teaches “wherein the computer program product is configured to receive additional explicit labels associated with a third media, the additional explicit labels used to further train the AI system” as “The video hosting service 100 further comprises a classifier training subsystem 119 that trains accurate video classifiers for a predetermined set of categories, even in the absence of a large number of labeled videos to use as training examples. The trained classifiers can then be applied to a given video to determine which of the categories the video represents. The classifier training subsystem 119 is now described in greater detail” (Column 4, lines 44-51), “The classifier training subsystem 119 further comprises a labeled content repository 220 containing various content items previously authoritatively labeled by an entity as representing one or more categories from the category set 205. The content items, together with their labels, serve as training data for supervised learning algorithms that learn classifiers for the various categories of the category set 205. In one embodiment, the content items are authoritatively labeled by a human expert trained in the meaning and use of the category set 205” (Column 5, lines 29-35), “The labeled content repository 220 further comprises authoritatively labeled videos 224. In one embodiment, the authoritatively labeled videos 224 are a subset of the video repository 116, with the addition of labels manually added by expert users for the purposes of classifier training” (Column 5, lines 59-63).  The examiner further notes that the trained classifier (i.e. an “AI system”) clearly receives multiple manually labeled videos 224 (i.e. a third media with explicit label(s)) for training purposes.
9.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (U.S. Patent 8,452,778), and further in view of Jarman et al. (U.S. PGPUB 2007/0186235) as applied to claims 1-4, 6, 11, 15-17, and 19-20 above, and further in view of Hunter et al. (U.S. PGPUB 2005/0111824).
10.	Regarding claim 14, Song and Jarman do not explicitly teach a computer program product comprising:
A)  wherein the second media includes a grouping of more than one video files.
	Hunter, however, teaches “wherein the second media includes a grouping of more than one video files” as “Display labeling instructions 506 include instructions for determining a label for the group of video shots being displayed. The display labeling instructions 506 uniquely label each group of video shots with a time and date label that makes the particular group easily identifiable to the user when viewing. The time and date label for a group video shots is determined from the determined time span of that particular group of video shots. For example, the following list can be used to determine a label for a group of video shots” (Paragraph 73).
	The examiner further notes that the secondary reference of Hunter teaches the concept of labeling a group of videos.  The combination would result in expanding Song to allow for labeling of groups of videos.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Hunter’s teachings would have allowed the combined system of Song and Jarman to provide a method for optimally clustering videos, as noted by Hunter (Paragraph 9).
11.	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (U.S. Patent 8,452,778), and further in view of Jarman et al. (U.S. PGPUB 2007/0186235) as applied to claims 1-4, 6, 11, 15-17, and 19-20 above, and further in view of Xing et al. (U.S. PGPUB 2020/0068038).
12.	Regarding claim 18, Song and Jarman do not explicitly teach a computer program product comprising:
A)  wherein the additional explicit labels originate from a cloud based source, including at least one of a media provider, a service provider, or a third party user.
	Xing, however, teaches “wherein the additional explicit labels originate from a cloud based source, including at least one of a media provider, a service provider, or a third party user” as “In some embodiments, a cloud-based data management service can be leveraged to develop and support large-scale distributed applications. Consider, for instance, a video storage and enhancement application that is executed in a cloud computing environment via the cloud-based data management service. This application may be configured to store videos and video metadata in the storage of the cloud computing environment, and allow users to tag videos with various attributes in real time, as the videos are being submitted” (Paragraph 60).
	The examiner further notes that the secondary reference of Xing teaches the concept of cloud-based sources providing tags (i.e. labels).  Such sources include the users (i.e. a “media provider” in the broadest reasonable interpretation) providing videos via the cloud. 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Xing’s teachings would have allowed the combined system of Song and Jarman to provide a method for simplifying storage management, as noted by Xing (Paragraph 4).
13.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (U.S. Patent 8,452,778), and further in view of Jarman et al. (U.S. PGPUB 2007/0186235) as applied to claims 1-4, 6, 11, 15-17, and 19-20 above, and further in view of Harmon et al. (U.S. Patent 9,363,561).
14.	Regarding claim 21, Song does not explicitly teach a computer program product comprising:
A)  wherein the processor is further instructed to: receive a user selection of one or more categories of material to be filtered from the second media. 
	Jarman, however, teaches “wherein the processor is further instructed to: receive a user selection of one or more categories of material to be filtered from the second media” as “The operations set forth in FIGS. 2A-2B (collectively "FIG. 2") include the same operations 10-110 set forth in FIG. 1. Additionally, FIG. 2 includes operations A-D. First, operation A provides a user with an input mechanism to choose types of content to filter. One example of an input mechanism in the form of an on-screen selection menu is set forth in U.S. patent application Ser. No. 11/104,924 titled "Apparatus, System, and Method for Filtering Objectionable Portions of Multimedia Presentation," filed on Apr. 12, 2005 (the '924 application), which is hereby incorporated by reference herein …the user could access a filter selection menu on a third computing platform, such as a PC. The filter settings are then made available to the STB for operation C (described below). For example, the filter settings could be entered on a computer and sent to the server or STB via a network connection therebetween. In any of the embodiments discussed herein, filter selections may be made before or after the selection of the video-on-demand presentation. Accordingly, in some implementations where all possible filter types may not be available for a certain movie, e.g., a particularly violent movie has all possible violence filter types, but the movie has no sex or nudity therefore there are no sex or nudity filter types, only a subset of filter types may be presented to the user for activation” (Paragraph 20), “through an on-screen menu, the user may activate a strong action violence filter type and a brutal/gory violence filter type. 
Subsequently, during presentation of the video-on-demand movie, audio, video, or both associated with scenes having strong action violence or brutal/gory violence will be suppressed, which may involve skipping the scene entirely, blurring some or all of the movie for period of time of the scene, cropping portions of the movie to remove image portions, and/or muting some or all of the audio” (Paragraph 21), “the server transmits a filter metadata file to the STB (operation B) and the STB receives and stores the metadata (operation C). It is possible to transmit filters to the STB at any time. Moreover, it is also possible to transmit more than one set of filters (i.e., filter files for more than one multimedia presentation). Thus, in some instances, the STB will be configured to determine whether filters for a given movie have already been loaded into the set-top-box. With respect to transmission of the metadata filter files from the content server, the metadata transmission may be done automatically with all VOD requests” (Paragraph 22), “the filter metadata file may include filter information as set forth in the '924 application. Generally, the filter information includes some indicia of a start time and end time of a portion of a multimedia presentation along with the type of content set forth between the start and end time. For example, between 5 minutes and 5 minutes and 20 seconds, a certain film may have strong action violence. The metadata will include information related or associated with the start time (5 minutes), the end time (5 min. 20 sec.) and strong action violence. If the user activates the strong action violence filter, when the playback reaches the 5 minute time of the movie, the next 20 seconds are suppressed. Finally, playback of the VOD movie may be filtered as a function of the user's filter settings and the metadata (operation D)” (Paragraph 23), and “the filtering logic takes place on the STB side. 
This means that the server will send the entire multimedia content, e.g., all of the data to play the movie "Field of Dreams," and specific visual and/or audible portions of the multimedia content will be suppressed or "filtered" from the presentation on the STB side. Another advantage of the method set forth in FIG. 2 is that the entire content may be stored on the STB (e.g. on a hard-drive or a recordable optical disc). This would allow subsequent playback where the user could choose to view the entire multimedia without any filtering, or the user could view the multimedia with different filter settings” (Paragraph 24).
	The examiner further notes that the secondary reference of Jarman teaches the concept of suppressing (which includes blurring (i.e. masking)) portions of a movie in accordance with user designated filtering criteria (See example of strong action violence filter) during playback of that movie.  
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Jarman’s teachings would have allowed Song’s system to provide a method for individualized filtering of movies, as noted by Jarman (Paragraph 24).
	Song and Jarman do not explicitly teach:
B)  wherein the second media is labeled in accordance with the user selection.
	Harmon, however, teaches “wherein the second media is labeled in accordance with the user selection” as “A content map is a mapping of content, e.g., of a movie, identifying some or all parts of the content that may be filtered. For example, the content map may identify time periods during the movie which may be filtered for language, e.g., the “sh--” word at minute:second marker 45:39.5-45:40; or time periods during the movie which may be filtered for nudity/sex, e.g., a sex scene at minute:second marker 24:21-27:35; or time periods during the movie which be filtered for violence, e.g., a violent scene at minute:second marker 99:15-99:23 in which a man is graphically decapitated. In general, a content map is simply a mapping that identifies segments and characteristics of filterable content. A content map entry may include identification of the temporal (e.g., minute markers during the movie), spatial (e.g., area of display to be cut, cropped, kept, blurred, or otherwise filtered), audible (e.g., channels or other content aspects containing filterable content) dimensions of filterable content in the movie (or other type of content), or other characteristics of a particular filterable element” (Column 5, lines 9-27) and “A content map may be generated in a variety of ways. In one embodiment, a human filtering agent may watch the movie and manually create, such as by writing down or entering into computer software, the time segments of the movie which contain potentially filterable content. The human filtering agent may also assign a filter category from filter categories 400 to each time segment containing potentially filterable content. Although not depicted in FIG. 5, it is possible that more than one filter category could be assigned to one segment, e.g., violence and vulgarity could both be assigned to a segment showing a graphic decapitation while the decapitator is swearing” (Column 6, lines 14-25).
	The examiner further notes that although the secondary reference of Jarman teaches that a user can select a filter with respect to a movie, there is no explicit teaching that such filter selections are then used to label the movie.  Nevertheless, the secondary reference of Harmon teaches the concept of a user manually designating multiple filter categories via a content map that is used to label a corresponding movie for censorship purposes.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Harmon’s teachings would have allowed the combined system of Song and Jarman to provide a method for improving the convenience of filtering movies, as noted by Harmon (Column 1, lines 47-54).
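	For illustration only, the content map described in the quoted passages of Harmon (Column 5, lines 9-27 and Column 6, lines 14-25) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Harmon: the names `ContentMapEntry`, `mmss`, and `entries_matching` and the category strings are hypothetical. The time markers are taken from Harmon's examples, and the assignment of both violence and vulgarity to the last segment is an assumption modeled on Harmon's statement that more than one filter category could be assigned to one segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentMapEntry:
    """One labeled segment: a time span and the filter categories assigned to it."""
    start_s: float
    end_s: float
    categories: frozenset


def mmss(minutes, seconds):
    """Convert a minute:second marker to seconds."""
    return minutes * 60 + float(seconds)


def entries_matching(content_map, selected_categories):
    """Return the segments labeled with at least one user-selected category."""
    selected = frozenset(selected_categories)
    return [e for e in content_map if e.categories & selected]


content_map = [
    ContentMapEntry(mmss(45, 39.5), mmss(45, 40), frozenset({"language"})),
    ContentMapEntry(mmss(24, 21), mmss(27, 35), frozenset({"nudity_sex"})),
    # Assumed for illustration: two categories assigned to one segment.
    ContentMapEntry(mmss(99, 15), mmss(99, 23), frozenset({"violence", "vulgarity"})),
]
```

	Under these assumptions, a user selection of the violence category resolves to the single segment labeled with that category, so the labels in the content map determine which portions of the movie are filtered for that selection.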
Response to Arguments
15.	Applicant’s arguments with respect to claims 1-4, 6, 11, and 14-21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument (See newly applied art of Jarman and Harmon).
Conclusion
16.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 2014/0207450 issued to LaVoie et al. on 24 July 2014.  The subject matter disclosed therein is pertinent to that of claims 1-4, 6, 11, and 14-21 (e.g., methods to censor media).
Article entitled “Explicit Content Detection in Music Lyrics Using Machine Learning”, by Chin et al., dated 2018.  The subject matter disclosed therein is pertinent to that of claims 1-4, 6, 11, and 14-21 (e.g., methods to censor media).
17.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact Information
18.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahesh Dwivedi whose telephone number is (571) 272-2731.  The examiner can normally be reached on Monday to Friday 8:20 am – 4:40 pm.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571) 272-4034.  The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


Mahesh Dwivedi
Primary Examiner
Art Unit 2168

October 12, 2022
/MAHESH H DWIVEDI/Primary Examiner, Art Unit 2168