Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office Action is in response to the Amendment After Non-Final Rejection filed 10/26/2022.  Claims 1-25 are pending and have been examined.
The information disclosure statement (IDS) submitted on 10/26/2022 was considered by the examiner.
Response to Arguments
Applicant’s arguments with respect to claims 1-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 7, 9, 10, 12, 13, 15-17 and 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over Polak et al. (US 2018/0032845), herein Polak, in view of Borel et al. (US 2017/0078767), herein Borel.
Consider claim 1, Polak clearly teaches a method (Fig. 1) comprising: 

accessing, by a hardware processor ([0064]), content data for a current media asset; (Video stream 250, [0072])

determining, by the hardware processor, a set of events within the content data by scanning the content data for events that relate to at least one event classification, each event in the set of events comprising at least one of a visual content element, a textual content feature, or an audio content feature from the content data being presented at a timestamp of the current media asset, (Fig. 3: Modalities data 310, including visual, motion, audio, and textual data, are extracted from each scene, [0075], [0076].) each event in the set of events being associated with an event classification label selected from a predetermined event classification library, the predetermined event classification library comprising a plurality of event classification labels where each event classification label comprises a set of available event subclassification labels; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

for each individual event in the set of events and based on an identified event classification label of the individual event, determining a set of identified event subclassification labels for the individual event, the set of identified event subclassification labels being selected from the set of available subclassification labels for the identified event classification as provided by the predetermined event classification library; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

determining, by the hardware processor, a set of scenes within the content data, each scene in the set of scenes comprising a subset of events from the set of events; (Fig. 1: Video stream 250 is divided into scenes 302, [0074].)

for each individual scene in the set of scenes, determining a set of scene attributes for the individual scene; (Fig. 4: In-modality probabilities vector 420 of all the modalities extracted from the scene 302 are aggregated to create a scene descriptor 430, [0102]-[0105].)

determining, by the hardware processor, a set of themes for the current media asset based on at least one of the set of scenes or the set of scene attributes; (Fig. 5: Scene categories probability 510 are calculated from scene descriptors 430, [0106], [0107].)

determining, by the hardware processor, a set of title attributes for the current media asset based on at least the set of themes and metadata associated with the media asset; (Fig. 5: Video stream categories probability 520 is created by aggregating scene categories probability 510, [0110]-[0112].  Video classification module extracts textual data from metadata of the video stream 250, [0078].) and 

generating, by the hardware processor, contextual data for the current media asset based on at least one of the set of events, the set of event classification labels determined for the set of events, the sets of identified event subclassification labels determined for the set of events, the set of scenes, sets of scene attributes for the set of scenes, the set of themes, or the set of title attributes. (Fig. 1: The video classification data 255 may be constructed as a structured textual representation comprising, for example, the high-level categories identified for the video stream 250, the high-level categories identified for one or more scenes 302 and/or one or more of the recognized concepts, [0115]-[0122].)

However, Polak does not explicitly teach at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event.

In an analogous art, Borel, which discloses a system for video processing, clearly teaches at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; (The context of the event is presented as e.g. breakfast, meeting, private phone call etc., [0068], [0069], [0083].) an intent of the context of the individual event; or an outcome of the context of the individual event. 

Therefore, before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to modify the system of Polak to include at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event, as taught by Borel, for the benefit of better classifying the video content.
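
For illustration only, the label hierarchy recited in claim 1 (a predetermined library in which each event classification label carries its own set of available subclassification labels) may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Polak or Borel: the names EVENT_LIBRARY, Event, and assign_sublabels, and every label string, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical library: each classification label maps to the subclassification
# labels available for it. All label names are illustrative assumptions.
EVENT_LIBRARY = {
    "violence": {"graphic", "implied", "comedic", "self_defense"},
    "substance_use": {"alcohol", "tobacco", "glamorized", "with_consequences"},
}


@dataclass
class Event:
    timestamp_s: float          # where the event occurs in the media asset
    label: str                  # identified event classification label
    sublabels: set = field(default_factory=set)


def assign_sublabels(event, candidates, library=EVENT_LIBRARY):
    """Keep only the candidates that the library offers for the event's label."""
    available = library[event.label]
    event.sublabels = set(candidates) & available
    return event


# "alcohol" is dropped because it is not available under "violence".
event = assign_sublabels(Event(12.5, "violence"), ["implied", "alcohol"])
```

The set intersection reflects the claim requirement that identified subclassification labels be selected from the set made available for the identified classification label.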
	
Consider claim 2, Polak combined with Borel clearly teaches the predetermined event classification library is configured such that event classification labels and event subclassification labels of the predetermined event classification library cause events in the current media asset to be classified without cultural bias. (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095] Polak.)

Consider claim 3, Polak combined with Borel clearly teaches an individual event subclassification label determined for the individual event provides detail with respect to a context of the individual event. ([0080], [0085], [0095]-[0097] Polak)

Consider claim 4, Polak combined with Borel clearly teaches the individual event subclassification label provides at least one of: a description of the context of the individual event; an explanation of the context of the individual event; how the context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event. ([0080], [0085], [0095]-[0097] Polak)

Consider claim 6, Polak combined with Borel clearly teaches the scanning of the media asset is performed using an event scanner; and wherein the event scanner comprises a machine learning model trained to automatically identify a select event at a select timestamp of the current media asset based on a set of signals provided by at least one computer vision analysis, audio analysis, or natural language processing of content presented by the current media asset at the select timestamp. ([0079] Polak)

Consider claim 7, Polak combined with Borel clearly teaches the machine learning model is trained based on contextual data of another media asset. ([0108] Polak)

Consider claim 9, Polak combined with Borel clearly teaches the determining the set of identified event subclassification labels for the individual event is performed using an event classifier, wherein the individual event is at a select timestamp of the current media asset; and wherein the event classifier comprises a machine learning model trained to automatically identify the set of identified event subclassification labels for the individual event based on a set of signals provided by at least one computer vision analysis, audio analysis, or natural language processing of content presented by the current media asset at the select timestamp. ([0079] Polak)

Consider claim 10, Polak combined with Borel clearly teaches the machine learning model is trained based on contextual data of another media asset. ([0108] Polak)

Consider claim 12, Polak combined with Borel clearly teaches the determining the set of scene attributes for the individual scene is performed using a scene analyzer; and wherein the scene analyzer comprises a machine learning model trained to automatically identify the set of scene attributes for the individual scene based on at least one of: one or more events of the individual scene; one or more event classification labels for the one or more events; or one or more event subclassification labels for the one or more events. (Fig. 4: In-modality probabilities vector 420 of all the modalities extracted from the scene 302 are aggregated to create a scene descriptor 430, [0102]-[0105] Polak.)

Consider claim 13, Polak combined with Borel clearly teaches the machine learning model is trained based on contextual data of another media asset. ([0108] Polak)

Consider claim 15, Polak combined with Borel clearly teaches the determining the set of scene attributes for the individual scene comprises determining at least one of: determining a frequency of events in the individual scene; determining a mixture of events, with different event classification labels, in the individual scene; determining a time distance between events in the individual scene; or determining a duration of the individual scene. (Fig. 4: In-modality probabilities vector 420 of all the modalities extracted from the scene 302 are aggregated to create a scene descriptor 430, [0102]-[0105] Polak.)
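
For illustration only, the four scene attributes enumerated in claim 15 (event frequency, mixture of event classification labels, time distance between events, and scene duration) may be computed as in the following minimal sketch. The function name scene_attributes, the (timestamp, label) tuple layout, and the choice of a mean gap as the "time distance" are assumptions made for the example and are not taken from Polak or Borel.

```python
from collections import Counter


def scene_attributes(events, scene_start_s, scene_end_s):
    """Compute simple attributes for one scene.

    events: list of (timestamp_s, label) tuples that fall within the scene.
    """
    duration = scene_end_s - scene_start_s
    times = sorted(t for t, _ in events)
    # Gaps between consecutive events, in seconds.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration_s": duration,
        "events_per_minute": 60.0 * len(events) / duration if duration > 0 else 0.0,
        "label_mixture": dict(Counter(label for _, label in events)),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,
    }


attrs = scene_attributes(
    [(10.0, "dialogue"), (20.0, "violence"), (40.0, "dialogue")],
    scene_start_s=0.0,
    scene_end_s=60.0,
)
```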

Consider claim 16, Polak combined with Borel clearly teaches the determining of the set of themes for the current media asset is performed using a theme analyzer; and wherein the theme analyzer comprises a machine learning model trained to automatically identify the set of themes for the current media asset based on at least one of the set of scenes or the set of scene attributes.  ([0105]-[0108] Polak)
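
For illustration only, the aggregation relied upon in the Polak citations (per-scene category probabilities combined into probabilities for the whole video stream) may be sketched as follows. The cited passages as summarized here do not state how the aggregation is performed; a plain arithmetic mean is assumed, in place of the trained machine learning model recited in the claim. The function name and category names are hypothetical.

```python
def aggregate_scene_probabilities(scene_probs):
    """Average per-scene category probabilities into stream-level probabilities.

    scene_probs: list of dicts mapping category name -> probability for one scene.
    A category missing from a scene is treated as probability 0.0 (assumption).
    """
    if not scene_probs:
        return {}
    categories = set().union(*scene_probs)
    n = len(scene_probs)
    return {c: sum(p.get(c, 0.0) for p in scene_probs) / n for c in sorted(categories)}


scenes = [{"sports": 0.8, "news": 0.2}, {"sports": 0.4, "news": 0.6}]
stream = aggregate_scene_probabilities(scenes)
top_category = max(stream, key=stream.get)
```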

Consider claim 17, Polak combined with Borel clearly teaches the machine learning model is trained based on contextual data of another media asset. ([0108] Polak)

Consider claim 22, Polak combined with Borel clearly teaches causing, by the hardware processor, a media software tool configured to process the current media asset based on the contextual data for the current media asset. ([0122] Polak)

Consider claim 23, Polak combined with Borel clearly teaches the metadata comprises at least one of: an attribute describing a genre of the current media asset; an attribute describing how the content data of the current media asset is presented; (The context of the event is presented as e.g. breakfast, meeting, private phone call etc., [0068], [0069], [0083] Borel.) an attribute describing a cast or a crew member listed for the current media asset; an attribute describing entities involved in production of the current media asset; an attribute describing a production or release date for the current media asset; or a runtime of the current media asset.

Consider claim 24, Polak clearly teaches a system (Fig. 2) comprising: 

a memory storing instructions; and one or more hardware processors communicatively coupled to the memory and configured by the instructions to perform operations ([0064]) comprising: 

accessing content data for a current media asset; (Video stream 250, [0072])

determining a set of events within the content data by scanning the content data for events that relate to at least one event classification, each event in the set of events comprising at least one of a visual content element, a textual content feature, or an audio content feature from the content data being presented at a timestamp of the current media asset, (Fig. 3: Modalities data 310, including visual, motion, audio, and textual data, are extracted from each scene, [0075], [0076].) each event in the set of events being associated with an event classification label selected from a predetermined event classification library, the predetermined event classification library comprising a plurality of event classification labels where each event classification label comprises a set of available event subclassification labels; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

for each individual event in the set of events and based on an identified event classification label of the individual event, determining a set of identified event subclassification labels for the individual event, the set of identified event subclassification labels being selected from the set of available subclassification labels for the identified event classification as provided by the predetermined event classification library; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

determining a set of scenes within the content data, each scene in the set of scenes comprising a subset of events from the set of events; (Fig. 1: Video stream 250 is divided into scenes 302, [0074].)

for each individual scene in the set of scenes, determining a set of scene attributes for the individual scene; (Fig. 4: In-modality probabilities vector 420 of all the modalities extracted from the scene 302 are aggregated to create a scene descriptor 430, [0102]-[0105].)

determining a set of themes for the current media asset based on at least one of the set of scenes or the set of scene attributes; (Fig. 5: Scene categories probability 510 are calculated from scene descriptors 430, [0106], [0107].)

determining a set of title attributes for the current media asset based on at least the set of themes and metadata associated with the media asset; (Fig. 5: Video stream categories probability 520 is created by aggregating scene categories probability 510, [0110]-[0112].  Video classification module extracts textual data from metadata of the video stream 250, [0078].) and 

generating contextual data for the current media asset based on at least one of the set of events, the set of event classification labels determined for the set of events, the sets of identified event subclassification labels determined for the set of events, the set of scenes, sets of scene attributes for the set of scenes, the set of themes, or the set of title attributes. (Fig. 1: The video classification data 255 may be constructed as a structured textual representation comprising, for example, the high-level categories identified for the video stream 250, the high-level categories identified for one or more scenes 302 and/or one or more of the recognized concepts, [0115]-[0122].)

However, Polak does not explicitly teach at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event.

In an analogous art, Borel, which discloses a system for video processing, clearly teaches at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; (The context of the event is presented as e.g. breakfast, meeting, private phone call etc., [0068], [0069], [0083].) an intent of the context of the individual event; or an outcome of the context of the individual event. 

Therefore, before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to modify the system of Polak to include at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event, as taught by Borel, for the benefit of better classifying the video content.

Consider claim 25, Polak clearly teaches a non-transitory computer-readable medium comprising instructions that, when executed by a hardware processor of a device, cause the device to perform operations ([0064]) comprising: 

accessing content data for a current media asset; (Video stream 250, [0072])

determining a set of events within the content data by scanning the content data for events that relate to at least one event classification, each event in the set of events comprising at least one of a visual content element, a textual content feature, or an audio content feature from the content data being presented at a timestamp of the current media asset, (Fig. 3: Modalities data 310, including visual, motion, audio, and textual data, are extracted from each scene, [0075], [0076].) each event in the set of events being associated with an event classification label selected from a predetermined event classification library, the predetermined event classification library comprising a plurality of event classification labels where each event classification label comprises a set of available event subclassification labels; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

for each individual event in the set of events and based on an identified event classification label of the individual event, determining a set of identified event subclassification labels for the individual event, the set of identified event subclassification labels being selected from the set of available subclassification labels for the identified event classification as provided by the predetermined event classification library; (Fig. 4: For each of the modalities data 310 a class probability 410 is calculated, [0092]-[0093].  Classified concepts are further classified into narrower subclasses, [0095].)

determining a set of scenes within the content data, each scene in the set of scenes comprising a subset of events from the set of events; (Fig. 1: Video stream 250 is divided into scenes 302, [0074].)

for each individual scene in the set of scenes, determining a set of scene attributes for the individual scene; (Fig. 4: In-modality probabilities vector 420 of all the modalities extracted from the scene 302 are aggregated to create a scene descriptor 430, [0102]-[0105].)

determining a set of themes for the current media asset based on at least one of the set of scenes or the set of scene attributes; (Fig. 5: Scene categories probability 510 are calculated from scene descriptors 430, [0106], [0107].)

determining a set of title attributes for the current media asset based on at least the set of themes and metadata associated with the media asset; (Fig. 5: Video stream categories probability 520 is created by aggregating scene categories probability 510, [0110]-[0112].  Video classification module extracts textual data from metadata of the video stream 250, [0078].) and

generating contextual data for the current media asset based on at least one of the set of events, the set of event classification labels determined for the set of events, the sets of identified event subclassification labels determined for the set of events, the set of scenes, sets of scene attributes for the set of scenes, the set of themes, or the set of title attributes. (Fig. 1: The video classification data 255 may be constructed as a structured textual representation comprising, for example, the high-level categories identified for the video stream 250, the high-level categories identified for one or more scenes 302 and/or one or more of the recognized concepts, [0115]-[0122].)

However, Polak does not explicitly teach at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event.

In an analogous art, Borel, which discloses a system for video processing, clearly teaches at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; (The context of the event is presented as e.g. breakfast, meeting, private phone call etc., [0068], [0069], [0083].) an intent of the context of the individual event; or an outcome of the context of the individual event. 

Therefore, before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to modify the system of Polak to include at least one identified event subclassification label in the set of identified event subclassification labels describing at least one of: how a context of the individual event is presented in the content data of the current media asset; an intent of the context of the individual event; or an outcome of the context of the individual event, as taught by Borel, for the benefit of better classifying the video content.

Claims 5, 8, 11, 14 and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Polak et al. (US 2018/0032845) in view of Borel et al. (US 2017/0078767), and further in view of Zadeh et al. (US 10,789,291), herein Zadeh.
Consider claim 5, Polak combined with Borel clearly teaches the method of claim 1.

However, Polak combined with Borel does not explicitly teach causing, by the hardware processor, display of a graphical user interface for screening the current media asset, the graphical user interface being configured to receive user input that identifies at least one of: one or more events in the content data of the current media asset; one or more event classification labels for an event of the current media asset; one or more event subclassification labels for an event of the current media; one or more scenes in the content data of the current media asset; one or more themes for the current media asset; or one or more title attributes for the current media asset. 

In an analogous art, Zadeh, which discloses a system for classifying video content, clearly teaches causing, by the hardware processor, display of a graphical user interface for screening the current media asset, the graphical user interface being configured to receive user input that identifies at least one of: one or more events in the content data of the current media asset; one or more event classification labels for an event of the current media asset; one or more event subclassification labels for an event of the current media; one or more scenes in the content data of the current media asset; one or more themes for the current media asset; or one or more title attributes for the current media asset. (Fig. 2A: Display area 201 displays the video and user interface element 205 receives user input specifying the classification to display, col. 9 lines 59-61, col. 11 lines 22-28.)

Therefore, before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to modify the system of Polak combined with Borel by causing, by the hardware processor, display of a graphical user interface for screening the current media asset, the graphical user interface being configured to receive user input that identifies at least one of: one or more events in the content data of the current media asset; one or more event classification labels for an event of the current media asset; one or more event subclassification labels for an event of the current media; one or more scenes in the content data of the current media asset; one or more themes for the current media asset; or one or more title attributes for the current media asset, as taught by Zadeh, for the benefit of enabling users to search for desired content classifications.
	
Consider claim 8, Polak combined with Borel clearly teaches the machine learning model.

However, Polak combined with Borel does not explicitly teach causing the machine learning model to further train using at least a portion of the generated contextual data. 

In an analogous art, Zadeh, which discloses a system for classifying video content, clearly teaches causing the machine learning model to further train using at least a portion of the generated contextual data. (The media detection system 140 retrains 810 one or more detectors of the set of detectors based on the selected subset of the displayed media content items, col. 19 lines 47-61.)

Therefore, before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to modify the system of Polak combined with Borel by causing the machine learning model to further train using at least a portion of the generated contextual data, as taught by Zadeh, for the benefit of improving the classification of video content. 

Consider claim 11, Polak combined with Borel and Zadeh clearly teaches causing the machine learning model to further train using at least a portion of the generated contextual data. (The media detection system 140 retrains 810 one or more detectors of the set of detectors based on the selected subset of the displayed media content items, col. 19 lines 47-61 Zadeh.)

Consider claim 14, Polak combined with Borel and Zadeh clearly teaches causing the machine learning model to further train using at least a portion of the generated contextual data. (The media detection system 140 retrains 810 one or more detectors of the set of detectors based on the selected subset of the displayed media content items, col. 19 lines 47-61 Zadeh.)

Consider claim 18, Polak combined with Borel and Zadeh clearly teaches causing the machine learning model to further train using at least a portion of the generated contextual data. (The media detection system 140 retrains 810 one or more detectors of the set of detectors based on the selected subset of the displayed media content items, col. 19 lines 47-61 Zadeh.)

Consider claim 19, Polak combined with Borel and Zadeh clearly teaches causing, by the hardware processor, display of a graphical user interface for screening the current media asset, the graphical user interface including a time bar for the content data of the current media asset, and the time bar including a visual indicator for each timestamp of the current media asset that is associated with a select event from the set of events or a select scene from the set of scenes. (User interface element 20, col. 10 lines 15-30 Zadeh)

Consider claim 20, Polak combined with Borel and Zadeh clearly teaches causing, by the hardware processor, display of a graphical user interface for screening the current media asset, the graphical user interface including a listing of tags that correspond to events from the set of events or scenes from the set of scenes.  (Fig. 3A: User interface elements 303, col. 11 line 65 to col. 12 line 17 Zadeh)

Consider claim 21, Polak combined with Borel and Zadeh clearly teaches in the listing of tags, a select tag for a select event or a select scene is displayed with one or more event classification labels of the select event or the select scene. (Fig. 3A: User interface elements 303, col. 11 line 65 to col. 12 line 17 Zadeh)


Conclusion
In the case of amending the claimed invention, applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN R SCHNURR whose telephone number is (571)270-1458. The examiner can normally be reached M-F 6a-4p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Pendleton, can be reached at (571)272-7527. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN R SCHNURR/           Primary Examiner, Art Unit 2425