Detailed Action
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This office action is in response to the amendment filed on 04/21/2022. 
Status of Claims
3.	Claims 1-29 are pending.
	Claims 23, 24 and 26 are canceled.
Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 12, 14 and 21-26 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), and further in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Maurer (USPGPPub 20150279426, referred to as Maurer).
Regarding claims 1, 2, 21 and 22:
A system for interactive video content delivery, the system comprising: 
Li teaches a communication module configured to receive a video content, the video content including one or more video frames, (Li, receiving video images multimedia content, analyzing information from domains for dining, movies, live performances, and sporting events. Still other domains include domains for shopping--electronics, shopping--vehicle, and shopping--general. Additionally, several domains are available that represent categories that may intersect with other domains. These domains include categories for travel, images, and videos, C 2: L 54-60, Fig. 2, C1: L25-35, C 11: L 35-43, C11: L44-56); 
Li teaches a video analyzer module configured to run one or more machine-learning classifiers on the one or more video frames to create classification metadata, the classification metadata corresponding to the one or more machine-learning classifiers and one or more probability scores associated with the classification metadata, a video analyzer module configured to run one or more machine-learning classifiers on the one or more video frames to create classification metadata associated with the one or more video frames, (Li, an image meta-classifier will tend to rely too heavily on output from the corresponding image domain classifier, C 6: L 56-65 wherein the ranking or probability for evaluating the assignment is based on a meta-classifier or a domain classifier, C5: L 56-67 classifiers can operate on dedicated processors, dedicated virtual machines, C11: L 40-45, Fig. 4, C 15: L 1-22); and Li does not specifically teach classification metadata, associated with the one or more video frames. However, Carton teaches the metadata that is a part of the classification data may include any information about the scene/frame, [0028], Figs. 5-7. 
Li does not teach one or more sensors configured to collect data associated with environmental conditions of an observer of the video content. However, Maurer teaches environment sensors, such as the environment sensor 106 in the learning environment 101. The environment sensor 106 is configured to provide additional information about the learning environment 101 to the remote server 110 over the network 130 wherein the environment sensor 106 can be any type of sensor and may be, for example, a temperature sensor, a light sensor, a humidity sensor, an air quality sensor, a motion sensor, or the like, [0045], Fig. 1/item 106 and the selection of the video based on different criteria such as information from the environment sensor 106 placed in the learning environment, [0065]. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Maurer with the teaching of collecting sensor data related to environmental conditions of the observer into the invention of Li for the purpose of analyzing more information to create proper image recommendations.
Li teaches a processing module configured to create one or more interaction triggers based on a set of rules, the one or more interaction triggers being configured to trigger one or more actions with regard to the video content based on the classification metadata, (Li, a meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something. In this example, the subject matter area of "commerce" can correspond to two domains, C 10: L 14-54 “the assigned meta-classifier category can then be used in any convenient manner, such as by triggering additional uses of the search query to match alternative types of documents, such as use of the query in an image search” C 3: L30-67 and the assigned meta-classifier category for a query can be used such as by triggering additional uses of the search query to match images, abstract). Li does not specifically teach trigger one or more actions with regard to the one or more video frames of the video content. However, Carton teaches overlaying icons on indexed objects in a scene, [0028].
Li does not specifically teach wherein the one or more actions include modifying one or more objects present in the one or more video frames. However, Carton teaches icons on the screen that appear overlaying items from a scene, [0028], Fig. 4B/ item 108. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Carton with the teaching of modifying an object presented on the video frame into the invention of Li for the purpose of receiving additional information.
Regarding claim 3:
Li teaches the method of claim 2, in which the triggering of the one or more actions is further based on the one or more probability scores, (Li, the meta-classifiers are based on a non-linear ensemble model for combining the information from the domain classifiers to generate a meta-classifier ranking, probability, or other score. The assigned meta-classifier category can then be used in any convenient manner, such as by triggering additional uses of the search query to match images or other alternative types of documents, C 1: L 37-45).
Regarding claim 4:
Li teaches the method of claim 2, in which the video content includes a live video, the live video being delayed until the one or more machine-learning classifiers are run on the one or more video frames, (Li, the search of the primary index could also wait for the query classification in order to improve the initial identification of responsive documents in the web index 465, C 15: L 33-37, C 3: L 2-7).
Regarding claim 5:
Li teaches the method of claim 2, in which the video content includes video-on-demand, the one or more machine-learning classifiers being run on the one or more video frames before the video content is uploaded to a content distribution network (CDN), (Li, the assignment to the "images" category by the meta-classifier initiates/triggers a secondary search wherein the results from the primary search engine and the secondary (image) search engine are displayed to a user, wherein the images are not displayed before the machine learning classification module analyzes them, C 13: L 52-67 - C 14: L 1-18 and before the video content is uploaded to the content service, some queries may appear to be relevant to more than one domain after evaluation, waiting for such search results prior to triggering an image index search will increase the latency time of responding to a query, C 2: L 32-34, Fig. 4).
Regarding claim 7:
Li teaches the method of claim 2, further comprising: determining that a condition for triggering at least one of the one more interaction triggers is met; and in response to the determination, triggering the one or more actions with regard to the video content, (Li, a meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something. In this example, the subject matter area of "commerce" can correspond to two domains. One domain is a "shopping--electronics" domain, which includes a variety of software and computer hardware products. If the "commerce" meta-classifier generates the highest meta-classifier category score, the query will be assigned to at least one of the domains within the commerce subject matter area. The domain evaluation scores from the domain classifiers for "shopping--electronics" and "shopping--general" are then used to assign the query to at least one of the domains within the commerce category, C 10: L 14-53).
Regarding claim 12:
Li teaches the method of claim 2, in which the set of rules are based on one or more of the following: a user profile, a user setting, a user preference, a viewer identity, a viewer age, and an environmental condition, (Li, evaluation factors may be related to a user context, such as a geographic location for a user or demographic data for a user, C 4:L 21-24).
Regarding claim 14:
Li teaches the method of claim 8, in which: the one or more machine-learning classifiers include a product classifier configured to identify one or more purchasable items present in the one or more video frames; and the one or more actions to be taken upon triggering of the one or more interaction triggers include providing the one or more links enabling a user to make a purchase of the one or more purchasable items, (Li, meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something, C 10: L 32-50; based on the assignment first to the commerce category, and then the "shopping--electronics" domain, a specialized shopping interface can be displayed to the user C 11: L18-22).
Regarding claim 25:
Li in view of Cordova-Diba teaches the system of claim 1, wherein the modifying one or more objects includes at least one of the following: replacing the one or more objects with one or more new objects, highlighting the one or more objects, and editing the video content, (Cordova-Diba, an object in the database that is associated with the hotspot and/or hotspot package may be changed (e.g., using the editing tool), [0094]; the object recognition process to be robust to shadows, shading, highlights, reflections, and other factors caused by differences in illumination, [0116]; the query image to enhance robustness of object localization algorithms to shadows, shading, highlights, and illumination intensity, [0134]).
5.	Claims 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Manico (USPGPPubN 20170351417, referred to as Manico).
Regarding claim 6:
Li does not specifically teach the method of claim 2, in which the video content includes a video game. However, Manico teaches the objects games, see table 2. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Manico with the teaching that the video includes games into the invention of Li for the purpose of presenting different types of video.
Regarding claim 18:
Li in view of Manico teaches the method of claim 8, in which: the one or more machine-learning classifiers include a people classifier configured to identify one or more individuals present in the one or more video frames; and the one or more actions to be taken upon triggering the one or more interaction triggers include one or more of the following: labeling the one or more individuals in the one or more video frames, providing recommendations related to another media content associated with the one or more individuals, providing another media content associated with the one or more individuals, editing the video content based on the one or more individuals, controlling delivery of the video content based on the one or more individuals, and presenting search options related to the one or more individuals, (Manico, deriving the metadata from images of objects, people or scenes, [0040], [0048]; The curator may also suggest usage for an image or a set of images based on the set of associated tags, [0050], [0057]).
6.	Claims 8-11, 13, 15, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Ishtiaq (USPGPPubN 20150082349, referred to as Ishtiaq).
Regarding claim 8:
Li does not specifically teach the method of claim 2, in which the one or more machine-learning classifiers include at least one of an image recognition classifier configured to analyze a still image in one of the video frames, and a composite recognition classifier configured to analyze: (i) one or more image changes between two or more of the video frames; and (ii) one or more sound changes between two or more of the video frames. However, Ishtiaq teaches that audio module 113 of the video data analyzer 111 can analyze the audio data of the video data to detect various audio characteristics or features. For example, the audio module can recognize voices, songs, sound effects, noises, tones, and other audio features and the visual module 112 of the video data analyzer 111 can analyze the visual data to detect data corresponding to on-screen text or objects, [0031], [0033], Figs. 1A-1C, 4, [0060]. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Ishtiaq with the teaching of frame and voice recognition into the invention of Li for the purpose of predicting a user’s interest.
Regarding claims 9 and 10:
Li in view of Ishtiaq teaches the method of claim 2, further comprising creating one or more entry points corresponding to the one or more interaction triggers, in which each of the one or more entry points include a user input associated with the video content or a user gesture associated with the video content; The method of claim 9, in which each of the one or more entry points include one or more of the following: a pause of the video content, a jump point of the video content, a bookmark of the video content, a location marker of the video content, a search result associated with the video content, and a voice command, (Ishtiaq, client device 120 simply issues "trick play" commands (fast-forward, rewind, play, pause) for the STB to start playing the video content from the point that begins the segment, [0077]).
Regarding claim 11:
Li in view of Ishtiaq teaches the method of claim 9, in which the one or more actions are based on the classification metadata of a frame associated with one of the entry points of the video content, (Ishtiaq, based on the segment chosen, user interface engine 121 transmits "trick-play" commands to client device 120 in order to fast-forward or rewind to the beginning of the chosen segment of the video content, [0078], [0125]).
Regarding claim 13:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a general object classifier configured to identify one or more objects present in the one or more video frames; and the one or more actions to be taken upon triggering the one or more interaction triggers include one or more of the following: replacing the one or more objects with new objects in the one or more video frames, automatically highlighting the objects, recommending purchasable items represented by the one or more objects, editing the video content based on the identification of the one or more objects, controlling delivery of the video content based on the identification of the one or more objects, and presenting search options related to the one or more objects, (Ishtiaq, the recognized patterns can be associated with textual data or image data that describes the recognized patterns. The recognized object can be associated with the corresponding regions in the frames or frame sequences in which it appears, [0031], [0042], [0056]).
Regarding claim 15:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a product classifier configured to identify one or more purchasable items present in the one or more video frames; and the one or more actions to be taken upon triggering of the one or more interaction triggers include providing the one or more links enabling a user to make a purchase of the one or more purchasable items, (Ishtiaq, consider a video asset in the comedy genre. Often such video data contains laughter from an audience embedded in the audio stream. As the video data is analyzed and laughter is detected in the video data, the time period corresponding to the laughter is transmitted to the client device 120, which turns on the microphone and/or camera only on the indicated time period, [0139], [0118], [0060]).
Regarding claim 16:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a sentiment condition classifier configured to determine a sentiment level associated with the one or more video frames; the classification metadata is created based on one or more of the following: color information of the one or more video frames, audio information of the one or more video frames, a user behavior exhibited by the user upon watching the video content; and the one or more actions to be taken upon triggering of the one or more interaction triggers include one or more of the following: providing recommendations related to another media content associated with the sentiment level and providing another media content associated with the sentiment level, (Ishtiaq, rank the segments based on a user-selectable criteria, or, alternatively, on criteria learned from the user's viewing history. For instance, if the visual, audio, and textual features of the video content relates to a baseball content, the user may decide to watch only segments containing home-runs or plays with high emotion. In this case, video content segment services module 125 would extract and fuse visual, audio, and textual features of the video content that correspond to high emotion to generate the ranking, [0081], [0100]; the user behavior data may also inform recommendations for other video content, [0102]).
Regarding claim 17:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a landmark classifier configured to identify a landmark present in the one or more video frames; and the one or more actions to be taken upon triggering the one or more interaction triggers include one or more of the following: labeling the identified landmark in the one or more video frames, providing recommendations related to another media content associated with the identified landmark, providing another media content associated with the identified landmark, editing the video content based on the identified landmark, controlling delivery of the video content based on the identified landmark, and presenting search options related to the identified landmark, (Ishtiaq, the visual features can include, video markers or templates (e.g., graphical features overlaid on the visual content indicating a transition, or identifying a content segment), video editing cuts or transitions, video fade ins/fade outs, light flashes or strobes, detection of long shots or close ups, and the like, [0118], [0125]).
7.	Claims 27 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Kalva (USPGPPubN 20100158099, referred to as Kalva).
Regarding claims 27 and 28:
Li does not specifically teach the method of claim 2, where the interaction triggers present an information or actions on at least two screens, each of the two screens displaying said information or actions that is different from the other screen. However, Kalva teaches screen shots of interactive video object overlay elements 1210, 1220, and 1230. Some of the objects 1210, 1220, and/or 1230 may be selected by a user. Selection of the objects 1210, 1220, and/or 1230 wherein video objects, such as the objects 1210, 1220, and/or 1230 may be pasted into new scenes and blended to appear as normal part of this scene yet still remain selectable, [0104], Figs. 12A-C. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Kalva with the teaching of a screen presentation of information as a result of an interaction into the invention of Li for the purpose of presenting additional information triggered by a user’s interaction.
8.	Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), and further in view of Carton (USPGPPub 20050220439, referred to as Carton), in view of Maurer (USPGPPub 20150279426, referred to as Maurer), and further in view of Saptharishi (USPN 8224029, referred to as Saptharishi).
Regarding claim 29:
Li does not specifically teach the method of claim 2, where the probability scores refer to a confidence level that a particular video frame includes or is associated with a certain asset. However, Saptharishi teaches the selection of the objects may depend on how confident the base system was in its classification of an object, C 14: L 48-65; the object tracking module 206 may call on the match classifier 218 to first determine whether the highest ranked object of the current frame matches the tracked object. Moreover, the object tracking module 206 may use match probability information to determine an order for the tracked objects. For example, if the motion modeling module 1002 determines that the probability of a match between a first tracked object and its highest ranked object is greater than the probability of a match between a second tracked object and its highest ranked object, C18: L 5-39. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Saptharishi with the teaching of a probability value of confidence that the frame includes the best matched object into the invention of Li for the purpose of determining the object of a frame with a high probability of matching the user’s need.
Response to Arguments
9.	Applicant's arguments filed 04/21/2022 related to claims 1-26 have been fully considered but they are not persuasive.
In reference to Applicant's argument: 
Applicant reviewed the cited support C2: L 54-60, FIG. 2, C1: L25-35, C 11: L35-43, C11: L44-56. Applicant could not find any mention of receiving videos images and multimedia content. The only time the words "images" or "video" appear are in relation to "subject matter domain classifiers" and not receiving the multimedia or multimedia content itself.
Examiner’s response:
Examiner respectfully disagrees. Claims 1, 2, 21 and 22 cite: “a communication module configured to receive a video content…” Li teaches “domains correspond to various types of entertainment activities, such as domains for dining, movies, live performances, and sporting events” C11:L 36-67- C12: L 1-4.
In reference to Applicant's argument: 
Applicant reiterates that Li discloses "identifying a query as belonging to the image domain would allow a separate image search to be triggered based on the assignment of the query". There is no mention of receiving any actual images, video or multimedia content. Li only teaches that if a search query belongs to, for example, an image domain, i.e. the class "image", then that search query can be used to trigger other searches based on the classification of the query as a search query pertaining to the image domain.
Examiner’s response:
Examiner respectfully disagrees. Claims 1, 2, 21 and 22 cite: “a processing module configured to create one or more interaction triggers based on a set of rules, the one or more interaction triggers being configured to trigger one or more actions with regard to the video content based on the classification metadata”. Li teaches various types of image queries may also have high classification scores for domains such as sports, Fig. 3 wherein “the assigned meta-classifier category can then be used in any convenient manner, such as by triggering additional uses of the search query to match alternative types of documents, such as use of the query in an image search” C 3: L30-67 and the assigned meta-classifier category for a query can be used such as by triggering additional uses of the search query to match images, abstract.
In reference to Applicant's argument:
Applicant has reviewed C3: L 2-7, C 15: L33-37, and there is no "video content", "live video" or "video frames". In C3: L2-7 all that is mentioned are "results from processing a search query" and "waiting for such results prior to triggering an image index search"… Because "video content", "live video" or "video frames" are not mentioned in Li in a manner relevant to the Applicant's claim 5. 
Examiner’s response:
Examiner respectfully disagrees. Li teaches various types of image queries may also have high classification scores for domains such as sports, Fig. 3 wherein “the assigned meta-classifier category can then be used in any convenient manner, such as by triggering additional uses of the search query to match alternative types of documents, such as use of the query in an image search” C 3: L30-67 and the assigned meta-classifier category for a query can be used such as by triggering additional uses of the search query to match images, abstract.
In reference to Applicant's argument:
Applicant has reviewed C3: L 2-7, C 15: L33-37, and there is no "video content", "live video" or "video frames". In C3: L2-7 all that is mentioned are "results from processing a search query" and "waiting for such results prior to triggering an image index search". This teaches waiting for the results of one result for a search query prior to triggering a second search. An image index does not indicate an actual image, or image content. This is distinguishable from what is claimed by Applicant, where a live video is delayed (not a search query or its results) until classifiers are run on it.
Examiner’s response:
Examiner respectfully disagrees. Claim 5 cites: “the video content includes video-on-demand, the one or more machine-learning classifiers being run on the one or more video frames before the video content is uploaded to a content distribution network (CDN)”. Li teaches the assignment to the "images" category by the meta-classifier initiates/triggers a secondary search wherein the results from the primary search engine and the secondary (image) search engine are displayed to a user. Therefore, the images are not displayed before the machine learning classification module analyzes them, C 13: L 52-67 - C 14: L 1-18.
Hence, Applicant’s arguments are not persuasive. The finality of the last office action is proper, meets all the claim limitations and is maintained.
Conclusion
10.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIVKA A RABOVIANSKI whose telephone number is (571)270-1845. The examiner can normally be reached Monday - Friday, 10 am - 7 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached on (571) 272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JIVKA A RABOVIANSKI/
Primary Examiner, Art Unit 2426
May 23, 2022