DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
1.	This office action is in response to the amendment filed on 03/22/2021.
Status of Claims
2.	Claims 1-26 are pending.
Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 12, 14 and 21-26 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), and further in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Hill (USPGPPub 20160364621, referred to as Hill).
Regarding claims 1, 2, 21 and 22:
A system for interactive video content delivery, the system comprising: 

Li teaches a video analyzer module configured to run one or more machine-learning classifiers on the one or more video frames to create classification metadata, the classification metadata corresponding to the one or more machine-learning classifiers and one or more probability scores associated with the classification metadata; a video analyzer module configured to run one or more machine-learning classifiers on the one or more video frames to collect sensor data associated with environmental conditions of an observer of the video content and create classification metadata associated with the one or more video frames (Li, an image meta-classifier will tend to rely too heavily on output from the corresponding image domain classifier, C 6: L 56-65; wherein the ranking or probability for evaluating the assignment is based on a meta-classifier or a domain classifier, C 5: L 56-67; classifiers can operate on dedicated processors, dedicated virtual machines, C 11: L 40-45, Fig. 4, C 15: L 1-22). Li does not specifically teach classification metadata associated with the one or more video frames. However, Carton teaches that the metadata that is a part of the classification data may include any information about the scene/frame, [0028], Figs. 5-7.
Li does not specifically teach collecting sensor data associated with environmental conditions of an observer of the video content. However, Hill teaches that suitable sensor types implemented by sensor array 108 may include one or more light sensors (e.g., light intensity detectors), photodetectors, photodiodes, Hall Effect sensors, [0043]; wherein, when a classification algorithm is executed on the live video data, a determination may be made based upon the characteristics utilized by that particular classification algorithm, and environmental conditions such as lighting may impact the outcome, [0081], [0083]-[0084], Fig. 1/item 108. The classification metadata is related to the sensor data. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Hill with the teaching of collecting sensor data related to environmental conditions of the observer into the invention of Li for the purpose of analyzing more information to create proper image recommendations.
Li teaches a processing module configured to create one or more interaction triggers based on a set of rules, the one or more interaction triggers being configured to trigger one or more actions with regard to the video content based on the classification metadata (Li, a meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something. In this example, the subject matter area of "commerce" can correspond to two domains, C 10: L 14-54). Li does not specifically teach triggering one or more actions with one or more video frames of the video content. However, Carton teaches overlaying icons on indexed objects in a scene, [0028].
Li does not specifically teach wherein the one or more actions include modifying one or more objects present in the one or more video frames. However, Carton teaches that icons on the screen appear overlaying items from a scene, [0028], Fig. 4B/item 108. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Carton with the teaching of modifying an object presented on the video frame into the invention of Li for the purpose of receiving additional information.
Regarding claim 3:
Li teaches the method of claim 2, in which the triggering of the one or more actions is further based on the one or more probability scores, (Li, the meta-classifiers are based on a non-linear ensemble model for combining the information from the domain classifiers to generate a meta-classifier ranking, probability, or other score. The assigned meta-classifier category can then be used in any convenient manner, such as by triggering additional uses of the search query to match images or other alternative types of documents, C 1: L 37-45).
Regarding claim 4:
Li teaches the method of claim 2, in which the video content includes a live video, the live video being delayed until the one or more machine-
Regarding claim 5:
Li teaches the method of claim 2, in which the video content includes video-on-demand, the one or more machine-learning classifiers being run on the one or more video frames before the video content is uploaded to a content distribution network (CDN), (Li, before the video content to upload to the content service, some queries may appear to be relevant to more than one domain after evaluation, waiting for such search results prior to triggering an image index search will increase the latency time of responding to a query, C 2: L 32-34, Fig. 4).
Regarding claim 7:
Li teaches the method of claim 2, further comprising: determining that a condition for triggering at least one of the one or more interaction triggers is met; and in response to the determination, triggering the one or more actions with regard to the video content, (Li, a meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something. In this example, the subject matter area of "commerce" can correspond to two domains. One domain is a "shopping--electronics" domain, which includes a variety of software and 
Regarding claim 12:
Li teaches the method of claim 2, in which the set of rules are based on one or more of the following: a user profile, a user setting, a user preference, a viewer identity, a viewer age, and an environmental condition, (Li, evaluation factors may be related to a user context, such as a geographic location for a user or demographic data for a user, C 4:L 21-24).
Regarding claim 14:
Li teaches the method of claim 8, in which: the one or more machine-learning classifiers include a product classifier configured to identify one or more purchasable items present in the one or more video frames; and the one or more actions to be taken upon triggering of the one or more interaction triggers include providing the one or more links enabling a user to make a purchase of the one or more purchasable items, (Li, meta-classifier may have a subject matter area of "commerce", which represents a query that indicates a user who intends to purchase something, C 10: L 32-50; based on the assignment first to the commerce category, and then the 
Regarding claims 23-26:
Li in view of Cordova-Diba teaches the system of claim 1, wherein the modifying one or more objects includes at least one of the following: replacing the one or more objects with one or more new objects, highlighting the one or more objects, and editing the video content, (Cordova-Diba, an object in the database that is associated with the hotspot and/or hotspot package may be changed (e.g., using the editing tool), [0094]; the object recognition process to be robust to shadows, shading, highlights, reflections, and other factors caused by differences in illumination, [0116]; the query image to enhance robustness of object localization algorithms to shadows, shading, highlights, and illumination intensity, [0134]).
4.	Claims 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Manico (USPGPPubN 20170351417, referred to as Manico).
Regarding claim 6:
Li does not specifically teach the method of claim 2, in which the video content includes a video game. However, Manico teaches the objects games, see table 2. It would have been obvious to one of ordinary skill in the art at 
Li in view of Manico teaches the method of claim 8, in which: the one or more machine-learning classifiers include a people classifier configured to identify one or more individuals present in the one or more video frames; and the one or more actions to be taken upon triggering the one or more interaction triggers include one or more of the following: labeling the one or more individuals in the one or more video frames, providing recommendations related to another media content associated with the one or more individuals, providing another media content associated with the one or more individuals, editing the video content based on the one or more individuals, controlling delivery of the video content based on the one or more individuals, and presenting search options related to the one or more individuals, (Manico, deriving the metadata from images of objects, people or scenes, [0040], [0048]; The curator may also suggest usage for an image or a set of images based on the set of associated tags, [0050], [0057]).
5.	Claims 8-11, 13, 15, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Li (USPN 8843470, referred to as Li), in view of Carton (USPGPPub 20050220439, referred to as Carton), and further in view of Ishtiaq (USPGPPubN 20150082349, referred to as Ishtiaq).
Regarding claim 8:
Li does not specifically teach the method of claim 2, in which the one or more machine-learning classifiers include at least one of an image recognition classifier configured to analyze a still image in one of the video frames, and a composite recognition classifier configured to analyze: (i) one or more image changes between two or more of the video frames; and (ii) one or more sound changes between two or more of the video frames. However, Ishtiaq teaches that audio module 113 of the video data analyzer 111 can analyze the audio data of the video data to detect various audio characteristics or features. For example, the audio module can recognize voices, songs, sound effects, noises, tones, and other audio features, and the visual module 112 of the video data analyzer 111 can analyze the visual data to detect data corresponding to on-screen text or objects, [0031], [0033], Figs. 1A-1C, 4, [0060]. It would have been obvious to one of ordinary skill in the art at the time the invention was made to incorporate Ishtiaq with the teaching of frame and voice recognition into the invention of Li for the purpose of predicting a user’s interest.
Regarding claims 9 and 10:
Li in view of Ishtiaq teaches the method of claim 2, further comprising creating one or more entry points corresponding to the one or more interaction triggers, in which each of the one or more entry points include a user input associated with the video content or a user gesture associated 
Regarding claim 11:
Li in view of Ishtiaq teaches the method of claim 9, in which the one or more actions are based on the classification metadata of a frame associated with one of the entry points of the video content, (Ishtiaq, based on the segment chosen, user interface engine 121 transmits "trick-play" commands to client device 120 in order to fast-forward or rewind to the beginning of the chosen segment of the video content, [0078], [0125]).
Regarding claim 13:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a general object classifier configured to identify one or more objects present in the one or more video frames; and the one or more actions to be taken upon triggering the one or more interaction triggers include one or more of the following: replacing the one or more objects with new objects in the one or more video frames, 
Regarding claim 15:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a product classifier configured to identify one or more purchasable items present in the one or more video frames; and the one or more actions to be taken upon triggering of the one or more interaction triggers include providing the one or more links enabling a user to make a purchase of the one or more purchasable items, (Ishtiaq, consider a video asset in the comedy genre. Often such video data contains laughter from an audience embedded in the audio stream. As the video data is analyzed and laughter is detected in the video data, the time period corresponding to the laughter is transmitted to the client device 120, which turns on the microphone and/or camera only on the indicated time period, [0139], [0118], [0060]).
Regarding claim 16:
Li in view of Ishtiaq teaches the method of claim 8, in which: the one or more machine-learning classifiers include a sentiment condition classifier configured to determine a sentiment level associated with the one or more video frames; the classification metadata is created based on one or more of the following: color information of the one or more video frames, audio information of the one or more video frames, a user behavior exhibited by the user upon watching the video content; and the one or more actions to be taken upon triggering of the one or more interaction triggers include one or more of the following: providing recommendations related to another media content associated with the sentiment level and providing another media content associated with the sentiment level, (Ishtiaq, rank the segments based on a user-selectable criteria, or, alternatively, on criteria learned from the user's viewing history. For instance, if the visual, audio, and textual features of the video content relates to a baseball content, the user may decide to watch only segments containing home-runs or plays with high emotion. In this case, video content segment services module 125 would extract and fuse visual, audio, and textual features of the video content that correspond to high emotion to generate the ranking, [0081], [0100]; the user behavior data may also inform recommendations for other video content, [0102]).
Regarding claim 17:
Response to Arguments
6.	Applicant's arguments filed 03/22/2021 related to claims 1-26 have been fully considered but they are not persuasive.
In reference to Applicant's argument:
“Hill fails to teach or suggest that the sensor data is associated with environmental conditions of an observer of the video content.”
Examiner’s response:
Examiner respectfully disagrees. The specification states in paragraph 40: “Some examples of sensors 205 include a video camera, microphone, motion sensor, depth camera, photodetector, and so forth. For example, sensors 205 can be used to detect and identify users, determine if children watch or access certain video content, determine lighting conditions, measure noise levels, track user's behavior, detect user's mood, and so forth.” Independent claim 1 recites: “…machine-learning classifiers on the one or more video frames to collect sensor data associated with environmental conditions of an observer of the video content and create classification metadata associated with the one or more video frames.” Examiner’s interpretation of this recitation is the collection of sensor data associated with environmental conditions of an observer, for example, “detect and identify users, … determine lighting conditions, measure noise levels” as stated in paragraph 40 of the specification. Hill clearly teaches these limitations. Hill teaches “sensor array 108 may include… proximity sensors, light sensors (e.g., light intensity detectors), photodetectors, photoresistors, photodiodes, biometrics sensors (e.g., heart rate monitors, blood pressure monitors, skin temperature monitors), microphones, etc.”, see paragraph 43. These sensor data are definitely associated with environmental conditions of an observer. In addition, paragraph 81 of Hill cites: “These metrics may include any metrics suitable for the classification of live video data images by 
The 112 rejection is withdrawn.
Conclusion
7.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Contact Information
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIVKA A RABOVIANSKI whose 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached on (571) 272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JIVKA A RABOVIANSKI/
Primary Examiner, Art Unit 2426
May 10, 2021