DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
 	This is in response to applicant’s amendment/response filed on 08/15/2022, which has been entered and made of record.  Claims 43, 46-52, 55-60, and 62 have been amended.  Claims 43-62 are pending in the application. 

Response to Arguments
 	Applicant's arguments filed on 08/15/2022 have been fully considered but they are rendered moot in view of the new grounds of rejection presented below (as necessitated by the amendment to claims 43, 55, and 62).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 43-44, 46, 51, 53-56 and 61-62 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 8,984,405 to Geller et al. in view of U.S. PGPubs 2015/0382079 to Lister et al., further in view of U.S. PGPubs 2014/0176604 to Venkitaraman et al.

Regarding claim 43, Geller et al. teach a method comprising (abstract): at a device including a processor, non-transitory memory, a speaker, and a display (Fig 1, col 4:36-60): 
storing, in the non-transitory memory (Fig 14, col 20:25-41), an audio file having an associated timeline (abstract, col 5:12-21, “obtaining access to a digitally stored video program and causing playing the video program in a player window of a second computer, wherein the player window includes a linear graphical timeline representing the video program”, col 5:57-67 and col 6:1-4, “database record 202 comprises and associates a video identifier 204, comment type 206, comment input 208, time value 210, and category 214; optionally, the record also may include an identifier of a taxonomy 202”);
storing, in the non-transitory memory in association with the audio file (Fig 14, col 20:25-41), a plurality of content events (col 5:57-67 and col 6:1-4, “database record 202 comprises and associates a video identifier 204, comment type 206, comment input 208, time value 210, and category 214; optionally, the record also may include an identifier of a taxonomy 202”, col 7:41-67 and col 8:1-13, “the taxonomy and category applicable to a particular video may be specified as part of a registration process by which a user adds metadata about a particular video to the system … the process creates and stores a record associating the identifier of the video program, the time value, the comment type and the comment text. In this manner, the particular type of a particular comment, as well as the comment itself and the time value of the location or point at which the user signaled making the comment, become bound and associated with an identifier of the video program and stored in a record”), 
wherein each of the plurality of content events is associated with a respective temporal criterion and a respective environmental criterion (col 7:41-67 and col 8:1-13, “the taxonomy and category applicable to a particular video may be specified as part of a registration process by which a user adds metadata about a particular video to the system … the process creates and stores a record associating the identifier of the video program, the time value, the comment type and the comment text. In this manner, the particular type of a particular comment, as well as the comment itself and the time value of the location or point at which the user signaled making the comment, become bound and associated with an identifier of the video program and stored in a record”, col 8:13-52, “the process causes displaying, in the timeline at a time-synchronized position, a marker having a particular distinctive appearance from among a plurality of appearances that correspond respectively to the plurality of comment types. Thus, in an embodiment, a marker in the timeline indicates that a comment is associated with that point in the playback time of the video, and the appearance of the marker indicates a type of the comment … where each comment item in the plurality of comment items is for a previously entered annotation and comprises a particular time value, particular comment input data, and a particular graphical icon having a particular distinctive appearance from among a plurality of distinctive appearances that correspond respectively to the plurality of available comment types, and where a first comment item in the plurality of comment items has a first particular time value that is at or after a current time value corresponding to a current point of play of the video program”; Fig 5, col 10:58-67 and col 11:1-20, “FIG. 5 illustrates an example screen display of a graphical user interface showing a video player window, timeline, markers with distinctive appearances, and comments … a timeline 512 is configured below or near the video player window 510, and comprises one or more markers 514 that indicate locations at which comments have been entered in a comment list 540 comprising a plurality of comments 524, 526, 53”); and
while playing the audio file (Fig 5, col 10:58-67, playing a video file): 
determining, using the processor (Fig 14, processor), that the respective temporal criterion of a particular one of the plurality of content events is met based on a current position in the timeline of the audio file (Fig 5, col 10:58-67 and col 11:1-20, col 11:52-67 and col 12:1-2, “the time indicator 522 has a distinctive appearance that corresponds to one of the comment types 520. For example, if one of the comment types 520 is Strength and has a distinctive appearance of blue color, then the time indicator 522 is also shown in blue, and includes a time value indicating a specific time in the video at which the comment was made. In the example of FIG. 5, the time value is "4:37". The text 524 reproduces any comment text that a user entered. Note that a second comment text 530 is associated with a time indicator 528 having a different distinctive appearance--orange color in the example of FIG. 5--corresponding to a different one of the comment types 520. In an embodiment, selecting the time indicator 522 enables the user to control the video playback by changing the play head to a different specified time or time location in the timeline 512”; i.e., determining that the running time has reached a specified time point having a marker); 

determining, using the processor (Fig 14, processor), that the respective environmental criterion of the particular one of the plurality of events is met based on environment data of an environment (Fig 5, col 11:52-67 and col 12:1-2, “the time indicator 522 has a distinctive appearance that corresponds to one of the comment types 520. For example, if one of the comment types 520 is Strength and has a distinctive appearance of blue color, then the time indicator 522 is also shown in blue, and includes a time value indicating a specific time in the video at which the comment was made. In the example of FIG. 5, the time value is "4:37". The text 524 reproduces any comment text that a user entered. Note that a second comment text 530 is associated with a time indicator 528 having a different distinctive appearance--orange color in the example of FIG. 5--corresponding to a different one of the comment types 520. In an embodiment, selecting the time indicator 522 enables the user to control the video playback by changing the play head to a different specified time or time location in the timeline 512”; i.e., determining that the running time has reached a specified time point having a marker related to content); and 
in response to determining that the respective temporal criterion and the respective environmental criterion of the particular one of the plurality of content events are met, displaying, on the display, the particular one of the plurality of content events in association with the environment (Figs 5-6, col 10:58-67 and col 11:1-20, col 11:52-67 and col 12:1-2, col 12:38-59, display the content based on marker at the specified time).  
However, Geller et al. are silent as to a device including an image sensor and a speaker, and as to, while playing the audio file via the speaker, capturing, by the image sensor, an image of a physical environment of the device. 

In a related endeavor, Lister et al. teach, at a device including a processor, non-transitory memory, an image sensor (par 0040), a speaker, and a display (Figs 1-2, par 0031, par 0038-0043, a device includes a processor, non-transitory memory, a speaker, and a display), and while playing, via the speaker, the audio file (par 0045, par 0053, par 0068, “data feed 510 can include media stream events 516 correlated with media stream times 514. Media stream times 514 can be designated in a variety of different ways, including using Coordinated Universal Time (abbreviated “UTC”), local time for the user, time at the virtual assistant server, time at the media server, time at the source of the media (e.g., a sports venue), or a variety of other time zones. In other examples, media stream times 514 can be provided as offsets from the beginning of media content (e.g., from the beginning of a movie, episode, sporting event, audio track, etc.)”): determining, using the processor, that the respective temporal criterion of a particular one of the plurality of content events is met based on a current position in the timeline of the audio file (Fig 5, par 0061-0073, “FIG. 5 illustrates exemplary data feed 510 associating events in media stream 512 with particular times 514 in the media stream”, Fig 6, par 0075-0076, “cued time 624 can correspond to the media stream time 514 associated with the corresponding media stream event 516. In other examples, cued time 624 can be shifted earlier or later than media stream time 514 depending on how media stream events 516 are associated with media stream times 514. For example, cued time 624 can be thirty seconds, a minute, two minutes, or another amount before the corresponding media stream time 514 to capture play just prior to a goal being scored. 
In some examples, data feed 510 can include precise time designations of where playback should begin for particular events (e.g., designating when a hockey player began to make a drive for the eventual goal, designating when penalty behavior was first seen, etc.)”); capturing, by the image sensor, an image of a physical environment of the device (par 0040, par 0103, “whether shown on a separate user device or not, secondary screen experience data, secondary camera view data, and the like can be received and used as part of a data feed to identify relevant points of interest and associated times in a media stream. For example, a secondary screen experience can include descriptions of highlights in a game. Those descriptions can be included in virtual assistant knowledge as relevant media stream events with associated media stream times, and can be used to respond to user requests”); determining, using the processor, that the respective environmental criterion of the particular one of the plurality of events is met based on environment data of an environment (Fig 5, par 0061-0073, “As shown, various other media stream events 516 can likewise be included and associated with particular media stream times 514. Details for different events can vary, and some or all of the information can be incorporated into virtual assistant knowledge. For example, details of a goal can include the player attributed with the goal and any assisting players. 
Details of the end of a power play can include identifying information for the team losing power play status and the team back at full force”, Fig 6, par 0075-0077, “data feed 510 can include precise time designations of where playback should begin for particular events (e.g., designating when a hockey player began to make a drive for the eventual goal, designating when penalty behavior was first seen, etc.).”, par 0103, ); and in response to determining that the respective temporal criterion and the respective environmental criterion of the particular one of the plurality of content events are met, displaying, on the display, the particular one of the plurality of content events in association with the environment (Fig 6, par 0075-0079, “playback of a media stream can be caused to commence at a time in the media stream associated with an event in a user request. For example, knowledge incorporated in the virtual assistant knowledge base from data feed 510 can be used to determine a particular time in a media stream associated with a user's request for particular content …the system can time-shift video 620 to commence playback at cued time 624 indicated on playback indicator 622. As shown, cued time 624 can differ from live time 626 (e.g., the time associated with the live televised or otherwise live distributed stream of content). In some examples, cued time 624 can correspond to the media stream time 514 associated with the corresponding media stream event 516. In other examples, cued time 624 can be shifted earlier or later than media stream time 514 depending on how media stream events 516 are associated with media stream times 514 … The content shown on display 112 and associated metadata (e.g., from data feed 510 or otherwise) can thus be used to disambiguate user requests and determine user intent. 
For example, on-screen actors, on-screen players, a list of game participants, a list of actors in a show, a team roster, or the like can be used to interpret user requests”).  
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. to include a device including an image sensor and a speaker and, while playing the audio file via the speaker, capturing, by the image sensor, an image of a physical environment of the device, as taught by Lister et al., in order to update media data relating events to particular times in a media stream with timely information associated with the playing media (e.g., a sporting event, a television show, or the like), to commence playback at a time in the media stream associated with the event referred to in a request, with responses provided in audio and displayed form, and to interact with media control devices (e.g., televisions, television set-top boxes, cable boxes, gaming devices, streaming media devices, digital video recorders, etc.) to obtain desired content, such as specific moments in a television program.
However, Geller et al. as modified by Lister et al. are silent as to determining, using the processor, that the respective environmental criterion of the particular one of the plurality of events is met based on the image of the physical environment, and, in response to determining that the respective temporal criterion and the respective environmental criterion of the particular one of the plurality of content events are met, displaying, on the display, the particular one of the plurality of content events in association with the physical environment.
In a related endeavor, Venkitaraman et al. teach capturing, by the image sensor (par 0016, par 0059, a video capture unit), an image of a physical environment of the device (par 0016-0017, par 0025, “the companion device 142 may be a smartphone that has a built-in camera and a built-in display, or a head-mounted display. The video capture unit 144 may capture the neighborhood (e.g., a user's surroundings) 152 of a user of the companion device 142 and generate a captured media stream 148 comprising images, video, audio, etc. The video capture unit 144 may deliver the captured media stream 148 to the display unit 146, which may then present the captured media stream to the user as displayed scenes 154 of the user's neighborhood 152”, par 0033-0034, “the AR system 100 may access second context information that is based on the content of captured media stream 148 captured by the companion device 142. The captured media stream 148 may be video of the user's neighborhood 152”); determining, using the processor (Fig 1, AR system), that the respective environmental criterion of the particular one of the plurality of events is met based on the image of the physical environment (par 0018-0019, “the information 112 may be the captured media stream 148 itself; e.g., image data, video data, etc. 
In other embodiments, the information 112 may be data extracted from an analysis of the captured media stream 148, or data that otherwise relates to information contained in the captured media stream 148 (sometimes referred to as metadata) … the companion device 142 may render the virtual objects 134 immediately in user's field view and in some other instances, the virtual objects may be rendered if and only if the user's field of view satisfies certain criteria, such as user's field of view is heading north and up, or there is a second object in the user's field of view, etc.”, par 0033-0038, “the companion device 142 may generate the second context information using known image processing techniques to perform object detection and feature recognition to identify objects, their spatial locations, and other context in the captured media stream 148. The companion device 142 may provide the generated second context information to the AR system 100 in the form of information 112.  In some embodiments, the second context information may include the user's interactions with the companion device 142. 
For example, if the companion device 142 includes a motion sensor, the user may shake the companion device to indicate a significant event in the captured media 148 … the virtual objects 234 may be identified based on a comparison of the objects and events occurring the delivered media stream 104 (e.g., as represented in the media context information 232) and the objects and events occurring in the captured media stream 148 (e.g., as represented in information 112)”), and in response to determining that the respective temporal criterion and the respective environmental criterion of the particular one of the plurality of content events are met, displaying, on the display, the particular one of the plurality of content events in association with the physical environment (par 0018-0019, par 0025, par 0035-0038, “the virtual objects 234 may be identified based on a comparison of the objects and events occurring the delivered media stream 104 (e.g., as represented in the media context information 232) and the objects and events occurring in the captured media stream 148 (e.g., as represented in information 112). In some embodiments the information 112 may be expressed as user neighborhood metadata (UNM).”, par 0047-0048, “present disclosure, a time t.sub.x may be determined based on event(s) identified in the delivered media stream 104 that is delivered to the receiving device 122. For example, an event may be detected in segment x of the delivered media stream 104. The time t.sub.x associated with media segment x of the delivered media stream 104 may serve as a time relative to the timeline of the captured media stream 148 at which to introduce one or more of the identified virtual objects 234 into the user's field of view (e.g., by augmenting the displayed scenes 154). …  the object rendering data may be image data that the companion device 142 can use to display virtual objects 134 in the user's field of view, for example, in the displayed scenes 154 …. 
the companion device 142 may render the virtual object on its display unit 146 to create an augmented reality experience for the user. Referring to FIG. 1, for example, the displayed scenes 154 presented on the display unit 146 may represent a field of view of the user's neighborhood 152”; i.e., identifying first context from the media stream and second context from the captured media, along a timeline, to render and display an AR scene based on the information from the media stream and the captured stream).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. as modified by Lister et al. to include determining, using the processor, that the respective environmental criterion of the particular one of the plurality of events is met based on the image of the physical environment, and, in response to determining that the respective temporal criterion and the respective environmental criterion of the particular one of the plurality of content events are met, displaying, on the display, the particular one of the plurality of content events in association with the physical environment, as taught by Venkitaraman et al., in order to provide more information about the objects around the user, overlaid on the video, and to create opportunities for bringing the AR experience to users of mobile computing devices.

Regarding claim 44, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and further teach wherein determining that the respective temporal criterion of the particular one of the plurality of content events is met comprises determining that the current position in the timeline of the audio file matches a trigger time of the particular one of the plurality of content events (Geller et al.: Fig 5, col 10:58-67 and col 11:1-20, col 11:52-67 and col 12:1-2, “the time indicator 522 has a distinctive appearance that corresponds to one of the comment types 520. For example, if one of the comment types 520 is Strength and has a distinctive appearance of blue color, then the time indicator 522 is also shown in blue, and includes a time value indicating a specific time in the video at which the comment was made. In the example of FIG. 5, the time value is "4:37"”, Lister et al.: Fig 5, par 0061-0073, “FIG. 5 illustrates exemplary data feed 510 associating events in media stream 512 with particular times 514 in the media stream”, Fig 6, par 0075-0076, “cued time 624 can correspond to the media stream time 514 associated with the corresponding media stream event 516”).

Regarding claim 46, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and further teach wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises determining, based on the image, that the physical environment is a particular environment class (Lister et al.: par 0064, “virtual assistant queries can include requests for particular media cued to a particular time. For example, a user might want to see a particular play in a game, a particular performance during a show, a particular scene in a movie, or the like”, par 0103, “secondary camera view data (e.g., video with an alternative view or vantage point than what is primarily displayed for a particular program), or the like. Such information can be used to improve speech recognition accuracy and determine user intent in a similar manner as discussed above. In addition, whether shown on a separate user device or not, secondary screen experience data, secondary camera view data, and the like can be received and used as part of a data feed to identify relevant points of interest and associated times in a media stream. For example, a secondary screen experience can include descriptions of highlights in a game. Those descriptions can be included in virtual assistant knowledge as relevant media stream events with associated media stream times, and can be used to respond to user requests. Similarly, secondary camera view data can be included in virtual assistant knowledge as relevant media stream events identifying particular media stream times where alternative camera content may be available (which can, for example, be used in responding to certain user requests)”, Venkitaraman et al.: par 0017-0019, “the information 112 may be the captured media stream 148 itself; e.g., image data, video data, etc. 
In other embodiments, the information 112 may be data extracted from an analysis of the captured media stream 148, or data that otherwise relates to information contained in the captured media stream 148 (sometimes referred to as metadata) …. the companion device 142 may render the virtual objects 134 immediately in user's field view and in some other instances, the virtual objects may be rendered if and only if the user's field of view satisfies certain criteria, such as user's field of view is heading north and up, or there is a second object in the user's field of view, etc.”, par 0033-0038, “the companion device 142 may generate the second context information using known image processing techniques to perform object detection and feature recognition to identify objects, their spatial locations, and other context in the captured media stream 148. The companion device 142 may provide the generated second context information to the AR system 100 in the form of information 112.  In some embodiments, the second context information may include the user's interactions with the companion device 142. For example, if the companion device 142 includes a motion sensor, the user may shake the companion device to indicate a significant event in the captured media 148 … the virtual objects 234 may be identified based on a comparison of the objects and events occurring the delivered media stream 104 (e.g., as represented in the media context information 232) and the objects and events occurring in the captured media stream 148 (e.g., as represented in information 112)”). This would be obvious for the same reason given in the rejection for claim 43.

Regarding claim 51, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and further teach wherein displaying, on the display, the particular one of the plurality of content events in association with the physical environment includes playing, via the speaker concurrently with the audio file, a supplemental audio file associated with the particular one of the plurality of content events (Geller et al.: col 4:10-12, col 6:26-41, “comment input at block 218 may comprise a video file, an audio track, or other audiovisual input that the user records at the time of commenting on another video, or obtains from storage and associates with the record 202. In an embodiment, the time value 210 indicates a playback time point in the video program 124 with which the comment input 208 is associated”, col 7:62-67 and col 8:1-13, col 11:1-15, “the comments 524, 526, 530 are shown as text comments, but in other embodiments comments may comprise links or icons representing video files or audio files that contain comments and have been associated with the video shown in the window 510. Timeline 512 further comprises a play head indicator 516 which represents a current position of playback of the video”, Lister et al.: par 0029, “the virtual assistant can also provide responses in other visual or audio forms (e.g., as text, alerts, music, videos, animations, etc.). Moreover, as discussed herein, an exemplary virtual assistant can control playback of media content (e.g., playing video on a television) and cause information to be displayed on a display”, Venkitaraman et al.: par 0030, par 0037, par 0049, “the object rendering data may be audio data that the companion device 142 can use to incorporate virtual objects 134 in the displayed scenes 154, for example, in the form of accompanying sound tracks. In still other embodiments, the object rendering data may be a combination of visual data and audio data”).

Regarding claim 53, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and further teach wherein: a first content event of the plurality of content events is associated with a first temporal criterion and a first environmental criterion; a second content event of the plurality of content events is associated with a second temporal criterion and a second environmental criterion; the first temporal criterion is different than the second temporal criterion; the first environment criterion is the same as the second environmental criterion; and the first content event is different than the second content event; the method comprising displaying, on the display, the first content event in accordance with the first temporal criterion and displaying the second content event in accordance with the second temporal criterion (Geller et al.: Fig 6, par 0075-0076, “cued time 624 can correspond to the media stream time 514 associated with the corresponding media stream event 516. In other examples, cued time 624 can be shifted earlier or later than media stream time 514 depending on how media stream events 516 are associated with media stream times 514. For example, cued time 624 can be thirty seconds, a minute, two minutes, or another amount before the corresponding media stream time 514 to capture play just prior to a goal being scored. In some examples, data feed 510 can include precise time designations of where playback should begin for particular events (e.g., designating when a hockey player began to make a drive for the eventual goal, designating when penalty behavior was first seen, etc.)”; i.e., disclosing display of different content based on the time in the timeline in the same environment, Lister et al.: Figs. 5-6, par 0071, “In the example of FIG. 5, data feed 510 can include media stream events 516 associated with events in an ice hockey game. 
For example, puck drop beginning the first period of the game may have occurred at 5:07 (UTC), and data feed 510 can include an associated media stream event 516 at a particular media stream time 514 for that event. At 5:18 (UTC), a penalty may have been called against Player X for slashing Player Z, resulting in a two minute penalty. The details of the penalty (e.g., penalty type, players involved, penalty time, etc.) can be included in the media stream event 516 associated with the penalty at that particular media stream time 514. At 5:19 (UTC), a power play may have begun for Team A, and a media stream event 516 can be included that can be associated with the beginning of the power play with a particular media stream time 514.”; i.e., disclosing display of different content based on the time in the timeline in the same environment).

Regarding claim 54, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and Geller et al. further teach comprising: storing, in the non-transitory memory (Fig 14, col 20:25-41), a plurality of audio files (col 19:25-35, “computer 102 is configured to provide lists or links to all videos, or all parts of videos (clips) that are associated with a particular category in a specified framework. In such an embodiment, computer 102 may be configured to display user interface panels or widgets that prompt for selection of a particular taxonomy and a particular category. In response, the computer 102 searches database 108 using the specified taxonomy and category as keys, and returns a list of videos that match the specified taxonomy and category”), each having an associated timeline (abstract, col 5:12-21, “obtaining access to a digitally stored video program and causing playing the video program in a player window of a second computer, wherein the player window includes a linear graphical timeline representing the video program”, col 5:57-67 and col 6:1-4, “database record 202 comprises and associates a video identifier 204, comment type 206, comment input 208, time value 210, and category 214; optionally, the record also may include an identifier of a taxonomy 202”); and storing, in the non-transitory memory in association with respective ones of the plurality of audio files (Fig 14, col 20:25-41), a plurality of content packages (col 19:25-35, “computer 102 is configured to provide lists or links to all videos, or all parts of videos (clips) that are associated with a particular category in a specified framework. In such an embodiment, computer 102 may be configured to display user interface panels or widgets that prompt for selection of a particular taxonomy and a particular category. 
In response, the computer 102 searches database 108 using the specified taxonomy and category as keys, and returns a list of videos that match the specified taxonomy and category”), each including a plurality of content events associated with a respective temporal criterion and a respective environmental criterion (col 5 57-67 and col 6:1-04, “database record 202 comprises and associates a video identifier 204, comment type 206, comment input 208, time value 210, and category 214; optionally, the record also may include an identifier of a taxonomy 202”, col 7:41-67 and col 8:1-13, “the taxonomy and category applicable to a particular video may be specified as part of a registration process by which a user adds metadata about a particular video to the system … “the process creates and stores a record associating the identifier of the video program, the time value, the comment type and the comment text. In this manner, the particular type of a particular comment, as well as the comment itself and the time value of the location or point at which the user signaled making the comment, become bound and associated with an identifier of the video program and stored in a record”).

Regarding claims 55-56 and 61, these claims are similar in scope to claims 43, 46, and 53 and are rejected under the same rationale.

Regarding claim 62, Geller et al. teach a non-transitory computer-readable medium having instructions encoded thereon which, when executed by one or more processors of a device including a speaker and a display, cause the device to perform the recited operations (col 20:55-67 and col 21:1-17). The remaining limitations of the claim are similar in scope to claim 43 and are rejected under the same rationale.

Claim 45 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 8,984,405 Geller et al. in view of U.S. PGPubs 2015/0382079 to Lister et al., further in view of U.S. PGPubs 2014/0176604 to Venkitaraman et al., further in view of U.S. Patent 10,503,964 to Valgardsson et al.

Regarding claim 45, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, but are silent as to wherein determining that the respective temporal criterion of the particular one of the plurality of content events is met comprises determining that the current position in the timeline of the audio file is within a trigger time range of the particular one of the plurality of content events.
In a related endeavor, Valgardsson et al. teach wherein determining that the respective temporal criterion of the particular one of the plurality of content events is met comprises determining that the current position in the timeline of the audio file is within a trigger time range of the particular one of the plurality of content events (Fig. 8, col 15:20-31 and col 16:1-31, “At a fourth event 810, behavior data segment recording of tracking data begins. At a fifth event 812, behavior data segment stops recording and the data is sent to the server 204. At a sixth event 814, scene data, which may contain averages of User behavior metrics, is sent to the server 204. At a seventh event 816, the User uses the item. At an eighth event 818, action data (which may describe the User using the item) is gathered and sent to the server 204” … discloses behavior occurring within a time range).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. as modified by Lister et al. and Venkitaraman et al. to include wherein determining that the respective temporal criterion of the particular one of the plurality of content events is met comprises determining that the current position in the timeline of the audio file is within a trigger time range of the particular one of the plurality of content events, as taught by Valgardsson et al., in order to provide visual observation of User behavior with dynamic events on a linear timeline in VR/AR environments, enabling visual comparison of User behavior and offering more efficient, objective evaluation of VR/AR User experiences.
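For illustration only, the trigger-time-range determination recited in claim 45 can be sketched as a simple interval check. This is a minimal sketch under stated assumptions: the names ContentEvent, trigger_start, trigger_end, and temporal_criterion_met are hypothetical and do not appear in the claims or in any cited reference, and the treatment of the range as inclusive at both ends is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentEvent:
    """Hypothetical content event with an assumed trigger time range in seconds."""
    name: str
    trigger_start: float
    trigger_end: float


def temporal_criterion_met(event: ContentEvent, position: float) -> bool:
    """Return True when the current timeline position lies within the
    event's trigger time range (assumed inclusive at both ends)."""
    return event.trigger_start <= position <= event.trigger_end


# Usage: an event assumed to trigger between 10 s and 20 s of the timeline.
event = ContentEvent("example", trigger_start=10.0, trigger_end=20.0)
inside = temporal_criterion_met(event, 15.0)   # position within the range
outside = temporal_criterion_met(event, 25.0)  # position past the range
```

Under this sketch, the temporal criterion is met for any playback position falling between the two bounds, rather than only at a single instant.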

Claims 47-49 and 57-59 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 8,984,405 Geller et al. in view of U.S. PGPubs 2015/0382079 to Lister et al., further in view of U.S. PGPubs 2014/0176604 to Venkitaraman et al., further in view of U.S. PGPubs 2017/0078825 to Mangiat et al.

Regarding claim 47, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, but are silent as to wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises performing image analysis of the image of the physical environment.
In a related endeavor, Mangiat et al. teach wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises performing image analysis of the image of the physical environment (par 0039, par 0128, “The wearable system may recognize objects in the environment 470 (e.g., the sofa 1312 in the room 1310), for example, by analyzing the images acquired by the outward-facing imaging system or may be in communication with totems or electronic trackers disposed in the environment 470 that can be used to assist in the display of the visual graphics” … discloses rendering the virtual object related to a recognized physical object in the environment through image analysis).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. as modified by Lister et al. and Venkitaraman et al. to include wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises performing image analysis of the image of the physical environment, as taught by Mangiat et al., in order to recognize the type and shape of an object in the real environment through image processing and to produce new environments that facilitate a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements, where physical and virtual objects co-exist and interact.

Regarding claim 48, Geller et al. as modified by Lister et al., Venkitaraman et al., and Mangiat et al. teach all the limitations of claim 47, and Mangiat et al. further teach wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises determining that the image of the physical environment includes an object of a particular shape (par 0039, par 0128, “The wearable system may recognize objects in the environment 470 (e.g., the sofa 1312 in the room 1310), for example, by analyzing the images acquired by the outward-facing imaging system or may be in communication with totems or electronic trackers disposed in the environment 470 that can be used to assist in the display of the visual graphics” … discloses rendering the virtual object related to a recognized physical object in the environment through image analysis, such as of shape or type (sofa)). This would have been obvious for the same reason given in the rejection of claim 47.

Regarding claim 49, Geller et al. as modified by Lister et al., Venkitaraman et al., and Mangiat et al. teach all the limitations of claim 47, and Mangiat et al. further teach wherein determining that the respective environmental criterion of the particular one of the plurality of content events is met comprises determining that the image of the physical environment includes an object of a particular type (par 0039, par 0128, “The wearable system may recognize objects in the environment 470 (e.g., the sofa 1312 in the room 1310), for example, by analyzing the images acquired by the outward-facing imaging system or may be in communication with totems or electronic trackers disposed in the environment 470 that can be used to assist in the display of the visual graphics” … discloses rendering the virtual object related to a recognized physical object in the environment through image analysis, such as of shape or type (sofa)). This would have been obvious for the same reason given in the rejection of claim 47.

Regarding claims 57-59, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 55. Claims 57-59 are similar in scope to claims 47-49 and are rejected under the same rationale.

Claim 50 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 8,984,405 Geller et al. in view of U.S. PGPubs 2015/0382079 to Lister et al., further in view of U.S. PGPubs 2014/0176604 to Venkitaraman et al., further in view of U.S. PGPubs 2018/0284955 to Canavor et al.

Regarding claim 50, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, but are silent as to wherein displaying, on the display, the particular one of the plurality of content events in association with the physical environment is further performed in response to determining that one or more additional criterion is met.
In a related endeavor, Canavor et al. teach wherein displaying, on the display, the particular one of the plurality of content events in association with the physical environment is further performed in response to determining that one or more additional criterion is met (par 0074, par 0090, par 0105-0107, “The visual representations of the items may be provided based in part on an image comparison algorithm. For example, the visual representations of televisions may be similar to cause the algorithm to compare the pixel composition of each image. The visual representations of the items may be similar based at least in part on various similarities between the images, including, for example, four corners, black plastic, and glass in the center of the black plastic square. Other visual characteristics may be compared and used by the computer system 304 to determine physical similarities between the items” … discloses determining the object to render based on a similarity or image comparison algorithm, wherein the similarity or image comparison serves as an additional criterion for determining content).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. as modified by Lister et al. and Venkitaraman et al. to include wherein displaying, on the display, the particular one of the plurality of content events in association with the physical environment is further performed in response to determining that one or more additional criterion is met, as taught by Canavor et al., in order to quickly recognize an object in the real environment and determine a similar virtual object based on the similarity between items over a network, saving bandwidth and computing resources since less data is provided to the user.

Claims 52 and 60 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 8,984,405 Geller et al. in view of U.S. PGPubs 2015/0382079 to Lister et al., further in view of U.S. PGPubs 2014/0176604 to Venkitaraman et al., further in view of U.S. PGPubs 2014/0075317 to Dugan et al.

Regarding claim 52, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 43, and further teach wherein: a first content event of the plurality of content events is associated with a first temporal criterion and a first environmental criterion; a second content event of the plurality of content events is associated with a second temporal criterion and a second environmental criterion; the first environment criterion is different than the second environmental criterion; and the first content event is different than the second content event; the method comprising displaying, on the display, the first content event in association with the environment without displaying the second content event (Geller et al.: Fig 6, par 0075-0076, “cued time 624 can correspond to the media stream time 514 associated with the corresponding media stream event 516. In other examples, cued time 624 can be shifted earlier or later than media stream time 514 depending on how media stream events 516 are associated with media stream times 514. For example, cued time 624 can be thirty seconds, a minute, two minutes, or another amount before the corresponding media stream time 514 to capture play just prior to a goal being scored. In some examples, data feed 510 can include precise time designations of where playback should begin for particular events (e.g., designating when a hockey player began to make a drive for the eventual goal, designating when penalty behavior was first seen, etc.)”; Lister et al.: Fig. 5, par 0071, “In the example of FIG. 5, data feed 510 can include media stream events 516 associated with events in an ice hockey game. For example, puck drop beginning the first period of the game may have occurred at 5:07 (UTC), and data feed 510 can include an associated media stream event 516 at a particular media stream time 514 for that event. At 5:18 (UTC), a penalty may have been called against Player X for slashing Player Z, resulting in a two minute penalty. The details of the penalty (e.g., penalty type, players involved, penalty time, etc.) can be included in the media stream event 516 associated with the penalty at that particular media stream time 514. At 5:19 (UTC), a power play may have begun for Team A, and a media stream event 516 can be included that can be associated with the beginning of the power play with a particular media stream time 514.”), but do not explicitly teach that the first temporal criterion is the same as the second temporal criterion.

In a related endeavor, Dugan et al. teach wherein: a first content event of the plurality of content events is associated with a first temporal criterion and a first environmental criterion; a second content event of the plurality of content events is associated with a second temporal criterion and a second environmental criterion; the first temporal criterion is the same as the second temporal criterion; the first environment criterion is different than the second environmental criterion; and the first content event is different than the second content event; the method comprising displaying, on the display, the first content event in association with the environment without displaying the second content event (par 0084, “if the user wish to comment on a piece of media or content, he or she may simply click the timeline to place a `comment thumb` on the timeline, as illustrated above as call-out boxes 244 in FIG. 2. In certain embodiments, a comment thumb may be pinned to the timeline where the activity takes place, wherein the comment only becomes active when the user views the content. As an example, if the user wants to comment three times on a specific media item, he or she may find three windows waiting for them at the end of the media, (e.g., 244 as shown in FIG. 2)”, par 0106-0109, “The gallery of icons 530 includes icons representing user interaction activity associated with the media content. In an embodiment, different colors, sizes, positions, or other differentiation may represent different social media platforms, or the like. For example, a top row (e.g., including orange icons) may represent Facebook activity associated with the media content. Icons of different sizes may be utilized to illustrate varying degrees of activity concentration. Further, in an embodiment a second row (e.g., green icons) may represent Instagram posts, while a third row (e.g., blue icons) represents Twitter posts.” … discloses generating different content at the same time in the timeline and displaying the content based on the user's selection).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Geller et al. as modified by Lister et al. and Venkitaraman et al. to include the first temporal criterion being the same as the second temporal criterion, as taught by Dugan et al., in order to update media data relating events to particular times in a media stream with timely information associated with playing media (e.g., a sporting event, a television show, or the like), to begin playback at a time in the media stream associated with the event referred to in a request and respond to the request (with audio, displayed content, etc.), to attach media content to the media player at the relevant frame in the video, to display a `smart` content stream that continuously adjusts itself according to the emotional categorization of the user's profile, and to allow the user to contribute his or her own content at certain places in the timeline of the displayed media.
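For illustration only, the arrangement recited in claim 52 (two content events sharing the same temporal criterion but having different environmental criteria, with only the first being displayed) can be sketched as a filter over candidate events. This is a minimal sketch under stated assumptions: the names ContentEvent, required_object, and events_to_display are hypothetical, and modeling the environmental criterion as the presence of a named object type in the set of objects detected in the image of the environment is an assumption, not language taken from the claims or from any cited reference.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ContentEvent:
    """Hypothetical content event with a time range and an assumed
    environmental criterion expressed as a required object type."""
    name: str
    trigger_start: float
    trigger_end: float
    required_object: str


def events_to_display(events: Iterable[ContentEvent],
                      position: float,
                      detected_objects: Set[str]) -> List[str]:
    """Return names of events whose temporal criterion (position within the
    trigger range) and environmental criterion (required object detected)
    are both met."""
    return [
        e.name
        for e in events
        if e.trigger_start <= position <= e.trigger_end
        and e.required_object in detected_objects
    ]


# Two events with the same temporal criterion but different environmental criteria.
first = ContentEvent("first", 10.0, 20.0, required_object="sofa")
second = ContentEvent("second", 10.0, 20.0, required_object="table")

# Only a sofa is assumed to be detected, so only the first event qualifies.
shown = events_to_display([first, second], position=12.0, detected_objects={"sofa"})
```

In this sketch both events become temporally eligible at the same playback position, and the environmental criterion alone decides which one is displayed.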

Regarding claim 60, Geller et al. as modified by Lister et al. and Venkitaraman et al. teach all the limitations of claim 55. Claim 60 is similar in scope to claim 52 and is rejected under the same rationale.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge whose telephone number is (571)272-5556. The examiner can normally be reached 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee M Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIN GE/Primary Examiner, Art Unit 2616