Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114 was filed in this application after appeal to the Patent Trial and Appeal Board, but prior to a decision on the appeal. Since this application is eligible for continued examination under 37 CFR 1.114 and the fee set forth in 37 CFR 1.17(e) has been timely paid, the appeal has been withdrawn pursuant to 37 CFR 1.114 and prosecution in this application has been reopened pursuant to 37 CFR 1.114. Applicant’s submission filed on 11/7/22 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over “Personalized Coverage of Large Athletic Events” by Charalampos Z. Patrikakis et al. (hereinafter Patrikakis) in view of U.S. Patent Application Publication 2015/0362733 A1 (hereinafter Spivack), and further in view of U.S. Patent 5,850,352 (hereinafter Moezzi).
Regarding claim 1, the limitation “A virtual reality data system, comprising: a virtual reality data backend … the virtual reality data backend having a data mining engine that retrieves a piece of virtual reality data content and processes the piece of virtual reality data content to generate interactivity meta-data … for the piece of virtual reality data content” is taught by Patrikakis (section “Supporting interactivity and personalization”, paragraphs 1-2, section “General platform architecture”, paragraph 3, section “Platform modules”, paragraphs 2, 4, “Two issues need to be considered when addressing the requirements of a platform that facilitates creating personalized coverage of an athletic event, specifically adapting to the particular needs and conditions of each user: … Second, two kinds of user preferences are of interest. Viewers have content preferences in terms of preferred sports, athletes, and particular incidents of interest at specific times within an event (for example, a corner kick in a football match). Viewers might have preferences for view interaction—that is, the ability to manage the stream, either in terms of space (camera angle or zoom level) or in time (pause, rewind, or slow motion). With the My-eDirector 2012 platform, we have met these requirements.”, “Figure 1 (next page) presents a simplified view of the architecture, which illustrates how content traverses the different stages involved in the distribution of My-eDirector 2012 media streams to provide personalized video services to consumers. 
The content-creation and annotation stage is responsible for preparing media content (raw video, sensor data, and sport information data) derived from original sources; processing the raw media content to produce rich media content (encoded video and annotation metadata that includes information derived from the sports information systems and from image analysis of the video feeds) delivering media content ready for service provision (metadata and annotation information for the media streams is exploited to provide personalized video streams and corresponding recommendations based on user selections).”, “Scene analysis and person tracking is performed in the video-annotation and characterization module, which receives the video stream from each camera (or live streaming server), and subsequently produces a stream of metadata (per camera) containing information like visible athlete identities, athlete locations, and relevant game events such as goals. In addition to this metadata production, sports information and statistics data—including data such as player’s full names and their heights— are fused into the platform from external sources, resulting in the creation of a rich metadata stream that’s delivered to the knowledge base module.”, “The video streams, metadata, and recommendation information are then forwarded to the content-distribution system, responsible for synchronizing streams and multiplexing them into one or more new streams, encoded according to the standards enforced by the service provider’s infrastructure. Then the new enriched feed is finally delivered to the user(s).”  Patrikakis’ e-director system includes a backend, i.e. the system shown in figure 1 on the left-hand side in communication with terminal devices, where the backend is responsible for performing the data mining to generate annotation metadata which is provided with the encoded video to user devices for display.  
As described in sections “Enhanced user interactivity” and “Use-case example”, the received metadata allows the user to control overlay of data, zoom the interface, or select different viewpoints, i.e. corresponds to interactivity metadata.  Patrikakis’ data mining operations are further described in the section “Automatic visual scene understanding and annotation”.)
The limitation “interactivity meta-data including a nature of the content, a set of frame classifications and a location of items in a scene of the content for the piece of virtual reality data content” is taught by Patrikakis (Patrikakis’ metadata indicates the nature of content, i.e. the type of sporting event depicted in a given piece of content (e.g. section “Automatic recommendation based on user preferences”, paragraph 1, indicates that the user is prompted to select different content to view based on the type of event and user preferences, and section “Enhanced user interactivity”, paragraph 2, indicates that this metadata can be requested by the player for overlaying the information onto the content view.), as well as a set of frame classifications indicating foreground content and player location data (e.g. section “Platform Modules”, paragraph 2, indicating the metadata includes identification and locations of athletes in the scene, or section “Use-case example”, paragraph 2, figure 5, where players in the foreground of the scene are identified and annotated).)
The limitation “a plurality of virtual reality devices coupled to the virtual reality data backend, each virtual reality device having a … display … and a player” is taught by Patrikakis (section “Platform modules”, paragraph 4, section “Use-case example”, paragraph 2, “The video streams, metadata, and recommendation information are then forwarded to the content-distribution system, responsible for synchronizing streams and multiplexing them into one or more new streams, encoded according to the standards enforced by the service provider’s infrastructure. Then the new enriched feed is finally delivered to the user(s).”, “The terminal application runs as a Web browser plug-in on Microsoft Silverlight, supporting PC and mobile devices. Users can select athletes of interest from a Web service list. During the event, selected athletes are annotated with labels, like those shown in Figure 5.”  Patrikakis indicates support for a plurality of users/devices receiving interactive personalized streaming video content.)
The limitation “a video encoding engine that generates optimized virtual reality data that includes the virtual reality data content and the generated meta-data” is taught by Patrikakis (section “Platform modules”, paragraph 4, “The video streams, metadata, and recommendation information are then forwarded to the content-distribution system, responsible for synchronizing streams and multiplexing them into one or more new streams, encoded according to the standards enforced by the service provider’s infrastructure. Then the new enriched feed is finally delivered to the user(s).”  Patrikakis’ content distribution system encodes the video and metadata into one (or more) encoded streams, i.e. as shown in figure 1, the video and metadata are transmitted as part of the same stream to the user devices.)
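For illustration only, the synchronizing and multiplexing of video and metadata described in the quoted passage can be sketched as follows. This is a minimal sketch and is not taken from Patrikakis: the names VideoFrame, Annotation, EnrichedPacket, and multiplex are hypothetical, and matching on an exact (camera, timestamp) key is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VideoFrame:
    camera_id: str      # hypothetical camera identifier
    timestamp_ms: int   # capture time of the frame
    payload: bytes      # encoded frame data


@dataclass
class Annotation:
    camera_id: str
    timestamp_ms: int
    athlete_id: str     # hypothetical identity label
    x: float            # athlete location in the frame (assumed normalized)
    y: float


@dataclass
class EnrichedPacket:
    frame: VideoFrame
    annotations: List[Annotation]


def multiplex(frames: List[VideoFrame],
              annotations: List[Annotation]) -> List[EnrichedPacket]:
    """Pair each frame with the annotations that share its camera and timestamp."""
    index: Dict[Tuple[str, int], List[Annotation]] = {}
    for ann in annotations:
        index.setdefault((ann.camera_id, ann.timestamp_ms), []).append(ann)
    # Emit packets in presentation order so that video and metadata stay in sync.
    ordered = sorted(frames, key=lambda f: (f.timestamp_ms, f.camera_id))
    return [EnrichedPacket(f, index.get((f.camera_id, f.timestamp_ms), []))
            for f in ordered]
```

In this sketch each output packet carries both the video content and its metadata, which mirrors the single enriched feed relied upon above; a practical system would likely tolerate small timestamp offsets rather than require exact matches.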
The limitation “the player in each virtual reality device receiving the optimized virtual reality data from the virtual reality data backend, viewing a particular scene of virtual reality content using the optimized virtual reality data … and generating interactivity for the particular viewed scene of the piece of virtual reality content using the interactivity metadata when the particular scene of virtual reality data is being displayed in the … display” is taught by Patrikakis (section “Enhanced user interactivity”, paragraphs 1-2, section “Use-case example”, paragraphs 1-3, “Digital audio-video terminals—in the form of programmable, embedded systems of a digital TV set, a set-top box, or an application running on a computer—offer a greater degree of user interaction with live content enhancing the user experience. The custom audio-video player supported in this system enhances the interaction when viewing live sports events to control metadata video overlays, slow-motion live play, and a zoomable user interface.  Static system metadata (for example, event information, schedule, camera views, and athletes competing in specific events) can be transmitted upon request. The viewer, as the director, can decide whether to display this information, either by pausing the live content while viewing the metadata, or by simultaneously overlaying this information into the live content view. This is in contrast to current systems in which the network director decides when to show event information that the user might not be interested in seeing. Viewers can interact with the live-stream to decide when to trigger slow motion, or when to deliver a clearer view (for example, which athletes finish a race in ascending order or to see how sports equipment is positioned with respect to reference points).”, “To demonstrate platform operation, as well as its potential for use with existing systems, a use-case example is provided. 
The use-case demonstrates how the advanced annotation and user-centered zooming interaction adapts the system’s content. The terminal application runs as a Web browser plug-in on Microsoft Silverlight, supporting PC and mobile devices. Users can select athletes of interest from a Web service list. During the event, selected athletes are annotated with labels, like those shown in Figure 5. A comprehensive set of controls are offered to the end user, giving the user the ability to manipulate the received stream beyond traditional DVR controls (see Figure 6). Pause, resume, go-back, and go-live controls are supported for live transmission, together with slow-motion control. By caching the live video stream on the server, users can replay transient actions such as crossing the finish-line, long-jump take-off, and so on, during a live sports event in slow-motion. Users can select a particular event or camera of interest. They can also enable the recommendation feature, based on their personal preferences. The automatic annotation of the live stream can provide on-screen recommendations for switching to another camera when other interesting events are taking place.”  Patrikakis teaches that the user device receives the encoded video and metadata and generates a display with interactive controls for the user, based on metadata received for the scene, e.g. controlling metadata overlays, camera selection, playback speed, as shown in figures 5 and 6.)
The limitation(s) “each virtual reality device having a head mounted display … a player … generating interactivity when the virtual reality data is being displayed in the head mounted display” is not explicitly taught by Patrikakis (section “Supporting interactivity and personalization”, paragraphs 1-2, “First, we must consider the specific conditions that exist on the user side with regards to terminal and networking capabilities. In this case, the proliferation of Internet and wireless network technologies in addition to the diversity of personal devices (for example, PDAs, smartphones, Palmtops, and tablets) has created a need for media streaming techniques that can be deployed over heterogeneous architectures with rapidly changing conditions, such as bandwidth, congestion, processing power, and memory availability. … With the My-eDirector 2012 platform, we have met these requirements.”  Patrikakis teaches support for a variety of user devices, e.g. section “Use-case example”, paragraph 2, and figure 1 mention/show a mobile device, a PC, and a laptop, but Patrikakis does not explicitly address the use of a head mounted display device.)  However, this limitation is suggested by Spivack (Spivack describes a head mounted display (e.g. paragraphs 17-18) which displays augmented captured video content to a user (e.g. paragraphs 19-23, 68, 69, 71, 73) along with support for user interactivity/controls of the augmented captured video content (e.g. paragraphs 50, 54, 119).  Further, like Patrikakis, Spivack suggests that the display device may present views based on remotely recorded video with received metadata overlays (i.e. Spivack’s simulated objects indicating information about the batter, game status, or last play, as in paragraphs 71, 73).)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Patrikakis’ e-Director system to provide personalized video to users of Spivack’s HMD display device because Patrikakis’ system is intended to provide personalized video to many users using a variety of systems (section “Supporting interactivity and personalization”) and Spivack indicates that at least one application for the HMD is for presenting remotely recorded video with received metadata overlays, which is the purpose of Patrikakis’ system, such that the references are complementary (i.e. describing a video content generation system and a display system for displaying the generated video content) and analogous art.
The limitations “each virtual reality device having a head mounted display, a positional sensor on the virtual reality device that detects a position of the virtual reality device and a player, wherein the virtual reality device provides an immersive visual experience to each eye of user based on the positional sensor … the virtual reality data backend having … a video encoding engine that generates optimized virtual reality data that includes the virtual reality data content and the generated meta-data; and the player … receiving the optimized virtual reality data … viewing a particular scene in the piece of virtual reality content using the optimized virtual reality data and in response to the positional sensor and generating interactivity for the particular viewed scene of the piece of virtual reality content using the interactivity meta-data when the particular scene of the virtual reality data is being displayed in the head mounted display” is partially taught by Patrikakis in view of Spivack (Patrikakis’ e-director system, as discussed above, includes a backend system performing datamining on received video to generate metadata which is provided with encoded video to user devices for display, and where the metadata is used to allow the user to interactively control overlay of data and/or select different camera viewpoints.  Further, as discussed Patrikakis’ system can be used to provide video for display in Spivack’s HMD display device, e.g. as in paragraph 73, delivering a video feed for presentation in the HMD.  Spivack, e.g. paragraph 73, figure 2, teaches that the HMD display device can present stereoscopic 3D images, i.e. a distinct image presented for each of the user’s eyes, and further, e.g. paragraphs 39, 42, 51, that the HMD can include a location and motion sensors, which can be used for adjusting a perspective of a simulated environment based on detected motion, i.e. 
position sensors for detecting the position of the HMD which can be used to adjust the viewing orientation of a virtual reality scene.  However, while Patrikakis’ system supports allowing a viewer to select different viewpoints, those viewpoints are limited to the recorded video viewpoints captured using conventional 2D cameras, such that although Spivack teaches that the HMD can support rendering virtual reality images of various simulated environments, e.g. paragraphs 116, 121, 127, based on the detected position and location of the HMD and for each of the user’s eyes, as noted, Patrikakis’ system, as disclosed, would not provide Spivack’s HMD with the necessary information for generating stereoscopic immersive video to the user’s eyes, i.e. Patrikakis’ video and metadata do not describe a virtual simulated environment, per se.)  However, this limitation is taught by Moezzi (Moezzi, e.g. abstract, cols 9-20, describes a system for generating immersive video which generates stereoscopic synthetic images based on a virtual environment/scene description derived from a real world scene captured on video from multiple viewpoints.  Moezzi, e.g. cols 22-39, describes details of how the immersive video system constructs the environment models, including, cols 26-29, the video data analyzer that initially analyzes every frame of video to detect and track objects, and identify events, analogous to Patrikakis’ system, e.g. section “Automatic visual scene understanding and annotation”, which can then be used to generate an environment model for rendering synthetically generated views of the real world scene, e.g. col 26, lines 34-37, col 35, line 1 – col 36, line 27.  Moezzi, e.g. col 10, lines 1-37, further teaches that stereoscopic views can be synthesized for a viewer based on the known position and orientation of the viewer’s head and eyes, which could be determined from a helmet worn by the viewer, which as shown in figure 5 corresponds to a head mounted display.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Patrikakis’ e-Director system, providing personalized video to users of Spivack’s HMD display device, to include Moezzi’s immersive video generation system in order to allow users of Spivack’s HMD display device to receive, in addition to the encoded video and interactivity metadata, Moezzi’s environment model and associated data, for the purpose of rendering synthetic stereoscopic views for the user(s) of Spivack’s HMD in response to the detected position and orientation of the user’s eyes as taught by Moezzi.  As noted above, Spivack suggests generating stereoscopic 3D images based on synthetic environment models, as well as changing the viewpoint based on detected motion of the HMD, and Patrikakis’ e-director system already performs some 2D video analysis for tracking players and detecting events, such that modifying Patrikakis’ e-Director system to include Moezzi’s environment model generation system would be a straightforward improvement, i.e. instead of stopping at tracking players and detecting events, which corresponds to the first set of tasks in Moezzi's environment model generation procedure, the modified e-Director system would perform the remaining steps in Moezzi’s procedure to generate the environment model and associated data, which would be provided along with the encoded video and interactivity metadata, thereby allowing Spivack’s HMD to render stereoscopic synthetic images based on the position of the viewer’s head and eyes as determined by the HMD sensors, as taught by Moezzi, e.g. cols 6, 10, figure 5.  
The rendered stereoscopic synthetic images based on the position of the viewer’s head and eyes as determined by the HMD sensors correspond to the claimed “immersive visual experience [provided] to each eye of a user based on the positional sensor”, where the virtual reality scene is rendered by Spivack’s HMD using the virtual reality data received from the e-Director system and in response to the position/motion sensors.
Regarding claim 2, the limitation “wherein the data mining engine further comprises a machine learning element that performs machine learning to improve the generated meta-data” is taught by Patrikakis (section “Automatic visual scene understanding and annotation”, paragraphs 9, 10, 12-15, “The reliable identification of an athlete in the scene, captured solely by sports camera feeds, depends upon many factors and can also vary in time. Such issues include the nature of the event being observed (for example, the number of athletes and whether the event dictates that they stay in lanes); the readability of the text affixed to the athletes (usually a name or number), how fast the text moves, size changes, distortions, occlusions, and illumination changes; the visibility of the face, its orientation toward the camera, its size, facial distortions due to effort, and illumination changes; and the appearance of athletes’ clothing, team colors, illumination changes, and body dynamics. Athlete identification hypotheses are generated through the temporal fusion of bibliographical text hypotheses (checked against competing athlete’s names), face ID estimates, and color appearance likelihoods (generated by person tracking). … Due to the complexity of the athletics scenario, it’s infeasible to rely on a single visual cue to detect and track the presence of competitors. Because positive ID matches are not always available, an additional person-tracking algorithm was implemented. This algorithm is based upon the fused output from the aforementioned vision modules and a mean-shift approach using a color histogram composed of three sections (head, torso, and hips). The fusion of the processed outputs generates athlete probability regions, which optimize the person-tracking algorithm. Figure 3c shows a typical frame from this fusion. 
For each region that exhibits the characteristics of a person, an appearance model is generated (if a model has not already been assigned, or the new confidence is higher). The meanshift tracker subsequently attempts to maintain and modify the target model by maximizing a similarity metric (using a Bhattacharyya coefficient, which measures the amount of overlap between two statistical samples) with respect to the stored model. Changes to the apparent size of the target (scale) are also taken into account, thus making small adaptations to the dimensions of the model. The model is updated if a more reliable one is generated by the fusion step, and when a track confidence becomes unreliable, it’s terminated.”  Patrikakis indicates that the person tracking algorithm generates appearance models based on visual cues related to the athlete(s), where the appearance models are updated over time in order to improve and maintain the tracking.  As discussed on page 7 of Applicant’s disclosure, using a plurality of visual cues to derive the motion of players on the field over time, i.e. learning a model for different tracked players, corresponds to performing machine learning, such that Patrikakis’ system, which generates and updates (i.e. learns) a model for tracking players, comprises a machine learning element performing machine learning to improve the meta-data.) 
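For illustration only, the similarity metric named in the quoted passage (the Bhattacharyya coefficient, which measures the overlap between two statistical samples) can be sketched as follows for two color histograms. This is a minimal sketch under stated assumptions and is not Patrikakis’ implementation: the function names and the simple replace-if-more-confident update rule are hypothetical, chosen only to mirror the statement that the stored model is updated when a more reliable one is generated.

```python
import math
from typing import List, Sequence, Tuple


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Return the overlap between two histograms: 1.0 if identical, 0.0 if disjoint."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    pn, qn = normalize(p), normalize(q)
    return sum(math.sqrt(a * b) for a, b in zip(pn, qn))


def update_model(stored: Tuple[List[float], float],
                 candidate: Tuple[List[float], float]) -> Tuple[List[float], float]:
    """Keep whichever (histogram, confidence) pair has the higher confidence.

    Hypothetical update rule: the appearance model is replaced only when the
    candidate is more reliable than the stored one.
    """
    return candidate if candidate[1] > stored[1] else stored
```

A tracker of the kind described would evaluate the coefficient between the stored appearance model and candidate image regions and move toward the region that maximizes it; updating the stored model over time in this way is the behavior relied upon above as corresponding to learning.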
Regarding claim 3, the limitation “wherein the backend further comprises a storage that stores the meta-data generated for each piece of virtual reality content” is taught by Patrikakis (section “Platform modules”, paragraph 2 “Scene analysis and person tracking is performed in the video-annotation and characterization module, which receives the video stream from each camera (or live streaming server), and subsequently produces a stream of metadata (per camera) containing information like visible athlete identities, athlete locations, and relevant game events such as goals. In addition to this metadata production, sports information and statistics data—including data such as player’s full names and their heights— are fused into the platform from external sources, resulting in the creation of a rich metadata stream that’s delivered to the knowledge base module.” Patrikakis teaches that the metadata is stored in the knowledge base module, which is part of the e-director system, as shown in figure 1, left hand side. )
Regarding claim 4, the limitation “wherein the storage stores the optimized virtual reality data from the piece of virtual reality content” is implicitly taught by Patrikakis (section “Platform modules”, paragraph 4, section “Use-case example”, paragraph 3, “The video streams, metadata, and recommendation information are then forwarded to the content-distribution system, responsible for synchronizing streams and multiplexing them into one or more new streams, encoded according to the standards enforced by the service provider’s infrastructure. Then the new enriched feed is finally delivered to the user(s).”, “A comprehensive set of controls are offered to the end user, giving the user the ability to manipulate the received stream beyond traditional DVR controls (see Figure 6). Pause, resume, go-back, and go-live controls are supported for live transmission, together with slow-motion control. By caching the live video stream on the server, users can replay transient actions such as crossing the finish-line, long-jump take-off, and so on, during a live sports event in slow-motion.”  As discussed in the claim 1 rejection above, Patrikakis’ content distribution system encodes the video and metadata into one (or more) encoded streams, i.e. as shown in figure 1, the video and metadata are transmitted as part of the same stream to the user devices.  Patrikakis also teaches that the stream(s) are cached at the server in order to allow the user to replay events later, such that the video content and corresponding meta-data are stored in the e-Director system.  While Patrikakis teaches that the metadata and video are encoded together for transmission to the user (as in figure 1) and that the metadata and video are cached for later playback, Patrikakis does not address whether the video stream(s) being cached are the combined encoded metadata and video stream(s) or the separate input video feed(s).  
One of ordinary skill in the art would have found it implicit that the video stream(s) being cached are the combined encoded metadata and video stream(s), because while there is an apparent advantage to storing the combined encoded metadata and video stream(s), i.e. avoiding repetition of the combining operation(s), there is no apparent advantage to storing the separate input video feeds, i.e. there is no stated use for the input video feeds after content analysis/metadata generation has been performed other than being combined with the metadata for transmission to the user.  Furthermore, one of ordinary skill in the art would have found it obvious to try implementing Patrikakis’ e-director system with caching of the combined encoded metadata and video streams as there are only two apparent possible implementation choices, i.e. caching the combined encoded metadata and video stream(s) or caching the separate input video feed(s) and repeating the combining operation(s).)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date to implement Patrikakis’ e-Director system, providing personalized video to users of Spivack’s HMD display device, including Moezzi’s immersive video generation system, with caching of the combined encoded metadata and video streams for later playback by user devices as taught by Patrikakis, because one of ordinary skill in the art would have found it implicit that the video stream(s) being cached are the combined encoded metadata and video stream(s), because there is an apparent advantage to storing the combined encoded metadata and video stream(s), but there is no apparent advantage to storing the separate input video feeds.  Furthermore, one of ordinary skill in the art would have found it obvious to try implementing Patrikakis’ e-director system with caching of the combined encoded metadata and video streams as there are only two apparent possible implementation choices, i.e. caching the combined encoded metadata and video stream(s) or caching the separate input video feed(s) and repeating the combining operation(s).  It is additionally noted that Moezzi’s environment model and associated data are part of the metadata generated for each scene in Patrikakis’ e-Director system, i.e. the model and associated data would also be combined with the video streams in this modification.
Regarding claim 5, the limitation “wherein the player generates a trigger that initiates the interactivity based on the received meta-data received from the backend” is taught by Patrikakis in view of Spivack (Patrikakis, section “Enhanced user interactivity”, paragraphs 1-3, section “Use-case example”, paragraphs 2-3; Spivack, paragraphs 50, 54, “The custom audio-video player supported in this system enhances the interaction when viewing live sports events to control metadata video overlays, slow-motion live play, and a zoomable user interface.  Static system metadata (for example, event information, schedule, camera views, and athletes competing in specific events) can be transmitted upon request. The viewer, as the director, can decide whether to display this information, either by pausing the live content while viewing the metadata, or by simultaneously overlaying this information into the live content view. This is in contrast to current systems in which the network director decides when to show event information that the user might not be interested in seeing. Viewers can interact with the live-stream to decide when to trigger slow motion, or when to deliver a clearer view (for example, which athletes finish a race in ascending order or to see how sports equipment is positioned with respect to reference points).”, “To demonstrate platform operation, as well as its potential for use with existing systems, a use-case example is provided. The use-case demonstrates how the advanced annotation and user-centered zooming interaction adapts the system’s content. The terminal application runs as a Web browser plug-in on Microsoft Silverlight, supporting PC and mobile devices. Users can select athletes of interest from a Web service list. During the event, selected athletes are annotated with labels, like those shown in Figure 5. 
A comprehensive set of controls are offered to the end user, giving the user the ability to manipulate the received stream beyond traditional DVR controls (see Figure 6). Pause, resume, go-back, and go-live controls are supported for live transmission, together with slow-motion control. By caching the live video stream on the server, users can replay transient actions such as crossing the finish-line, long-jump take-off, and so on, during a live sports event in slow-motion. Users can select a particular event or camera of interest. They can also enable the recommendation feature, based on their personal preferences. The automatic annotation of the live stream can provide on-screen recommendations for switching to another camera when other interesting events are taking place.”, “The external stimulus occurring in the real world that can affect characters of simulated objects can include, environmental factors in a physical location, user stimulus, provided by the user of the device 102 or another user using another device and/or at another physical location, motion/movement of the device 102, gesture of the user using the device 102.”, “In one embodiment, the rendering module 314 generates or renders a user interface for display on via the head-mounted device 102. … In one embodiment, the user interface is interactive in that the user is able to select a region on the map in the user interface.”  As discussed in the claim 1 rejection above, Patrikakis teaches that the user device receives the encoded video and metadata and generates a display with interactive controls for the user, e.g. controlling metadata overlays, camera selection, playback speed, as shown in figures 5 and 6.  
Further, as in the modification of claim 1, Patrikakis’ system may provide personalized video to users of Spivack’s HMD device, which supports display of interactive user interfaces in which the user is able to provide input through user stimulus, motion of the device, or gesture recognition. In the combination, Spivack’s HMD device generates an interactive user interface (i.e. trigger(s)) that affects the personalized display settings (i.e. initiates the interactivity) based on the metadata received from Patrikakis’ system.
Regarding claim 6, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 1 above.
Regarding claim 7, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 2 above.
Regarding claim 8, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 3 above.
Regarding claim 9, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 4 above.
Regarding claim 10, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 5 above.
Regarding claim 11, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 1 above, except that the limitations are written with respect to the elements and functions performed by the user HMD device(s) rather than with respect to the elements and functions performed by the virtual reality backend, although the same elements and functions are effectively required by both claims, i.e. a backend/source of optimized virtual reality data generated by processing a retrieved piece of content to generate interactivity metadata to be combined with the retrieved content which is transmitted to a HMD display device, where the HMD display device uses the received optimized virtual reality data to display the virtual reality data and generate interaction controls based on the received metadata, including generating stereoscopic synthetic views for the viewer using Moezzi’s environment model.
Regarding claim 12, the limitation “wherein the nature of the content is selected from a group including a sport, a movie and talking heads” is taught by Patrikakis (Patrikakis’ metadata indicates the nature of content, i.e. the type of sporting event depicted in a given piece of content (e.g. section “Automatic recommendation based on user preferences”, paragraph 1, indicates that the user is prompted to select different content to view based on the type of event and user preferences, and section “Enhanced user interactivity”, paragraph 2, indicates that this metadata can be requested by the player for overlaying the information onto the content view.).) 
The limitation “wherein the set of frame classifications is selected from a group including a foreground location, a background location, and main elements detection and wherein the location of items in a scene of the content is selected from a group including a location of a score board in a sport, a location of an ad banner in a sport, and a location of each player in a field” is taught by Patrikakis (Patrikakis’ metadata indicates a set of frame classifications indicating foreground content and player location data (e.g. section “Platform modules”, paragraph 2, indicating the metadata includes identification and locations of athletes in the scene, or section “Use-case example”, paragraph 2, figure 5, where players in the foreground of the scene are identified and annotated).)
Regarding claims 13 and 14, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 12 above.
Regarding claim 15, the limitation “wherein the player in each virtual reality device receives a subset of the optimized virtual reality data from the virtual reality data backend based on the particular scene being viewed by the user” is taught by Patrikakis (Patrikakis’ system delivers a selected video+metadata stream to each end user/player (e.g. figure 1, sections “Platform modules”, “Dynamic adaptation of the media stream”), i.e. of all of the processed video stored in the content-distribution system, each user/player receives only the selected content they are viewing, associated metadata (including Moezzi’s environment model and associated data as discussed in the claim 1 rejection above), and recommendation data, where the user can switch between different scenes/videos/events (e.g. section “Automatic recommendation based on user preferences”), as well as select different bitrates of content (section “Dynamic adaptation of the media stream”, paragraphs 1-2) which would lead to receiving a different subset of the processed video stored in the content-distribution system based on the user’s scene selection(s) and other factors.)
Regarding claim 16, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 1 above.
Regarding claims 17 and 18, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 15 above.

Response to Arguments
Applicant's arguments filed 11/7/22 have been fully considered but they are not persuasive. 
Applicant asserts that none of the sensors in Spivack is a positional sensor that determines a direction of view of the virtual reality device.  As noted in the above rejection, Spivack, paragraph 51, discloses an accelerometer motion sensor for determining the perspective of the HMD, which corresponds to Applicant’s disclosure, e.g., page 3, paragraph 2, indicating that an accelerometer may be the sensor for determining the current position/orientation of the device.  Therefore, Applicant’s assertion is not persuasive.
Applicant’s arguments with respect to claims 1, 6 and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
That is, although Patrikakis and Spivack are still relied on, Moezzi’s environment model construction system and synthetic stereoscopic viewpoint rendering technique are relied on for extending the video analysis already performed by Patrikakis’ system to generate an environment model that can be used by Spivack’s system to render synthetic stereoscopic viewpoints for each eye of the viewer based on the position and orientation of the user’s eyes determined using sensors in the HMD.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT BADER whose telephone number is (571) 270-3335. The examiner can normally be reached Monday through Friday, 10 a.m. to 6 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT BADER/
Primary Examiner, Art Unit 2619