Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-2, 6-20 of the instant application 17/693881 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 16, 2-18 of U.S. Patent No. 11308696. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-2, 6-20 of the instant application 17/693881 are anticipated by claims 1, 16, 2-18 of U.S. Patent No. 11308696, in that claims 1, 16, 2-18 contain all limitations of claims 1-2, 6-20 of the instant application 17/693881. The instant application claims are broader in every aspect than the claims of U.S. Patent No. 11308696 and are therefore obvious variants thereof.
A chart showing the similarities between the claims of instant application 17/693881 and the claims of U.S. Patent No. 11308696:

Application No. 17/693881
Patent No. 11308696
Claim 1
A method comprising: at a device having a processor: 

obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints; 

obtaining a second data stream comprising additional data relating to the user experience; 

determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience; and generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience and the another viewpoint.
Claim 1
A method comprising: at a device having a processor and a computer-readable storage medium: 
obtaining a first data stream comprising rendered frames, the rendered frames comprising rendered frame content representing real content and virtual content rendered during a user experience at a plurality of instants in time for at least two different viewpoints;
 obtaining a second data stream comprising additional data relating to the user experience at the plurality of instants in time and based on the at least two different viewpoints; and forming a composited stream based on the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience, wherein the composited stream: comprises composited frames that each comprise a time-stamped n-dimensional image corresponding to a single instant in time, aligns the rendered frame content with the additional data to record content for the plurality of instants in time, and comprises another viewpoint of the recorded content for a viewer, wherein the another viewpoint is a different viewpoint than the at least two different viewpoints rendered during the user experience.
Claim 2
The method of claim 1, wherein the second data stream comprises a frame stream of two-dimensional (2D) views of the user experience associated with user viewpoints and cropped frames each formed by identifying a common portion of a left eye view and a right eye view of the user experience.
Claim 16
The method of claim 1, wherein the second data stream comprises a frame stream of 2D views of the user experience associated with user viewpoints for the plurality of instants in time and cropped frames each formed by identifying a common portion of a left eye view and a right eye view of the user experience.
Claim 6
 The method of claim 1, further comprising recording the user experience by recording the composited stream on a non-transitory computer-readable medium.
Claim 2
The method of claim 1, further comprising recording the user experience by recording the composited stream on a non-transitory computer-readable medium.
Claim 7
The method of claim 1, further comprising live streaming the user experience by encoding the composited stream according to a predetermined live streaming format.
Claim 3
The method of claim 1 further comprising live streaming the experience by encoding the composited stream according to a predetermined live streaming format.
Claim 8
The method of claim 1, wherein the composited stream comprises three-dimensional (3D) models representing 3D geometries of the virtual content or the real content.
Claim 4
The method of claim 1, wherein the composited stream comprises three dimensional (3D) models representing 3D geometries of the virtual content or the real content
Claim 9
The method of claim 1, further comprising identifying real or virtual audio sources producing audio during the user experience, wherein the composited stream identifies the real or virtual audio sources.

Claim 6
The method of claim 1 further comprising identifying real or virtual audio sources producing audio during the user experience, wherein the composited stream identifies the real or virtual audio sources
Claim 10
The method of claim 1, wherein the additional data comprises metadata associated with individual instants in time of a plurality of instants in time.

Claim 7
The method of claim 1, wherein the additional data comprises metadata associated with individual instants in time of the plurality of instants in time
Claim 11
The method of claim 10, wherein the metadata identifies a real physical property of the user experience
Claim 8
The method of claim 7, wherein the metadata identifies a real physical property of the user experience
Claim 12
The method of claim 10, wherein the metadata identifies a person detected via computer-implemented object detection.
Claim 9
The method of claim 7, wherein the metadata identifies a person detected via computer-implemented object detection
Claim 13
The method of claim 10, wherein the metadata identifies a body part of a user detected via computer-implemented object detection.
Claim 10
The method of claim 7, wherein the metadata identifies a body part of a user detected via computer-implemented object detection.
Claim 14
The method of claim 1, wherein the additional data comprises second rendered content of the user experience from a second device separate from the device.
Claim 11
The method of claim 1, wherein the additional data comprises second rendered content of the user experience from a second device separate from the device
Claim 15
The method of claim 1, wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app has exclusive use of resources of the framework.

Claim 12
The method of claim 1, wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app has exclusive use of resources of the framework
Claim 16
The method of claim 1, wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app shares use of resources of the framework with other apps.
Claim 13
The method of claim 1, wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app shares use of resources of the framework with other apps.
Claim 17
The method of claim 1 further comprising applying an inverse transform to unwarp foveated images in the rendered frames to produce un-foveated images, wherein the composited stream comprises the un-foveated images
Claim 14
The method of claim 1 further comprising applying an inverse transform to unwarp foveated images in the rendered frames to produce un-foveated images, wherein the composited stream comprises the un-foveated images
Claim 18
The method of claim 1, wherein the device is a head-mounted device (HMD), a controller communicative coupled to the HMD in the same physical environment as the HMD, or a server communicatively coupled to the HMD in a separate physical environment from the HMD.
Claim 15
The method of claim 1, wherein the device is a head-mounted device (HMD), a controller communicative coupled to the HMD in the same physical environment as the HMD, or a server communicatively coupled to the HMD in a separate physical environment from the HMD
Claim 19
A system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising: obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints; obtaining a second data stream comprising additional data relating to the user experience; determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience; and generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience and the another viewpoint.
Claim 17
A system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising: obtaining a first data stream comprising rendered frames, the rendered frames comprising rendered frame content representing real content and virtual content rendered during a user experience at a plurality of instants in time for at least two different viewpoints; obtaining a second data stream comprising additional data relating to the user experience at the plurality of instants in time and based on the at least two different viewpoints; and forming a composited stream based on the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience, wherein the composited stream: comprises composited frames that each comprise a time-stamped n-dimensional image corresponding to a single instant in time, aligns the rendered frame content with the additional data to record content for the plurality of instants in time, and comprises another viewpoint of the recorded content for a viewer, wherein the another viewpoint is a different viewpoint than the at least two different viewpoints rendered during the user experience.
Claim 20
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising: obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints; obtaining a second data stream comprising additional data relating to the user experience; determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience; and generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience and the another viewpoint.


Claim 18
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising: obtaining a first data stream comprising rendered frames, the rendered frames comprising rendered frame content representing real content and virtual content rendered during a user experience at a plurality of instants in time for at least two different viewpoints; obtaining a second data stream comprising additional data relating to the user experience at the plurality of instants in time and based on the at least two different viewpoints; and forming a composited stream based on the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience, wherein the composited stream: comprises composited frames that each comprise a time-stamped n-dimensional image corresponding to a single instant in time, aligns the rendered frame content with the additional data to record content for the plurality of instants in time, and comprises another viewpoint of the recorded content for a viewer, wherein the another viewpoint is a different viewpoint than the at least two different viewpoints rendered during the user experience.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1. Claims 1, 3-6, 15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S. Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S. Patent Application Publication No. 2015/0371447 (“Yasutake”).
Regarding independent claim 1, Kohler teaches a method comprising: 
at a device having a processor (¶0110 “The logic machine 1202 may include one or more processors configured to execute software instructions.”) and a computer-readable storage medium (¶0112 “Storage machine 1204 may include removable and/or built-in devices. Storage machine 1204 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory”): 
obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints (¶0046 “FIG. 8 shows example video streams that may be produced by an augmented-reality device (e.g., the HMD 104 shown in FIG. 1 or the mobile computing device 202 shown in FIG. 2). An augmentation image stream 800 may include a plurality of augmentation image frames 802 (e.g., 802A, 802B, 802C, 802D, 802E). Each of the plurality of augmentation image frames 802 may be rendered from a three-dimensional model. Each of the plurality of augmentation image frames 802 may include a virtual timestamp 804. A real-world (e.g., visible-light) image stream 806 may include a plurality of real-world image frames 808 (e.g., 808A, 808B, 808C, 808D, 808E, 808F, 808G, 808H, 808I, 808J). Each of the plurality of real-world image frames 808 may be obtained from the point-of-view camera. Each of the plurality of real-world image frames 808 may include a real-world timestamp 810. In the illustrated example, the augmentation image stream 800 and the real-world image stream 806 are depicted as having different frame rates. An updated augmentation image stream 812 may include a plurality of updated augmentation image frames 814 (e.g., 814A, 814B, 814C, 814D, 814E, 814F, 814G, 814H, 814I). Each of the plurality of augmentation image frames 802 may be generated by applying a transformation to a corresponding augmentation image 802.”; ¶0014  “The present disclosure is directed to an approach for controlling an augmented-reality device to output augmentation imagery in a performant manner by reducing a number of image rendering operations that are performed by the augmented-reality device. 
In particular, the augmented-reality device may eschew continuously performing image rendering operations to output augmentation imagery (e.g., for display or as a mixed-reality recording) by employing various post-rendering re-projection techniques to produce augmentation images that approximate different real-world perspectives. For example, an augmentation image may be rendered from a first real-world perspective of a display that visually presents the augmentation image, and a transformation may be applied to the augmentation image to yield an updated augmentation image that approximates a second real-world perspective of a point-of-view camera used to create a mixed-reality recording”; ¶0043 “In one example, the change in virtual perspective may correspond to a change in position between a point-of-view camera imaging the physical space and an at least partially see-through display that is viewed by a user eye. In another example, the change in virtual perspective may correspond to a change in position between different displays of a stereoscopic display (e.g., a left-eye display and a right-eye display). The change in virtual perspective may correspond to a change in position between any suitable different real-world perspectives.”; ¶0088 “In some implementations, the above described image transformations may be applied to two or more point-of-view cameras having different perspectives. For example, the image transformations could be applied to a stereoscopic configuration where two augmentation images with different perspective are visually presented to two displays (e.g., left eye, right eye)”; ¶0072 “ FIG. 10 shows an example virtual model 1000 that may be viewed from a first virtual perspective 1002 (e.g., the augmentation image) and a second virtual perspective 1004 (e.g., the updated augmentation image). 
When the virtual model 1000 is viewed from the first virtual perspective 1002 (e.g., rendered as the augmentation image), an occluded portion 1006 (indicated by dotted lines) is not visible from the first virtual perspective 1002. When the transformation is applied to the augmentation image to simulate the second virtual perspective 1004, the portion 1006 becomes visible”, where rendering from different perspectives is considered to be at least two different viewpoints); 
obtaining a second data stream comprising additional data relating to the user experience (¶0084 “In some implementations where a mixed-reality recording is output by the augmented-reality device, the mixed-reality recording may include an audio component. Accordingly, at 338, the method 300 optionally may include obtaining real-world audio data, via one or more microphones of the augmented-reality device. The real-world audio data may be timestamped in order to be synchronized with other layers of the mixed-reality recording. In particular, in order to provide an accurate mixed-reality experience, both virtual and real-world audio streams may be captured and synchronized with each other via virtual and real-world timestamps to form a composite audio stream”); and 
generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience (¶0048 “For example, real-world image frame 808B and real-world image frame 808C are obtained via the point-of-view camera after augmentation image frame 804A has been rendered but prior to augmentation image 804B being rendered. Instead of layering augmentation image 804A on both real-world image frames 808B and 808C, different transformations may be applied to augmentation image frame 804A to yield updated augmentation image frames 814A and 814B that correspond to real-world image frame 808B and real-world image frame 808C, respectively. In particular, the pose or extrinsic position data of timestamp 810B may be used to select/apply the transformation that yields the updated augmentation image 814A, and the extrinsic data of timestamp 810C may be used to select/apply a different transformation that yields the updated augmentation image 814B. The extrinsic data in each timestamp may be used to approximate the perspective of the augmented-reality device when the real-world image is obtained. In this way, the updated augmentation image frames may be generated to accurately represent the virtual content in between the augmentation images being rendered”; ¶0085 “In some implementations, at 340, the method 300 optionally may include outputting the composite audio stream as a layer of the mixed-reality recording. In particular, the composite audio stream may be synchronized with a composite video stream that includes virtual and real-world video layers. The composite audio stream and the composite video stream may be synchronized via virtual and real-world timestamps of the virtual and real-world audio and video data.”) Kohler is understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints (¶0030-0032 “ FIG. 3 depicts computing device 302 in three different orientations, A, B, and C, and the impact these orientations have on the digital scene and an augmentation video feed. Computing device 302 may be configured to carry out this process through the integration of a content augmentation environment (e.g. content augmentation environment 122 of FIGS. 1 and 2). The content augmentation environment may be configured to cause a portion of textual content 304 to be rendered on one portion of the display device and augmentation video feed 308 having a portion of a digital scene, associated with the portion of textual content 304, to be rendered on another portion of the display device. [0031] At orientation A, computing device 302 may be positioned in an upwards orientation. As depicted, augmentation video feed 308 at orientation A may be composed of a portion of a digital scene selected by content augmentation environment, or a module thereof. This portion of the digital scene may include clouds 310 and a top portion of a boat 312 incorporated with a real-time video feed. The real-time video feed, captured by the integrated camera, may include ceiling fan 308. As a result, augmentation video feed 308 at orientation A reflects that computing device 302 is positioned in an upwards orientation through the selection of a portion of the digital scene corresponding with such an orientation. [0032] As computing device 302 is moved downwards to orientation B, the selected portion of the digital scene may change in a manner corresponding with such movement. As depicted, augmentation video feed 308 at orientation B may be composed of a different portion of the digital scene. 
The different portion of the digital scene may be selected by the content augmentation environment, or a module thereof, based on the downward movement. This different portion of the digital scene may still include clouds 310, however, the position of clouds 310 on the display device may move vertically on the display device as computing device 302 is moved down. Furthermore, while only a top of boat 312 was displayed at orientation A, almost the entirety of boat 312 is displayed at orientation B. In addition to the changes to the portion of the digital scene, the physical scene with which the portion of the digital scene is incorporated also changes based on the physical scene captured by the integrated camera at orientation B. As depicted, the physical scene still includes a portion of ceiling fan 308, however, ceiling fan 308 in the physical scene has moved in a similar manner to clouds 310. In addition, physical scene now includes a top of a table 314. As discussed above, content augmentation environment may be configured to calculate one or more planes created by table 314 and may adapt the portion of the digital scene to conform to the one or more planes. As depicted, boat 312 of the digital scene has been adapted in this manner to conform to table 314.”);
determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience (¶0033 “As computing device 302 is moved further downwards to orientation C, the selected portion of the digital scene continues to change in a manner corresponding with such movement. As depicted, augmentation video feed 308 at orientation C may be composed of a third portion of the digital scene. The third portion of the digital scene may also be selected by the content augmentation environment, or a module thereof, based on the downward movement. As can be seen, clouds 310 depicted in orientations A and B have moved out of frame in orientation C along with ceiling fan 308 of the physical scene. Boat 312 has moved further vertically as computing device 302 moved further downwards and a larger portion of table 314 is now captured in the physical scene.”);
In the same field of endeavor, Yasutake teaches obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience (¶0094 “The server device can then send the generated scene of the AR based environment (e.g., via signals) to the computer device of the primary user, such that the computer device of the primary user displays the scene to the primary user. As a result, the primary user can see the virtual objects related to other users, and thus interact with those virtual objects in real time within the AR based environment. In some embodiments, the scene of the AR based environment displayed to the primary user does not include a virtual object or any other type of representation related to the primary user. For example, as shown in the left part of FIGS. 7A and 7B, the primary user does not see himself/herself or any virtual object related to him/her in the scene displayed to him/her on his/her computer device. [0095] In some embodiments, although not seeing the virtual object related to the primary user, the primary user can control, navigate or manipulate his/her virtual object in the virtual world by making a movement, a gesture, or any other type of action in the real world. For example, the primary user can move his position within the room so that the relative location of his/her virtual object with respect to the virtual objects related to the non-primary users in the virtual world is changed accordingly. For another example, the primary user can make a gesture to indicate an intended action (e.g., attack, defense, communicate, etc.) in the game. As a result, his/her virtual object is triggered to perform the corresponding action accordingly.”);
obtaining a second data stream comprising additional data relating to the user experience (¶0092 “Each non-primary user is located at a location different from the primary user (i.e., not at the room). In some embodiments, as shown in FIGS. 7A and 7B, the non-primary users include more than one user, and at least two non-primary users are located at two different locations. Similar to the primary user, each non-primary user uses his/her computer device to capture data of himself/herself. In some embodiments, the captured data of a non-primary user includes, for example, face image data of that non-primary user. In such embodiments, the face image data of the non-primary user can include, for example, real-time video and audio data packets including extracted image data of that non-primary user's face, which is captured by a video camera of a computer device operated by that non-primary user. The computer device operated by each non-primary user can then send the captured data of that non-primary user to the server device (e.g., a cloud server).”);
determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience (¶0111 “FIG. 9B depicts a scene of the AV based environment with 3D depth sensing of a female secondary player. The female secondary player watches a specified location in the AV based environment provided by the 3D geographical application corresponding to the 3D geographic location (e.g., LLA information) of the primary players. An AV application renders a virtual object (e.g., AR object) related to each primary player at an equivalent or substantially similar location to the 3D location data of the primary player in the 3D virtual world. In FIG. 9B, a primary player is rendered as a blue colored avatar wearing glasses and another primary player is rendered as a grey colored avatar with tablet. A 3D depth sensor installed at the screen can be used to measure the movement or gesture of the female secondary player's body. The dataset obtained from the 3D depth sensor includes real-time measurement of kinetic parameters that are utilized for the body gesture of the avatar model and changes in the 3D positions of the body. Thus, navigation commands of the avatar's body can be generated by the movement or gesture of the female secondary player when the initial condition of 3D location and body pose parameters are also given as initial data by the female secondary player. In FIG. 9A, the avatar of the female secondary player is rendered as a "zombie" lady in the camera view screen of the primary player's device in the real world. In the corresponding AV environment of FIG. 9C, the avatar of the female secondary player is also rendered as the "zombie" lady in the large PC screen in the AV based environment for another male secondary player. [0112] FIG. 9C depicts another scene of the AV based environment with web camera for capturing the face of another male secondary player. 
The male secondary player in the AV based environment can use computer peripherals (e.g. mouse, keyboard, game control pad, remote controller, etc.) to generate LLA based 3D location changes. The LLA location data is utilized to navigate the virtual flying vehicle related to the male secondary player that is rendered in the camera view screen of the primary players in the real world environment (as shown in FIG. 9A). The 3D location data is also used to display the virtual flying vehicle in the large PC screen for the female secondary player (as shown in FIG. 9B). The web camera captures the male secondary player's face and sends its texture image (e.g., real-time streaming data of face images) to map the face onto the virtual flying vehicle rendered in the mixed reality environment (both the AR based environment and the AV based environment), and
generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience and the another viewpoint (¶0092 “Each non-primary user is located at a location different from the primary user (i.e., not at the room). In some embodiments, as shown in FIGS. 7A and 7B, the non-primary users include more than one user, and at least two non-primary users are located at two different locations. Similar to the primary user, each non-primary user uses his/her computer device to capture data of himself/herself. In some embodiments, the captured data of a non-primary user includes, for example, face image data of that non-primary user. In such embodiments, the face image data of the non-primary user can include, for example, real-time video and audio data packets including extracted image data of that non-primary user's face, which is captured by a video camera of a computer device operated by that non-primary user. The computer device operated by each non-primary user can then send the captured data of that non-primary user to the server device (e.g., a cloud server).;¶0096 “Similarly, the server device can generate a scene of an AV based environment for each non-primary user. The AV based environment includes the virtual objects (e.g., AR flying vehicles) related to the non-primary users and a virtual object related to the primary user. The virtual object related to the primary user is generated at the server device based on the captured data of the primary user such as his/her location, movement, gesture, face, etc. In some embodiments, the AV based environment is a virtualized realization of the AR based environment. In some embodiments, the server device can generate the scene for displaying a movement of a virtual object related to the primary user that corresponds to a movement of the primary user in real time. 
In some embodiments, the server device can generate a virtual object related to a non-primary user or the primary user by, for example, mapping a picture of that user onto a surface of an AR object associated with that user. For example, as shown in FIGS. 7A and 7B, the server device can map a picture of video clip of a user (e.g., the primary user or a non-primary user) onto an AR flying vehicle to generate an animated AR creature for that user.); ¶0112 as shown in Fig. 9B and 9C “FIG. 9C depicts another scene of the AV based environment with web camera for capturing the face of another male secondary player. The male secondary player in the AV based environment can use computer peripherals (e.g. mouse, keyboard, game control pad, remote controller, etc.) to generate LLA based 3D location changes. The LLA location data is utilized to navigate the virtual flying vehicle related to the male secondary player that is rendered in the camera view screen of the primary players in the real world environment (as shown in FIG. 9A). The 3D location data is also used to display the virtual flying vehicle in the large PC screen for the female secondary player (as shown in FIG. 9B). The web camera captures the male secondary player's face and sends its texture image (e.g., real-time streaming data of face images) to map the face onto the virtual flying vehicle rendered in the mixed reality environment (both the AR based environment and the AV based environment).”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with generating a scene of an augmented reality (AR) based environment that includes the first user and a virtual object related to the second user, and generating a scene of an augmented virtuality (AV) based environment that includes the virtual object related to the second user and a virtual object related to the first user, as seen in Yasutake, because this modification would generate a hybrid reality environment from received data of a first user and a second user at different locations (abstract of Yasutake).
Thus, the combination of Kohler and Yasutake teaches a method comprising: at a device having a processor: obtaining a first data stream comprising rendered frames representing real content and virtual content rendered during a user experience for at least two different viewpoints; obtaining a second data stream comprising additional data relating to the user experience; determining another viewpoint that is a different viewpoint than the at least two different viewpoints rendered during the user experience; and generating a composited stream based on aligning the first data stream and the second data stream for the at least two different viewpoints rendered during the user experience and the another viewpoint.
Regarding claim 3, Kohler and Yasutake teach the method of claim 1, wherein the composited stream comprises composited frames that each comprise a time-stamped n-dimensional image corresponding to a single instant in time (¶0046 of Kohler “FIG. 8 shows example video streams that may be produced by an augmented-reality device (e.g., the HMD 104 shown in FIG. 1 or the mobile computing device 202 shown in FIG. 2). An augmentation image stream 800 may include a plurality of augmentation image frames 802 (e.g., 802A, 802B, 802C, 802D, 802E). Each of the plurality of augmentation image frames 802 may be rendered from a three-dimensional model. Each of the plurality of augmentation image frames 802 may include a virtual timestamp 804. A real-world (e.g., visible-light) image stream 806 may include a plurality of real-world image frames 808 (e.g., 808A, 808B, 808C, 808D, 808E, 808F, 808G, 808H, 808I, 808J). Each of the plurality of real-world image frames 808 may be obtained from the point-of-view camera. Each of the plurality of real-world image frames 808 may include a real-world timestamp 810. In the illustrated example, the augmentation image stream 800 and the real-world image stream 806 are depicted as having different frame rates. An updated augmentation image stream 812 may include a plurality of updated augmentation image frames 814 (e.g., 814A, 814B, 814C, 814D, 814E, 814F, 814G, 814H, 814I). Each of the plurality of augmentation image frames 802 may be generated by applying a transformation to a corresponding augmentation image 802.”)
Regarding claim 4, Kohler and  Yasutake teach the method of claim 1, wherein the composited stream aligns the rendered frame content with the additional data to record content for a plurality of instants in time (¶0046  of Kohler “FIG. 8 shows example video streams that may be produced by an augmented-reality device (e.g., the HMD 104 shown in FIG. 1 or the mobile computing device 202 shown in FIG. 2). An augmentation image stream 800 may include a plurality of augmentation image frames 802 (e.g., 802A, 802B, 802C, 802D, 802E). Each of the plurality of augmentation image frames 802 may be rendered from a three-dimensional model. Each of the plurality of augmentation image frames 802 may include a virtual timestamp 804. A real-world (e.g., visible-light) image stream 806 may include a plurality of real-world image frames 808 (e.g., 808A, 808B, 808C, 808D, 808E, 808F, 808G, 808H, 808I, 808J). Each of the plurality of real-world image frames 808 may be obtained from the point-of-view camera. Each of the plurality of real-world image frames 808 may include a real-world timestamp 810. In the illustrated example, the augmentation image stream 800 and the real-world image stream 806 are depicted as having different frame rates. An updated augmentation image stream 812 may include a plurality of updated augmentation image frames 814 (e.g., 814A, 814B, 814C, 814D, 814E, 814F, 814G, 814H, 814I). Each of the plurality of augmentation image frames 802 may be generated by applying a transformation to a corresponding augmentation image 802.”; ¶0085 of Kohler “In some implementations, at 340, the method 300 optionally may include outputting the composite audio stream as a layer of the mixed-reality recording. In particular, the composite audio stream may be synchronized with a composite video stream that includes virtual and real-world video layers. 
The composite audio stream and the composite video stream may be synchronized via virtual and real-world timestamps of the virtual and real-world audio and video data.”)
Regarding claim 5, Kohler and Yasutake teach the method of claim 1, wherein the second data stream is obtained based on the at least two different viewpoints (0046-0048 “FIG. 8 shows example video streams that may be produced by an augmented-reality device (e.g., the HMD 104 shown in FIG. 1 or the mobile computing device 202 shown in FIG. 2). An augmentation image stream 800 may include a plurality of augmentation image frames 802 (e.g., 802A, 802B, 802C, 802D, 802E). Each of the plurality of augmentation image frames 802 may be rendered from a three-dimensional model. Each of the plurality of augmentation image frames 802 may include a virtual timestamp 804. A real-world (e.g., visible-light) image stream 806 may include a plurality of real-world image frames 808 (e.g., 808A, 808B, 808C, 808D, 808E, 808F, 808G, 808H, 808I, 808J). Each of the plurality of real-world image frames 808 may be obtained from the point-of-view camera. Each of the plurality of real-world image frames 808 may include a real-world timestamp 810. In the illustrated example, the augmentation image stream 800 and the real-world image stream 806 are depicted as having different frame rates. An updated augmentation image stream 812 may include a plurality of updated augmentation image frames 814 (e.g., 814A, 814B, 814C, 814D, 814E, 814F, 814G, 814H, 814I). Each of the plurality of augmentation image frames 802 may be generated by applying a transformation to a corresponding augmentation image 802. [0047] In the illustrated example, the real-world image stream 806 has a higher frame rate than the augmentation image stream 800. In order to generate an accurate mixed-reality recording in which each real-world image frame is layered with virtual content, updated augmentation image frames may be generated to accurately represent the virtual content in the mixed-reality recording in between successive augmentation image frames being rendered. 
In particular, a transformation may be selected to be applied to a given augmentation image frame to yield a corresponding updated augmentation image frame based on a real-world time stamp of a corresponding real-world image frame.”; ¶0085  In some implementations, at 340, the method 300 optionally may include outputting the composite audio stream as a layer of the mixed-reality recording. In particular, the composite audio stream may be synchronized with a composite video stream that includes virtual and real-world video layers. The composite audio stream and the composite video stream may be synchronized via virtual and real-world timestamps of the virtual and real-world audio and video data.”)
Regarding claim 6, Kohler and Yasutake teach the method of claim 1, further comprising recording the user experience by recording the composited stream on a non-transitory computer-readable medium (¶0079 of Kohler “In some implementations, the mixed-reality recording may be stored in a storage machine (e.g., either local to the augmented-reality device or a remote storage machine, such as a network-connected storage machine) for visual presentation at a later time”)
Regarding claim 15, Kohler and Yasutake teach the method of claim 1, wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app has exclusive use of resources of the framework (¶0030  of Kohler “At 302, the method 300 may include rendering from a three-dimensional model a two-dimensional augmentation image from a first virtual perspective. The three-dimensional model may include any suitable virtual content (e.g., hologram) that may be produced by any suitable application of the augmented-reality device. For example, the three-dimensional model may include a virtual scene or virtual objects of a video game.”)
Regarding claim 18, Kohler and Yasutake teach the method of claim 1, wherein the device is a head-mounted device (HMD), a controller communicatively coupled to the HMD in the same physical environment as the HMD, or a server communicatively coupled to the HMD in a separate physical environment from the HMD (¶0090 “The HMD 1100 includes an at least partially see-through display 1102 and a controller 1104. The controller 1104 may be configured to perform various operations related to visual presentation of augmented-reality and mixed-reality image on the at least partially see-through display 1102.”)
Regarding independent claim 19, Kohler teaches a system comprising: a non-transitory computer-readable storage medium (¶0112 “Storage machine 1204 may include removable and/or built-in devices. Storage machine 1204 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory”); and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors (¶0110 “The logic machine 1202 may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine 1202 may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine 1202 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing”), cause the system to perform operations. The remainder of claim 19 is similar in scope to claim 1 and is therefore rejected under the same rationale.
Regarding independent claim 20, Kohler teaches a non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations (¶0111 “Storage machine 1204 includes one or more physical devices configured to hold instructions executable by the logic machine 1202 to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage machine 1204 may be transformed--e.g., to hold different data.”). The remainder of claim 20 is similar in scope to claim 1 and is therefore rejected under the same rationale.
2.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S Patent Application Publication No. 20150371447 (“Yasutake”), further in view of Tourapis et al., U.S Patent Application Publication No. 20120026288 (“Tourapis”), further in view of Meiyappan, IDS, U.S Patent Application Publication No. 20080002878 (“Meiyappan”).
Regarding claim 2, Kohler and Yasutake teach the method of claim 1. Kohler and Yasutake are understood to be silent on the remaining limitations of claim 2.
In the same field of endeavor, Tourapis teaches wherein the second data stream comprises a frame stream of two-dimensional (2D) views of the user experience associated with user viewpoints (¶0017 “In one embodiment, the data stream comprises a video comprising at least two different views and the signal further indicates that the processing technique is performed for at least one region within at least one frame of the video. The different views may comprise, for example, at least one of stereographic views, two different images, a 2D image and depth information, multiple views of a 2D scene having different characteristics such as resolution, bitdepth, or color information, and multiple views of a 3D scene. The at least two different views may also be compressed and multiplexed within the data stream in a standardized motion picture format capable of single view video streams. Compression of the views may comprise at least one of a sampling, filtering, and decimation of the views. The compression of the views may also comprise at least one of horizontal, vertical filtering, and quincunx sampling. The compression of the views may also comprise both filtering and sampling. And sampling may, for example, comprise at least one of horizontal, vertical, quincunx, formula based, pattern based, and arbitrary sampling. Multiplexing may be done, for example, in at least one of a checkerboard format, a quadrant based format, a column format, a row format, a side-by-side format, an over-under format, a format based on a pattern, and an alternative format.)
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with a video comprising at least two different views, where the different views may comprise, for example, at least one of stereographic views, two different images, and a 2D image, as seen in Tourapis, because this modification would provide multiple views of a 2D scene having different characteristics such as resolution, bitdepth, or color information, and multiple views of a 3D scene (¶0017 of Tourapis). Kohler, Yasutake, and Tourapis are understood to be silent on the remaining limitations of claim 2.
In the same field of endeavor, Meiyappan teaches cropped frames each formed by identifying a common portion of a left eye view and a right eye view of the user experience (¶0001 “Stereoscopic photography is the art of taking two pictures of the same subject from two slightly different view points, e.g. left and right eye views, and displaying them in such a way that each human eye sees only one of the images. The illusion of depth in a photograph or other 2-dimensional image is created by presenting a slightly different image to each eye. Stereoscopic photography involves two phases: capturing and presenting the image. One approach for capturing right and left images of the same scene is to use two identical cameras arranged in parallel or a specialized two-lens camera. To compose a stereoscopic image, only the common region that is visible in both the right and left images should be used. The image portions outside of the common region should be cropped and removed. The task of identifying the common region can be manually done by the user but it is troublesome and very time-consuming. There are available digital image processing programs for automatically creating stereo images, e.g. Cosima and Stereophoto Maker, which require the captured images to be in digital format. The cropping is done automatically by the programs. However, these programs utilize stereo matching techniques that require a very long run-time for computation, e.g. 6-10 minutes.”)
Therefore, in the combination of Kohler, Yasutake, and Tourapis, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler, and the video comprising at least two different views of Tourapis (where the different views may comprise, for example, at least one of stereographic views, two different images, and a 2D image), with a frame comprising a common portion of a left eye view and a right eye view, as seen in Meiyappan, because this modification would compose a stereoscopic image using only the common region that is visible in both the right and left images (¶0001 of Meiyappan).
Thus, the combination of Kohler, Yasutake, Tourapis and Meiyappan teaches wherein the second data stream comprises a frame stream of two-dimensional (2D) views of the user experience associated with user viewpoints and cropped frames each formed by identifying a common portion of a left eye view and a right eye view of the user experience.
3.	  Claim  7 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake , U.S Patent Application Publication No. 20150371447 (“Yasutake”) further in view of Karlsson et al., IDS, U.S Patent Application Publication No. 20130064285 (“Karlsson”)
Regarding claim 7, Kohler and Yasutake teach the method of claim 1 further comprising live streaming the experience (¶0086 of Kohler “… In one example, each virtual and real-time video and audio stream may be separately stored or streamed to a remote device and composited later (e.g., utilizing timestamping, extrinsic and intrinsic calibration data, and/or other metadata). Real-time composition may allow visual presentation on the display of the augmented-reality device as well as live streaming to other remote display”). Kohler and Yasutake are understood to be silent on the remaining limitations of claim 7.
In the same field of endeavor, Karlsson teaches comprising live streaming the user experience by encoding the composited stream according to a predetermined live streaming format (¶0002 “ Live streams typically involve encoding or re-encoding prior to transmission to devices and users associated with the devices. In many instances, live streams are encoded into a format such as H.264 (MPEG-4 Part 10). H.264 is a block oriented motion compensation based codec that is widely used in Blu-ray Discs and streaming Internet sources. H.264 encoding can be resource intensive, and specialized hardware is often used to accelerate encoding particularly at high quality levels. In many implementations, live stream encoding servers are configured with application specific hardware to receive one or more channels or live streams and encode the channels or live streams into particular formats. The encoding servers may have the capacity to perform real-time live encoding on up to half a dozen live streams simultaneously.”)
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with encoding live streams into a particular format, as seen in Karlsson, because this modification would receive one or more channels or live streams and encode the channels or live streams into particular formats prior to transmission to devices and users associated with the devices (¶0002 of Karlsson).
Thus, the combination of Kohler, Yasutake and Karlsson teaches further comprising live streaming the experience by encoding the composited stream according to a predetermined live streaming format.
4.	 Claim  8 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake , U.S Patent Application Publication No. 20150371447 (“Yasutake”) further in view of Yerli, IDS, U.S Patent Application Publication No. 2013/0215229 (“Yerli”)
Regarding claim 8, Kohler and Yasutake teach the method of claim 1. Kohler and Yasutake are understood to be silent on the remaining limitations of claim 8.
In the same field of endeavor, Yerli teaches wherein the composited stream comprises three-dimensional (3D) models representing 3D geometries of the virtual content or the real content (¶0018 “In yet another embodiment, said creating of the three-dimensional representation of the real scene further includes generating data for one or more virtual objects based on the recording, including at least one of motion data, meshes, and textures. In particular, all filmed objects, such as static props, vegetation, living persons or animals, or anything else, as well as the landscape or buildings are generated as 3D meshes of virtual objects. Furthermore, based on the recording, such as stereo images of the recording, the virtual objects are skinned with textures. The meshes may be combined with respective textures resulting in a one-to-one virtual copy of the recorded object”)
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with creating a 3D representation of the real scene, as seen in Yerli, because this modification would include computing depth information based on the recording (¶0017 of Yerli).
5.	 Claim  9 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake , U.S Patent Application Publication No. 20150371447 (“Yasutake”) further in view of Wright, IDS, U.S Patent No. 9,530,426 (“Wright”)
Regarding claim 9, Kohler and Yasutake teach the method of claim 1 further comprising identifying real or virtual audio sources producing audio during the user experience, wherein the composited stream identifies the real or virtual audio sources (¶0084 “In some implementations where a mixed-reality recording is output by the augmented-reality device, the mixed-reality recording may include an audio component. Accordingly, at 338, the method 300 optionally may include obtaining real-world audio data, via one or more microphones of the augmented-reality device. The real-world audio data may be timestamped in order to be synchronized with other layers of the mixed-reality recording. In particular, in order to provide an accurate mixed-reality experience, both virtual and real-world audio streams may be captured and synchronized with each other via virtual and real-world timestamps to form a composite audio stream.”). Kohler and Yasutake are understood to be silent on the remaining limitations of claim 9.
Wright teaches further comprising identifying real or virtual audio sources producing audio during the user experience, wherein the composited stream identifies the real or virtual audio sources. (col.3, lines 57-67, col4, lines 1-9 “ Within the augmented reality view of FIG. 2, graphical indicators are displayed as virtual objects that identify virtualized real-world positions of virtual audio sources and/or real-world positions of real-world audio sources within the physical space. In this example, graphical indicator 232 is displayed via the augmented reality device that identifies virtual object 230 as a virtual audio source of virtual sounds. Also, in this example, graphical indicator 212 identifies real-world object 210 as a real-world audio source, and graphical indicator 222 identifies real-world object 220 as another real-world audio source. In at least some implementations, graphical indicators that identify virtual audio sources may have a different visual appearance as compared to graphical indicators that identify real-world audio sources, and may convey status information concerning the audio source, such as volume level, mute on/off, whether the sound produced by the audio source is shared with a communications partner, etc. Graphical indicators may be selectable by a user to change an audio treatment policy applied to or state of the audio source.”) 
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with identifying a virtual audio source or a real-world audio source, as seen in Wright, because this modification would convey status information concerning the audio source (col.4, lines 1-9 of Wright).
Thus, the combination of Kohler, Yasutake and Wright teaches further comprising identifying real or virtual audio sources producing audio during the user experience, wherein the composited stream identifies the real or virtual audio sources.
6.	 Claims  10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake , U.S Patent Application Publication No. 20150371447 (“Yasutake”) further in view of Vaden et al., IDS, U.S Patent No. 9,871,994 (“Vaden”)
Regarding claim 10, Kohler and Yasutake teach the method of claim 1. Kohler and Yasutake are understood to be silent on the remaining limitations of claim 10.
In the same field of endeavor, Vaden teaches wherein the additional data comprises metadata associated with individual instants in time of a plurality of instants in time (col.6, lines 56-67, col.7, lines 1-4 “When acquiring video, e.g., using an action camera device such as GoPro HERO3, HERO4, additional information that may be related to the video acquisition session may be obtained and stored. In some implementations, such information may include camera sensor image acquisition parameters (e.g., exposure, white balance, gain), camera orientation, camera location, camera motion, time of day, season, ambient light conditions, audio information, evaluation of activity being filmed (e.g., surfing, biking), ambient temperature, user body parameters (e.g., heart rate, cadence) and/or any other parameter that may be conceivably related to the activity being filmed. Existing metadata acquisition solutions often record metadata when video being obtained and/or recorded. Such configuration may provide an additional demand on computational and/or energy resources of the capture device.”; col.16, lines 43-56 “In one or more implementations, such as shown and described in FIG. 4A, the session container may include metadata track while video and/or audio track may remain blank. The session container 400 may include metadata track 408 consisting of one or more metadata channels (e.g., channels 330, 310, 320 in FIG. 3A). The metadata track 408 may include links 410 to one or more captured content elements (e.g., video clips 302, 306 in FIG. 3A, images 342, 346 in FIG. 3B, and/or other content elements). Links 410 may contain information related to time, file name, content clip identification ID, frame position, frame time, and/or other information configured to enable a content playback application to access content element” col. 9, lines 34-37 “In some implementations, the capture device 130 in FIG. 
1B may correspond to an action camera configured to capture photo, video and/or audio content” where video and audio content)
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler with recording metadata when video is being obtained, as seen in Vaden, because this modification would provide an additional demand on computational and/or energy resources of the capture device (col. 7, lines 1-5 of Vaden).
Regarding claim 11, Kohler, Yasutake and Vaden teach the method of claim 10, wherein the metadata identifies a real physical property of the user experience (col.6, lines 18-43 of Vaden “Capture devices, such as action video cameras (e.g., GoPro HERO4 Silver) may be used in a variety of application where collecting data other than the video track may be of use. The non-video information (also referred to as the metadata) may include e.g., camera orientation, camera location, camera motion, time of day, season, ambient light conditions, weather parameters (e.g., wind speed, direction, humidity), user activity (e.g. running, biking, surfing), image acquisition parameters (e.g., white balance, image sensor gain, sensor temperature, exposure time, lens aperture, bracketing configuration (e.g., image exposure bracketing, aperture bracketing, focus bracketing), and/or other parameters), user statistics (heart rate, age, cycling cadence), Highlight Tags, image acquisition settings (e.g., white balance, field of view, gain, lens aperture, tonality curve used to obtain an image, exposure time, exposure compensation, and/or other image acquisition parameters), device and/or shot identification (ID) used in, e.g., multi-camera arrays, and/or practically any parameter that may be measured and/or recorded during video acquisition. In some implementations, metadata may include information related to proximity of other capture devices including e.g., device ID, status (e.g., recoding video, metadata, standby), range to the device, duration of device proximity occurrence, and/or other information.”). In addition, the same motivation is used as in the rejection of claim 10.
7.	 Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S. Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S. Patent Application Publication No. 20150371447 (“Yasutake”), further in view of Vaden et al., IDS, U.S. Patent No. 9,871,994 (“Vaden”), further in view of James et al., IDS, U.S. Patent Application Publication No. 20170251261 (“James”).
Regarding claim 12, Kohler, Yasutake and Vaden teach the method of claim 10, wherein the metadata identifies a person detected via computer-implemented object detection (col. 32, lines 9-20 of Vaden “In some implementations, the user interface device may be configured to provide metadata information (e.g., position, heart rate from smart watch). The computerized system may include computer executable instructions configured to automatically detect and/or identify one more persons that may be present in the video (e.g., rider B 704 in FIG. 7A).”). In addition, the same motivation is used as in the rejection of claim 10. Kohler, Yasutake and Vaden are understood to be silent on computer-implemented object detection.
In the same field of endeavor, James teaches wherein the metadata identifies a person detected via computer-implemented object detection (¶0105 “The annotation environment 906 may include modules for annotating the video stream 902 with annotations as may be desired. The automation environment may include one or more modules embodying known or later developed machine learning, machine vision, and automation techniques. These techniques may be entirely automatic or may utilize a combination of automatic and manual actions. An example of a combined approach may be, e.g., an automated facial detection algorithm combined with a manual tagging of the name of the individual whose face is recognized. Example capabilities of the annotation environment include face and/or object detection 908, the ability to identify and/or recognize individual faces in the video, or similarly specific objects within the video (e.g. people, vehicles, animals, vegetation, furniture, mechanical devices, etc.). In addition, annotation environment 906 may include object tracking 910 to track the various objects as they may move from frame to frame, either via their own volition or via panning of the camera. Similarly, scene recognition 912 may operate similarly to object detection, but focus on specific holistic scenes, e.g. a living room, a production studio, a natural environment, etc. Region discovery and automation 914 may identify portions of the video that have particular interest, such as an area of focused movement in the video or otherwise are subject to change based upon object tracking 910 or other processes. The identified regions may be static (e.g. a fixed region on the video frame) or may be dynamic (e.g. changing size, shape, or location on the video frame). Annotation further allows scripting 916 to embed actions based upon the various annotated elements, and other related machine learning, machine vision, and other techniques. 
For example, a scripting module 916 may enable the selecting of an action to be performed responsive to user interaction with one or more regions from a library of available action, and associating the selected action with the scripting event and at least one metadata tag. The annotation environment 906X may be communicatively coupled with a data storage (e.g., a database) which may store metadata including definitions of tags, regions and/or scripting events. In some examples, the data storage may store the library of available actions.”)
Therefore, in the combination of Kohler, Yasutake and Vaden, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler to use object detection, as seen in James, because this modification would identify and/or recognize individual faces in the video (¶0105 of James).
Thus, the combination of Kohler, Yasutake, Vaden and James teaches wherein the metadata identifies a person detected via computer-implemented object detection.
Regarding claim 13, Kohler, Yasutake and Vaden teach the method of claim 10, wherein the metadata identifies a user detected (col. 32, lines 9-20 of Vaden “In some implementations, the user interface device may be configured to provide metadata information (e.g., position, heart rate from smart watch). The computerized system may include computer executable instructions configured to automatically detect and/or identify one more persons that may be present in the video (e.g., rider B 704 in FIG. 7A).”). In addition, the same motivation is used as in the rejection of claim 10. Kohler, Yasutake and Vaden are understood to be silent on the remaining limitations of claim 13.
In the same field of endeavor, James teaches wherein the metadata identifies a body part of a user detected via computer-implemented object detection (¶0105 “The annotation environment 906 may include modules for annotating the video stream 902 with annotations as may be desired. The automation environment may include one or more modules embodying known or later developed machine learning, machine vision, and automation techniques. These techniques may be entirely automatic or may utilize a combination of automatic and manual actions. An example of a combined approach may be, e.g., an automated facial detection algorithm combined with a manual tagging of the name of the individual whose face is recognized. Example capabilities of the annotation environment include face and/or object detection 908, the ability to identify and/or recognize individual faces in the video, or similarly specific objects within the video (e.g. people, vehicles, animals, vegetation, furniture, mechanical devices, etc.). In addition, annotation environment 906 may include object tracking 910 to track the various objects as they may move from frame to frame, either via their own volition or via panning of the camera. Similarly, scene recognition 912 may operate similarly to object detection, but focus on specific holistic scenes, e.g. a living room, a production studio, a natural environment, etc. Region discovery and automation 914 may identify portions of the video that have particular interest, such as an area of focused movement in the video or otherwise are subject to change based upon object tracking 910 or other processes. The identified regions may be static (e.g. a fixed region on the video frame) or may be dynamic (e.g. changing size, shape, or location on the video frame). Annotation further allows scripting 916 to embed actions based upon the various annotated elements, and other related machine learning, machine vision, and other techniques. 
For example, a scripting module 916 may enable the selecting of an action to be performed responsive to user interaction with one or more regions from a library of available action, and associating the selected action with the scripting event and at least one metadata tag. The annotation environment 906X may be communicatively coupled with a data storage (e.g., a database) which may store metadata including definitions of tags, regions and/or scripting events. In some examples, the data storage may store the library of available actions.”) In addition, the same motivation is used as the rejection for claim 12.
Thus, the combination of Kohler, Yasutake, Vaden and James teaches wherein the metadata identifies a body part of a user detected via computer-implemented object detection.
8.	 Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S. Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S. Patent Application Publication No. 20150371447 (“Yasutake”), further in view of Perry, IDS, U.S. Patent Application Publication No. 20140364209 (“Perry”).
Regarding claim 14, Kohler and Yasutake teach the method of claim 1. Kohler and Yasutake are understood to be silent on the remaining limitations of claim 14.
In the same field of endeavor, Perry teaches wherein the additional data comprises second rendered content of the user experience from a second device separate from the device (¶0102 “FIG. 3 is a diagram of an embodiment of an HMD 250, which is an example of the HMD 104 (FIGS. 1A-1C). The HMD 250 includes a video audio separator 254, a video decoder 255, a WAC 258, a stream buffer 259, one or more speakers 260, a user input circuit 262, a display screen 266, a microcontroller 268, an audio buffer 272, an external video camera 274, an audio codec 276, an internal digital camera 278, a video buffer 280, a video audio synchronizer 282, a microphone 284, and a controller/computer communications circuit 287. The external video camera 274 faces a real-world environment of the user 108 and the internal digital camera 278 faces the user 108, e.g., eyes, head, etc. of the user 108.” ¶0114 “The internal digital camera 278 captures one or more images of the one or more head actions of the user 108 (FIGS. 1A-1C, 2) to generate image data, which is an example of input data that is generated based on the one or more head actions. Similarly, the external video camera 274 captures one or more images of the real-world environment and/or of markers located on the HMD 250 or on the hand of the user 108 and/or of the hands of the user 108 to generate image data, which is an example of input data. The image data generated based on the markers located on the hand of the user 108 or based on the movement of the hand of the user 108 is an example of input data that is generated based on the hand actions. The image data captured by the cameras 274 and 278 is stored in the video buffer 280.”).
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler to use an external video camera, as seen in Perry, because this modification would capture images of the real-world environment (¶0114 of Perry).
9.	 Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S. Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S. Patent Application Publication No. 20150371447 (“Yasutake”), further in view of SMITH et al., IDS, U.S. Patent Application Publication No. 20170301140 (“SMITH”).
Regarding claim 16, Kohler and Yasutake teach the method of claim 1, wherein the virtual content is provided by an app executing within a framework, wherein the app shares use of resources of the framework with other apps (¶0030 “At 302, the method 300 may include rendering from a three-dimensional model a two-dimensional augmentation image from a first virtual perspective. The three-dimensional model may include any suitable virtual content (e.g., hologram) that may be produced by any suitable application of the augmented-reality device. For example, the three-dimensional model may include a virtual scene or virtual objects of a video game.”). Kohler and Yasutake are understood to be silent on the remaining limitations of claim 16.
In the same field of endeavor, SMITH teaches wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app shares use of resources of the framework with other apps (¶0030 “The VR receiver may interact with and influence the AR environment of the AR host by actuating a user input device (e.g., VR device 121, mobile devices 122, or other devices) and generating data that is transmitted back to AR host system 110. For example, the VR receiver may generate new digital objects that are displayed in the augmented reality view of the AR host or manipulate digital objects that are currently displayed in the augmented reality view. In an application of this example, a VR receiver may highlight an object for the AR host's attention by manipulating a touch screen or other hand controller of a mobile device 122. In another application of this example, the VR receiver may generate and/or send images, video and other content (live or recorded) to the AR host.”).
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler to share and interact between augmented reality and virtual reality environments, as seen in SMITH, because this modification would allow the second user to interact with the view of the first user (¶0017 of SMITH).
Thus, the combination of Kohler, Yasutake and SMITH teaches wherein the virtual content is provided by an app executing within a framework that provides the user experience, wherein the app shares use of resources of the framework with other apps.
10.	 Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Kohler et al., IDS, U.S. Patent Application Publication No. 2017/0061693 (“Kohler”) in view of Yasutake, U.S. Patent Application Publication No. 20150371447 (“Yasutake”), further in view of Gove et al., IDS, U.S. Patent Application Publication No. 20130070109 (“Gove”), further in view of Bastani et al., IDS, U.S. Patent Application Publication No. 20180350032 (“Bastani”).
Regarding claim 17, Kohler and Yasutake teach the method of claim 1, wherein the stream comprises the images (¶0085 “In some implementations, at 340, the method 300 optionally may include outputting the composite audio stream as a layer of the mixed-reality recording. In particular, the composite audio stream may be synchronized with a composite video stream that includes virtual and real-world video layers.”). Kohler and Yasutake are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Gove teaches wherein the stream comprises the un-foveated images (¶0050 “In step 88, camera module 12 may use camera sensor 14 to capture an image of a scene. If desired, the image captured in step 88 may be a non-foveated image in which all regions of the image are captured with a common resolution and framerate. With other suitable arrangements, the image captured in step 88 may be a foveated image in which objects of interest are captured with a high level of detail (such as an image captured in step 92 of a previous iteration of the steps of FIG. 7).”).
Therefore, in the combination of Kohler and Yasutake, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler to capture a non-foveated image or foveated images, as seen in Gove, because this modification would capture images with different levels of quality (¶0059 of Gove). Kohler, Yasutake and Gove are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Bastani teaches further comprising applying an inverse transform to unwarp images in the rendered frames to produce images (¶0034 “In some implementations, the warped image is then unwarped to generate the desired final image. For example, unwarping the image may counteract the previously performed warping. In some implementations, unwarping the image includes applying an inverse of the function used to warp the 3D scene to the pixels of the image. For example, the inverse of the function may move the pixels representing portions of the 3D scene back to where those portions were before the warping.”).
Kohler teaches a composited stream. Gove teaches capturing non-foveated images or foveated images. Bastani teaches using an unwarping function to apply an inverse transform to a warped image, where the warped image is considered to be a foveated image.
Therefore, in the combination of Kohler, Yasutake and Gove, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of generating the mixed-reality recording of Kohler, and the capture of a non-foveated image or foveated images of Gove, with the image unwarping function seen in Bastani, because this modification would apply an inverse of the function used to warp the 3D scene (¶0034 of Bastani).
Thus, the combination of Kohler, Yasutake, Gove and Bastani teaches further comprising applying an inverse transform to unwarp foveated images in the rendered frames to produce un-foveated images, wherein the composited stream comprises the un-foveated images.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842. The examiner can normally be reached Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SARAH LE/Primary Examiner, Art Unit 2619