Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
Applicant's amendments and remarks submitted 09/22/2022 have been entered and considered but are not found persuasive. Claims 1, 2, 5-5, 7-8, 11-18, 21, 24-25, 27, and 36 have been amended. Claims 10, 19-20, 22-23, 26, 29, 33-35, and 37-41 have been cancelled. Claims 42-46 have been added. In summary, claims 1-9, 11-18, 21, 24-25, 27-28, 30-32, 36, and 42-46 are pending in the application. Applicant's amendments necessitated the new grounds of rejection set forth herein; accordingly, this action is made final.
Response to Arguments
Claim Rejections - 35 USC § 112:
Applicant has cancelled claim 34. The previous rejection of claim 34 under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, has been withdrawn.
Claim Rejections - 35 USC § 102 and Rejections under 35 U.S.C. 103
Applicant’s arguments with respect to claim 1 have been considered but are moot because the rejection has been modified to address the newly added limitation. The examiner now relies on the new references Peevers and Nauseef for the argued limitation.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 17 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 17, which depends from claim 12, recites “wherein the rendering in (C) is based on real time tracking of the second user's face in the second user image.” However, claim 12 recites “(C) obtaining a first user image from at least one first camera on said first device; and (D) rendering, on said first display…”
It is unclear what “the rendering in (C)” refers to, because only step (D) recites rendering. It is also unclear how obtaining a first user image from at least one first camera on said first device could be based on real time tracking of the second user’s face in the second user image.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1. Claims 1-9, 11-18, 21, 24-25, 27-28, 30-32, and 42-46 are rejected under 35 U.S.C. 103 as being unpatentable over Peevers et al., U.S. Patent Application Publication No. 2014/0192140 (“Peevers”), in view of Nauseef et al., U.S. Patent Application Publication No. 2016/0191958 (“Nauseef”).
Regarding independent claim 1, Peevers teaches a method, with a first device having a first at least one camera and a first display, the first device being associated with a first user, and a second device distinct from the first device, the second device having a second at least one camera and a second display, the second device being associated with a second user, distinct from the first user, and wherein the first at least one camera and the first display are integrated in the first device, and wherein the second at least one camera and the second display are integrated in the second device (¶¶0166-0167; Figs. 1 and 15: “As part of the video capture process, consider FIG. 15, which illustrates an example embodiment, shown here as end user terminal 102 of FIG. 1. As previously illustrated and described above, end user terminal 102 contains augmentation effect module 112 , which includes, among other things, audio augmentation module 300 , video augmentation module 302 , and augmentation cue module 304 . For the purposes of this discussion, end user terminal 102 and its associated elements and environment have been simplified. However, it is to be appreciated and understood that this simplification is not intended to limit the scope of the claimed subject matter.” ¶0167: “Among other things, end user terminal 102 receives video input from camera 1502 . Camera 1502 represents functionality that can electronically capture, record, and/or process a series of images in motion. Further, the electronically captured images can be stored on any suitable type of storage device, examples of which are provided below. Here, camera 1502 is illustrated as a device external to the end user-terminal that sends captured video through a wired connection. However, any suitable type of connection can be used, such a wireless connection. 
In some embodiments, camera 1502 and user terminal 102 are integrated with one another on a same hardware platform (such as a video camera integrated on a smart phone). Alternately or additionally camera 1502 can be integrated with a peripheral of end user terminal 102 , such as a camera integrated on display device connected to end user terminal 102 . Thus, camera 1502 represents any form of device that can capture video electronically and/or send the video to end user terminal 102 , whether they are integrated or separate”), the method comprising:
(A)(1) capturing a scene with said first at least one camera, the scene comprising a live view of a real-world physical environment (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”; the video camera capturing each reader’s face is considered to capture a live view); (A)(2) obtaining a second user image from the second at least one camera on the second device (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. 
In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”; the video camera captures each reader’s face for display to the other readers); (A)(3) capturing audio data from said second device (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”); and (B) for a story comprising a plurality of events, (B)(1) rendering a particular event of said plurality of events on said first display of said first device, wherein said rendering of said particular event (i) augments (¶0152 “In yet other embodiments, an electronic book can be rendered on the server and downloaded to all of the connected devices. 
In this case, the endpoints might be less powerful platforms, as all they need to do is play back the received audio and video streams. This would work for instances where, for example, the endpoints represent so-called “thin clients”. The server renders the pages of the book, applies all augmentations to the audio and video streams received from the call participants, and creates composite images, such as a book page with the appropriate participant's video stream overlaid on top, for each of the input devices”; each page is considered a particular event), and includes (ii) rendering a version of the captured audio with the particular event on at least one speaker associated with said first device (¶0148 “FIG. 14 illustrates aspects of an implementation of a device 1400 in accordance with one or more embodiments. Device 1400 includes a microphone, camera, and speaker as illustrated”; ¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. 
Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”; ¶0063 “With respect to augmentation that takes place at the sender's or reader's computing device, consider the following. When the reader's voice is captured, the augmentation effect module 112 processes the audio data that is received from associated microphone in order to impart some type of different characteristic to it, examples of which are provided above. The augmented audio data is then encoded and compressed and then transmitted either to a server for forwarding on to one or more other participants, or directly to one or more other client devices such as those in a peer-to-peer network. By performing augmentation on the reader's computing device, the reader can be provided with feedback on how their voice sounds with the least amount of lag. The reader's experience in this instance can be improved through the use of a headset or other audio feedback control mechanisms which can reduce acoustic feedback.”), and (iii) rendering on said first display a version of the second user image obtained in (A)(2) from the second device, wherein the version of the second user image is rendered as at least part of a particular augmented reality (AR) character in the story (¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. 
As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.”). Peevers is understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor, Nauseef teaches a first device having a first at least one camera and a first display, the first device being associated with a first user, and a second device distinct from the first device, the second device having a second at least one camera and a second display, the second device being associated with a second user, distinct from the first user, and wherein the first at least one camera and the first display are integrated in the first device, and wherein the second at least one camera and the second display are integrated in the second device (¶0032 “Referring now to the Figures, FIG. 1 illustrates an exemplary video communication connection 100 for enabling a video communication between a first user 102 and a second user 104. For example, each of the first user 102 and the second user 104 may hold a user device (e.g., a first user device 106 and a second user device 108, respectively) in front of his or her face so that a camera 110, 112 (e.g., a sensor) included in each respective user device 106, 108 may capture a live video feed of each user's face (e.g., the first user's face 114 and/or the second user's face 116). Audio of each user may also be captured by a microphone (not pictured) included in each user device 106, 108. The first user's face 114 may be presented to the second user 104 on the second user device 108, as well as on the first user device 106 for monitoring purposes. Similarly, the second user's face 116 may be presented to the first user 102 on the first user device 106, as well as on the second user device 108 for monitoring purposes. Additionally, contextual features (e.g., icons, images, text, background images, overlay images, and/or the like) 118, 120 associated with the first user 102 and the second user 104 may be provided in a heads-up display on the first user device 106 and the second user device 108, respectively”), the method comprising:
(A)(1) capturing a scene with said first at least one camera, the scene comprising a live view of a real-world physical environment (¶0105 “The content management unit 312 and/or the features unit 322 may then identify one or more contextual features relevant to the determined location of the user (e.g., relevant to the identified locational cues and/or objects of interest). For example, via analysis of live video feed of a user, the location determination unit 314 , the facial/vocal recognition unit 318 , the gesture analysis unit 320 , and/or the features unit 322 may identify a recognizable landmark, such as the Big Ben clock tower in London, in the background of the live video feed (see exemplary user interface 600 of FIG. 6)”);
(A)(2) obtaining a second user image from the second at least one camera on the second device; (A)(3) capturing audio data from said second device (¶0032 “Referring now to the Figures, FIG. 1 illustrates an exemplary video communication connection 100 for enabling a video communication between a first user 102 and a second user 104. For example, each of the first user 102 and the second user 104 may hold a user device (e.g., a first user device 106 and a second user device 108, respectively) in front of his or her face so that a camera 110, 112 (e.g., a sensor) included in each respective user device 106, 108 may capture a live video feed of each user's face (e.g., the first user's face 114 and/or the second user's face 116). Audio of each user may also be captured by a microphone (not pictured) included in each user device 106, 108. The first user's face 114 may be presented to the second user 104 on the second user device 108, as well as on the first user device 106 for monitoring purposes. Similarly, the second user's face 116 may be presented to the first user 102 on the first user device 106, as well as on the second user device 108 for monitoring purposes. Additionally, contextual features (e.g., icons, images, text, background images, overlay images, and/or the like) 118, 120 associated with the first user 102 and the second user 104 may be provided in a heads-up display on the first user device 106 and the second user device 108, respectively”); and
(B)(1) rendering a particular event of said plurality of events on said first display of said first device, wherein said rendering of said particular event (i) augments the scene captured in (A) by said first at least one camera, and includes (ii) rendering a version of the captured audio with the particular event on at least one speaker associated with said first device, and (iii) rendering on said first display a version of the second user image obtained in (A)(2) from the second device (¶0096 “Once the video communication connection has been established by the communication unit 308, the user device and/or the second user device may enable the user and the second user, respectively, to stream a live video and/or audio feed to one another. For example, the user may utilize the I/O device 342 (e.g., a camera and a microphone, a sensor, and/or the like) included in the user device to capture a live video feed of the user's face and voice. Similarly, the second user may utilize the I/O device 342 (e.g., a camera and a microphone, a sensor, and/or the like) included in the second user device to capture a live video feed of the second user's face and voice. In some embodiments, the live video feeds and/or the live audio feeds captured by the user device may be transmitted from the user device to the second user device for display to the second user, and vice versa. In this manner, the user and the second user may communicate by viewing and/or listening to the live video feeds and/or the live audio feeds received from the other user (e.g., the second user and/or the user, respectively) using the established video communication connection.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Peevers's method of communicating and interacting with story-based, shared, interactive content in real time between two or more remote participants to include Nauseef's video communication server, which receives video content of a video communication connection between a first user of a first user device and a second user of a second user device in real time and provides contextual features for digital communication, because this modification would enhance conversations enabled by digital communication technologies (Nauseef ¶0003).
Thus, the combination of Peevers and Nauseef teaches a method, with a first device having a first at least one camera and a first display, the first device being associated with a first user, and a second device distinct from the first device, the second device having a second at least one camera and a second display, the second device being associated with a second user, distinct from the first user, and wherein the first at least one camera and the first display are integrated in the first device, and wherein the second at least one camera and the second display are integrated in the second device, the method comprising: (A)(1) capturing a scene with said first at least one camera, the scene comprising a live view of a real-world physical environment; (A)(2) obtaining a second user image from the second at least one camera on the second device; (A)(3) capturing audio data from said second device; and (B) for a story comprising a plurality of events, (B)(1) rendering a particular event of said plurality of events on said first display of said first device, wherein said rendering of said particular event (i) augments the scene captured in (A) by said first at least one camera, and includes (ii) rendering a version of the captured audio with the particular event on at least one speaker associated with said first device, and (iii) rendering on said first display a version of the second user image obtained in (A)(2) from the second device, wherein the version of the second user image is rendered as at least part of a particular augmented reality (AR) character in the story.
Regarding claim 2, Peevers and Nauseef teach the method of claim 1, further comprising: (B)(2) transitioning to a next event of said plurality of events; and (B)(3) in response to said transitioning in (B)(2), rendering said next event on said first display of said first device (¶0082 “Table 1 is an example of how the position information from a suitably-configured position tracker can be used as an index into a table of effects to trigger a particular augmentation when a specific word is reached on the page to which the table is bound. In one or more embodiments, a single table can be utilized to trigger augmentation effects for each page in the book. Alternately, a single table can be utilized for the entire book. In this instance, the table could be indexed not by position within a page, but rather by position within the entire book.” ¶0083 “In addition, one or more tables can be utilized to determine when to trigger background audio sounds, e.g., jungle sounds, thunder, applause, and the like. If there is only one table, it can be indexed by page number, as in the following example”; ¶0140 “In one or more embodiments, the page numbers of the story or other story structure can be utilized to apply augmentation. For example, as a story is being read, when the reader reaches a certain page or paragraph, augmentation can be applied. Assume, for example, a story is being read and on page 3 of the story, the entire page includes a dialogue of one character. In this instance, voice morphing and/or other effects can be applied when the reader turns to page 3. When the reader turns to page 4, the voice morphing and/or other effects can be terminated. Alternately or additionally, once the augmentation begins, it may end naturally before the page or paragraph ends.”).
Regarding claim 3, Peevers and Nauseef teach the method of claim 2, wherein said particular event includes event transition information, and wherein said transitioning in (B)(2) occurs in accordance with said event transition information (¶0082 “Table 1 is an example of how the position information from a suitably-configured position tracker can be used as an index into a table of effects to trigger a particular augmentation when a specific word is reached on the page to which the table is bound. In one or more embodiments, a single table can be utilized to trigger augmentation effects for each page in the book. Alternately, a single table can be utilized for the entire book. In this instance, the table could be indexed not by position within a page, but rather by position within the entire book.” ¶0083 “In addition, one or more tables can be utilized to determine when to trigger background audio sounds, e.g., jungle sounds, thunder, applause, and the like. If there is only one table, it can be indexed by page number, as in the following example”; ¶0140 “In one or more embodiments, the page numbers of the story or other story structure can be utilized to apply augmentation. For example, as a story is being read, when the reader reaches a certain page or paragraph, augmentation can be applied. Assume, for example, a story is being read and on page 3 of the story, the entire page includes a dialogue of one character. In this instance, voice morphing and/or other effects can be applied when the reader turns to page 3. When the reader turns to page 4, the voice morphing and/or other effects can be terminated. Alternately or additionally, once the augmentation begins, it may end naturally before the page or paragraph ends.”).
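For illustration only, the table-indexed triggering quoted from Peevers ¶¶0082-0083 can be sketched in Python as a lookup keyed by page and word position, with a second table keyed by page number alone for background audio. The table contents, effect names, and the functions `effects_for` and `background_sound_for` are hypothetical assumptions and do not appear in Peevers; the sketch only shows the general shape of such a lookup.

```python
from typing import Optional

# Hypothetical table bound to a page: word position on the page -> effect.
# The page, positions, and effect names are illustrative, not from Peevers.
WORD_EFFECTS: dict[int, dict[int, str]] = {
    3: {12: "start_character_voice", 40: "stop_character_voice"},
}

# Hypothetical single table indexed by page number -> background audio cue.
PAGE_SOUNDS: dict[int, str] = {
    2: "jungle",
    5: "thunder",
}


def effects_for(page: int, word_position: int) -> list[str]:
    """Return the effects triggered when a position tracker reports that
    the reader has reached `word_position` on `page`."""
    effect = WORD_EFFECTS.get(page, {}).get(word_position)
    return [effect] if effect is not None else []


def background_sound_for(page: int) -> Optional[str]:
    """Return the background sound bound to `page`, or None if there is none."""
    return PAGE_SOUNDS.get(page)


# Usage: the reader reaches word 12 on page 3, then turns to page 5.
print(effects_for(3, 12))        # ['start_character_voice']
print(background_sound_for(5))   # thunder
```

In this sketch the table entries play the role of event transition information: reaching a listed position or page is what triggers the next effect.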
Regarding claim 4, Peevers and Nauseef teach the method of claim 1, wherein a transition is based on one or more of: (a) a period of time; and/or (b) a user interaction associated with said second device; and/or (c) a user gesture associated with said second device (¶0115 “In one or more embodiments, gestures can be utilized to apply augmentation. The gestures can include touch-based gestures as well as non-touch-based gestures, such as those provided through a natural user interface (NUI). In either case, particular gestures can be mapped to various augmentations. As an example, consider non-touch-based gestures that can be captured by a video camera and analyzed in much the same manner as gestures are captured and analyzed by Microsoft's Kinect technology.”; ¶0157 “In one or more embodiments, individual instances of an electronic book being shared can be synchronized between all of the participants' computers. Whenever one of the participants interacts with the book, control information corresponding to this interaction is transmitted to all other participants. Examples of interactions include, but are not limited to: advance or rewind to next/previous page, touch an object within a page, exit the book, skip to the end, set a bookmark, choose an existing bookmark, etc.”; ¶0160 “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. 
A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”)
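The page-change interlock quoted from Peevers ¶0160 is, in substance, a small acknowledgement protocol. The Python sketch below simulates it in a single process, assuming an in-order message queue in place of a network. The `Session` and `Device` classes, their fields, and the method names are hypothetical; only the message names NEXTPAGE, PAGECHANGEACKNOWLEDGE, and PAGECHANGECOMPLETE come from the quoted passage. The sketch does not attempt to resolve simultaneous page-change requests from two devices.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    page: int = 1
    locked: bool = False                             # set while a remote page change is in progress
    pending_acks: set = field(default_factory=set)   # peers that have not yet acknowledged


class Session:
    """Hypothetical in-process message bus standing in for the network."""

    def __init__(self, names):
        self.devices = {n: Device(n) for n in names}
        self.queue = deque()  # (sender, recipient, message) tuples, delivered in order

    def broadcast(self, sender, message):
        for name in self.devices:
            if name != sender:
                self.queue.append((sender, name, message))

    def request_next_page(self, name):
        """Local page-turn request; returns False if it is locked out."""
        dev = self.devices[name]
        if dev.locked or dev.pending_acks:
            return False
        peers = {n for n in self.devices if n != name}
        if not peers:                 # nobody to synchronize with
            dev.page += 1
            return True
        dev.pending_acks = peers
        self.broadcast(name, "NEXTPAGE")
        return True

    def deliver_one(self):
        """Deliver the oldest queued message; return False when the queue is empty."""
        if not self.queue:
            return False
        sender, recipient, message = self.queue.popleft()
        dev = self.devices[recipient]
        if message == "NEXTPAGE":
            dev.locked = True         # lock out local requests until PAGECHANGECOMPLETE
            dev.page += 1             # enact the command
            self.queue.append((recipient, sender, "PAGECHANGEACKNOWLEDGE"))
        elif message == "PAGECHANGEACKNOWLEDGE":
            dev.pending_acks.discard(sender)
            if not dev.pending_acks:  # every remote device has acknowledged
                dev.page += 1         # only now turn the initiator's page
                self.broadcast(recipient, "PAGECHANGECOMPLETE")
        elif message == "PAGECHANGECOMPLETE":
            dev.locked = False        # free to respond to local commands again
        return True

    def deliver_all(self):
        while self.deliver_one():
            pass
```

In this sketch a remote device locks out local requests and turns its page as soon as the command arrives, while the initiating device waits for every acknowledgement before turning its own page, following the ordering described in the quoted passage.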
Regarding claim 5, Peevers and Nauseef teach the method of claim 4, wherein the user gesture is determined based on one or more of: (i) an image obtained by said second device; and/or (ii) on movement and/or orientation of said second device (¶0115 “In one or more embodiments, gestures can be utilized to apply augmentation. The gestures can include touch-based gestures as well as non-touch-based gestures, such as those provided through a natural user interface (NUI). In either case, particular gestures can be mapped to various augmentations. As an example, consider non-touch-based gestures that can be captured by a video camera and analyzed in much the same manner as gestures are captured and analyzed by Microsoft's Kinect technology.”; ¶0116 “In this particular instance, assume that a reader is reading a story that is shared with other participants. A forward-facing camera captures images of the reader. When the reader reaches a particular part of the story, they make a swiping gesture over one of the story's characters. The swiping gesture is then mapped to a voice effect that morphs the reader's voice into the voice of the character over which the swiping gesture occurred. Similarly, assume that in this particular story a number of background sounds are available. As the reader progresses through the story, they make a tapping gesture in space over a rain cloud which is captured by the forward-facing camera and mapped to a background sound in the form of thunder.”).
Regarding claim 6, Peevers and Nauseef teach the method of claim 4, wherein the user gesture comprises a facial gesture and/or a body gesture (¶0115 “In one or more embodiments, gestures can be utilized to apply augmentation. The gestures can include touch-based gestures as well as non-touch-based gestures, such as those provided through a natural user interface (NUI). In either case, particular gestures can be mapped to various augmentations. As an example, consider non-touch-based gestures that can be captured by a video camera and analyzed in much the same manner as gestures are captured and analyzed by Microsoft's Kinect technology.”; ¶0116 “In this particular instance, assume that a reader is reading a story that is shared with other participants. A forward-facing camera captures images of the reader. When the reader reaches a particular part of the story, they make a swiping gesture over one of the story's characters. The swiping gesture is then mapped to a voice effect that morphs the reader's voice into the voice of the character over which the swiping gesture occurred. Similarly, assume that in this particular story a number of background sounds are available. As the reader progresses through the story, they make a tapping gesture in space over a rain cloud which is captured by the forward-facing camera and mapped to a background sound in the form of thunder.”; ¶0174 “The above discussions describe manual and automatic detection techniques associated with video capture and still images. While described in the context of identifying a face, facial features, and/or facial gestures, it is to be appreciated that these techniques can be modified and/or applied in any suitable manner. For example, instead of face recognition and/or identifying a wink, video can be processed to identify a hand wave, sign language gestures, and so forth. 
As discussed above, these identified gestures can then be used to influence animation and/or behavior of a shared story experience. Alternately or additionally, once various features have been identified (such as facial detection), the video can be augmented and/or enhanced as part of the story telling process.”)
Regarding claim 7, Peevers and Nauseef teach the method of claim 4, wherein the user interaction comprises one or more of: a user voice command; and a user touching a screen or button on said second device (¶0115 “In one or more embodiments, gestures can be utilized to apply augmentation. The gestures can include touch-based gestures as well as non-touch-based gestures, such as those provided through a natural user interface (NUI). In either case, particular gestures can be mapped to various augmentations. As an example, consider non-touch-based gestures that can be captured by a video camera and analyzed in much the same manner as gestures are captured and analyzed by Microsoft's Kinect technology.”; ¶0157 “In one or more embodiments, individual instances of an electronic book being shared can be synchronized between all of the participants' computers. Whenever one of the participants interacts with the book, control information corresponding to this interaction is transmitted to all other participants. Examples of interactions include, but are not limited to: advance or rewind to next/previous page, touch an object within a page, exit the book, skip to the end, set a bookmark, choose an existing bookmark, etc.”)
Regarding claim 8, Peevers and Nauseef teach the method of claim 1, wherein said particular event comprises one or more of: (i) audio information; (ii) textual information; and (iii) augmented reality (AR) information and wherein rendering of said particular event in (B)(1) comprises rendering one or more of: (x) audio information associated with said particular event; (y) textual information associated with said particular event; and (z) AR information associated with said particular event (¶0140 “In one or more embodiments, the page numbers of the story or other story structure can be utilized to apply augmentation. For example, as a story is being read, when the reader reaches a certain page or paragraph, augmentation can be applied. Assume, for example, a story is being read and on page 3 of the story, the entire page includes a dialogue of one character. In this instance, voice morphing and/or other effects can be applied when the reader turns to page 3. When the reader turns to page 4, the voice morphing and/or other effects can be terminated. Alternately or additionally, once the augmentation begins, it may end naturally before the page or paragraph ends.”; ¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head.
As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied”)
Regarding claim 9, Peevers and Nauseef teach the method of claim 1, further comprising: repeating act (B)(1) for multiple events in said story (¶0140 “In one or more embodiments, the page numbers of the story or other story structure can be utilized to apply augmentation. For example, as a story is being read, when the reader reaches a certain page or paragraph, augmentation can be applied. Assume, for example, a story is being read and on page 3 of the story, the entire page includes a dialogue of one character. In this instance, voice morphing and/or other effects can be applied when the reader turns to page 3. When the reader turns to page 4, the voice morphing and/or other effects can be terminated. Alternately or additionally, once the augmentation begins, it may end naturally before the page or paragraph ends”; ¶0144 “Step 1302 detects, during reading of the story, one or more page numbers or other story structure that identifies locations where augmentation is to take place. Step 1304 augments one or more properties or characteristics of the story based on locations identified from the page numbers or other story structure. For example, the reader's voice can be augmented as described above. Alternately or additionally, one or more effects can be applied as described above. Further, content of the story itself can be augmented or modified. For example, augmentation can further include augmenting video associated with the story, e.g., manipulating one or more objects within the story as described above and below. Further, this step can be performed at any suitable location, examples of which are provided above.”)
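For illustration only, the page-keyed augmentation described in ¶0140 and ¶0144 of Peevers (an effect applied on page 3 and terminated on page 4, repeated for each page turn) can be sketched in Python as follows. The mapping `AUGMENTATION_BY_PAGE`, the class `StoryReader`, and the effect label are hypothetical names chosen for this sketch and do not appear in Peevers.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from page number to an augmentation effect.
# Page 3 carries one character's dialogue, following the ¶0140 example.
AUGMENTATION_BY_PAGE: Dict[int, str] = {3: "voice_morph:character"}


class StoryReader:
    """Tracks the current page and starts or stops page-keyed effects."""

    def __init__(self, effects_by_page: Dict[int, str]) -> None:
        self.effects_by_page = effects_by_page
        self.active_effect: Optional[str] = None
        self.log: List[str] = []

    def turn_to(self, page: int) -> Optional[str]:
        """Apply the effect keyed to `page`, ending any effect no longer keyed."""
        effect = self.effects_by_page.get(page)
        if effect != self.active_effect:
            if self.active_effect is not None:
                self.log.append(f"stop {self.active_effect} at page {page}")
            if effect is not None:
                self.log.append(f"start {effect} at page {page}")
            self.active_effect = effect
        return self.active_effect
```

Calling `turn_to` once per page turn is one way to model "repeating act (B)(1) for multiple events": the effect starts when page 3 is reached and ends when page 4 is reached.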
Regarding claim 11, Peevers and Nauseef teach the method of claim 1, wherein the first device and the second device are selected from: mobile phones and tablet devices (¶0044 “In one embodiment, this interconnection architecture enables functionality to be delivered across multiple devices to provide a common and seamless experience to the user of the multiple devices. Each of the multiple devices may have different physical requirements and capabilities, and the central computing device uses a platform to enable the delivery of an experience to the device that is both tailored to the device and yet common to all devices. In one embodiment, a “class” of target device is created and experiences are tailored to the generic class of devices. A class of device may be defined by physical features or usage or other common characteristics, e.g. CPU performance of the devices. For example, as previously described, end-user terminal 102 may be configured in a variety of different ways, such as for mobile 202 , computer 204 , and television 206 uses. Each of these configurations has a generally corresponding screen size and thus end-user terminal 102 may be configured as one of these device classes in this example system 200 . For instance, the end-user terminal 102 may assume the mobile 202 class of device which includes mobile telephones, music players, game devices, and so on. The end-user terminal 102 may also assume a computer 204 class of device that includes personal computers, laptop computers, netbooks, tablet computers, and so on.”)
Regarding claim 12, Peevers and Nauseef teach the method of claim 1, further comprising: (C) obtaining a first user image from at least one first camera on said first device (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”); and (D) rendering, on said first display of said first device, a version of the first user image with the particular event of said plurality of events in (B)(1), wherein the version of the first user image is rendered as part of a first augmented reality character in the story (¶0186 of Peevers “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers.
When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.” ¶0032 of Nauseef “Referring now to the Figures, FIG. 1 illustrates an exemplary video communication connection 100 for enabling a video communication between a first user 102 and a second user 104. For example, each of the first user 102 and the second user 104 may hold a user device (e.g., a first user device 106 and a second user device 108, respectively) in front of his or her face so that a camera 110, 112 (e.g., a sensor) included in each respective user device 106, 108 may capture a live video feed of each user's face (e.g., the first user's face 114 and/or the second user's face 116). Audio of each user may also be captured by a microphone (not pictured) included in each user device 106, 108. The first user's face 114 may be presented to the second user 104 on the second user device 108, as well as on the first user device 106 for monitoring purposes. Similarly, the second user's face 116 may be presented to the first user 102 on the first user device 106, as well as on the second user device 108 for monitoring purposes. 
Additionally, contextual features (e.g., icons, images, text, background images, overlay images, and/or the like) 118, 120 associated with the first user 102 and the second user 104 may be provided in a heads-up display on the first user device 106 and the second user device 108, respectively.”). In addition, the motivation to combine set forth in the rejection of claim 1 applies equally here.
Regarding claim 13, Peevers and Nauseef teach the method of claim 12, wherein rendering the version of the second user image in (D) comprises: animating at least a portion of the second user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0172 “As another example, facial detection algorithm 1602 c identifies specific details associated with a face, shown generally here as regions 1610 . Here, the eyes, the nose, and the mouth are separately located and identified from one another. As in the case above, these features can be superimposed on one or more images contained within a story, such as replacing the eyes nose and mouth of a cartoon character within the story. Alternately or additionally, these features can be monitored over time to identify gestures, such as a wink, a kiss, a sneeze, whistling, talking, yelling, blinking, a head nod, a head shake, and so forth. In turn, the identified gestures can drive animation of a cartoon character within the story. For example, in some embodiments, detecting a wink within the video can, in turn, can cause an associated cartoon character to wink.
While discussed in the context of facial detection, it is to be appreciated and understood that any suitable gesture can be monitored and/or detected without departing from the scope of the claimed subject matter.”)
Regarding claim 14, Peevers and Nauseef teach the method of claim 13, wherein the portion of the second user image comprises the second user's face (¶0172 “As another example, facial detection algorithm 1602 c identifies specific details associated with a face, shown generally here as regions 1610 . Here, the eyes, the nose, and the mouth are separately located and identified from one another. As in the case above, these features can be superimposed on one or more images contained within a story, such as replacing the eyes nose and mouth of a cartoon character within the story. Alternately or additionally, these features can be monitored over time to identify gestures, such as a wink, a kiss, a sneeze, whistling, talking, yelling, blinking, a head nod, a head shake, and so forth. In turn, the identified gestures can drive animation of a cartoon character within the story. For example, in some embodiments, detecting a wink within the video can, in turn, can cause an associated cartoon character to wink. While discussed in the context of facial detection, it is to be appreciated and understood that any suitable gesture can be monitored and/or detected without departing from the scope of the claimed subject matter.”; ¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. 
As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.”)
Regarding claim 15, Peevers and Nauseef teach the method of claim 12, further comprising: recognizing the second user's face in the second user image (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”; ¶0178 “In addition to incorporating augmented video 1904 , enhanced interactive story 1902 includes a still image associated with a face of video capture image 1506 superimposed upon image 1906 . As discussed above, the face can be extracted using automatic and/or manual face detection processes. Here, the facial features are simply cut and pasted into image 1906 . However, in other embodiments, other augmentation filters can be applied, such as the alpha blending algorithm described above.”)
Regarding claim 16, Peevers and Nauseef teach the method of claim 14, further comprising: tracking the second user's face in real-time (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”)
Regarding claim 17, Peevers and Nauseef teach the method of claim 12, wherein the rendering in (C) is based on real time tracking of the second user's face in the second user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506.
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”)
Regarding claim 18, Peevers and Nauseef teach the method of claim 13, wherein said animating is based, at least in part, on manipulation and/or movement of the second device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506.
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
Regarding claim 21, Peevers and Nauseef teach the method of claim 1, wherein the audio data captured in (A)(3) are manipulated and/or augmented before being rendered (¶0060 “The specific use of voice manipulation or morphing in the present context, as noted above, is intended for manipulation of a reader's voice as they read a shared story to a remote person.”; ¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”)
Regarding claim 24, Peevers and Nauseef teach the method of claim 2, wherein said transitioning in (B)(2) is based on an action associated with the second device (¶0160 “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”; i.e., any participant, local or remote, can change the page of the story)
Regarding claim 25, Peevers and Nauseef teach the method of claim 24, wherein said transitioning in (B)(2) is triggered by said action associated with said second device (¶0160 “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”)
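For illustration only, the filtering/interlock mechanism quoted from ¶0160 of Peevers can be sketched as an in-process Python simulation. The message names NEXTPAGE, PAGECHANGEACKNOWLEDGE, and PAGECHANGECOMPLETE come from the quoted passage. Everything else is an assumption made for this sketch and is not part of Peevers' disclosure: the `Device` and `Session` classes, the single shared message queue, and the rule that an initiating device refuses new requests while it is still awaiting acknowledgements.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple

# Message names follow the quoted ¶0160; all other names are hypothetical.
NEXTPAGE = "NEXTPAGE"
ACK = "PAGECHANGEACKNOWLEDGE"
COMPLETE = "PAGECHANGECOMPLETE"

Message = Tuple[str, str, str]  # (kind, sender_id, recipient_id)


@dataclass
class Device:
    device_id: str
    page: int = 1
    locked_by: Optional[str] = None  # initiator whose page change is in progress
    pending_acks: Set[str] = field(default_factory=set)


class Session:
    """Simulates the page-change interlock among shared-story participants."""

    def __init__(self, ids: List[str]) -> None:
        self.devices: Dict[str, Device] = {i: Device(i) for i in ids}
        self.queue: Deque[Message] = deque()

    def _send(self, kind: str, sender: str, recipient: str) -> None:
        self.queue.append((kind, sender, recipient))

    def request_next_page(self, device_id: str) -> bool:
        dev = self.devices[device_id]
        # Locally generated requests are refused while locked by another
        # device or (assumption) while this device's own change is pending.
        if dev.locked_by is not None or dev.pending_acks:
            return False
        others = [i for i in self.devices if i != device_id]
        if not others:  # no remote participants: turn the page directly
            dev.page += 1
            return True
        dev.pending_acks = set(others)
        for other in others:
            self._send(NEXTPAGE, device_id, other)  # broadcast immediately
        return True

    def _deliver(self, kind: str, sender: str, recipient: str) -> None:
        dev = self.devices[recipient]
        if kind == NEXTPAGE:
            dev.locked_by = sender              # lock out local page changes
            dev.page += 1                       # enact the command
            self._send(ACK, recipient, sender)  # acknowledge to the initiator
        elif kind == ACK:
            dev.pending_acks.discard(sender)
            if not dev.pending_acks:            # every remote has acknowledged
                dev.page += 1                   # only now turn the local page
                for other in self.devices:
                    if other != recipient:
                        self._send(COMPLETE, recipient, other)
        elif kind == COMPLETE and dev.locked_by == sender:
            dev.locked_by = None                # free to act on local commands

    def step(self) -> None:
        """Deliver one queued message."""
        self._deliver(*self.queue.popleft())

    def run(self) -> None:
        """Deliver messages until the queue is empty."""
        while self.queue:
            self.step()
```

In this sketch, once "billy" requests a page turn and the NEXTPAGE command reaches "joe", joe's own page-change requests are refused until PAGECHANGECOMPLETE arrives. Billy's page changes only after joe has acknowledged. This mirrors the sequence the examiner relies on for a page transition triggered by an action at either participant's device.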
Regarding claim 27, Peevers and Nauseef teach the method of claim 1, wherein said rendering of said particular event in (B)(1) also augments the scene with information associated with at least one other device (¶0075 “In one or more embodiments, automatic speech recognition can be utilized to recognize where, in a particular narrative, the reader is reading and use this information to trigger various augmentation effects at the appropriate time. In these instances, the augmentation cue module 304 includes a speech recognition component that tracks where in the story the reader is reading through analysis of audio signal data that is captured by a suitably-configured microphone. The augmentation cue module 304 can then trigger augmentation events as appropriate. For example, assume that participants are sharing a story about Elmo. When the reader reaches words that are spoken by Elmo, the reader's voice can be morphed to sound like Elmo. When Elmo's phrase is complete, the reader's voice can be returned to its normal sound. Alternately or additionally, augmentation effects can be applied with respect to particular words that are read by the reader. For example, background sounds or effects can be triggered when the reader reads words such as “wind”, “thunder”, “rain”, and the like.”; ¶0183 “Responsive to augmenting the video data to generate at least one new image, step 2004 enables the one or more remote participants to consume the augmented video data. For example, in embodiments where the video data is augmented on the reader's computing device, step 2004 can be performed by transmitting or otherwise conveying the augmented video data to a computing device associated with each of the remote participants. In embodiments where the video data is augmented by a server, the step can be performed by the server distributing the augmented video data to a computing device associated with each of the remote participants.
In embodiments where the video data is augmented by a computing device associated with a remote participant, the step can be performed by enabling the remote participant to consume the augmented video data via a suitably-configured application.”; ¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.”)
Regarding claim 28, Peevers and Nauseef teach the method of claim 27, wherein said information associated with said at least one other device corresponds to on one or more of: (i) an image captured by said at least one other device; and/or (ii) an image representing or corresponding to said at least one other device; and/or (iii) audio from said at least one other device (¶0075 “In one or more embodiments, automatic speech recognition can be utilized to recognize where, in a particular narrative, the reader is reading and use this information to trigger various augmentation effects at the appropriate time. In these instances, the augmentation cue module 304 includes a speech recognition component that tracks where in the story the reader is reading through analysis of audio signal data that is captured by a suitably-configured microphone. The augmentation cue module 304 can then trigger augmentation events as appropriate. For example, assume that participants are sharing a story about Elmo. When the reader reaches words that are spoken by Elmo, the reader's voice can be morphed to sound like Elmo. When Elmo's phrase is complete, the reader's voice can be returned to its normal sound. Alternately or additionally, augmentation effects can be applied with respect to particular words that are read by the reader. For example, background sounds or effects can be triggered when the reader reads words such as “wind”, “thunder”, “rain”, and the like.”; ¶0183 “Responsive to augmenting the video data to generate at least one new image, step 2004 enables the one or more remote participants to consume the augmented video data. For example, in embodiments where the video data is augmented on the reader's computing device, step 2004 can be performed by transmitting or otherwise conveying the augmented video data to a computing device associated with each of the remote participants.
In embodiments where the video data is augmented by a server, the step can be performed by the server distributing the augmented video data to a computing device associated with each of the remote participants. In embodiments where the video data is augmented by a computing device associated with a remote participant, the step can be performed by enabling the remote participant to consume the augmented video data via a suitably-configured application.”; ¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.”)
Regarding claim 30, Peevers and Nauseef teach the method of claim 28, wherein said image representing or corresponding to said at least one other device comprises an avatar (¶0176 of Peevers, as shown in Fig. 18 of Peevers: “Consider FIG. 18, which illustrates before and after examples of a rotoscoping filter. Image 1802 illustrates a still image of a man. This image represents a real world image taken by a camera, such as camera 1502 of FIG. 15. Here, the image has been centered on the man's head. In some embodiments, image 1802 has been previously processed using facial detection algorithms as described above to remove other elements and/or objects surrounding the face. This image can be used as input to one or more filters, such as the rotoscope filter described above. Image 1804 illustrates how image 1802 would appear after applying a rotoscope filter. After filtering, image 1804 closely resembles a drawn version, or cartoon version, of image 1802. While discussed in the context of a still image, it is to be appreciated that filters can be applied to video capture without departing from the scope of the claimed subject matter.”; ¶0108 of Nauseef “As described herein, the features unit 322 and/or the content management unit 312 may present to the user contextual features identified as relevant to the user's emotions and/or location. In some embodiments, the relevant contextual features may be presented to the user in a toolbar, a menu, and/or other portion of a user interface. Selecting a contextual feature for incorporation into the video communication may include overlaying a live video feed and/or a live audio feed with an image, text, an icon, an audio clip, and/or the like. 
Additionally and/or alternatively, selecting a contextual feature for incorporation into the video communication connection may include replacing an image of a user in the live video stream (e.g., visually overlaying in real time) with an icon, a static image, an animated image, text, an avatar or a cartoon, digital apparel, a shape, a filter, a color, a sticker, a video stream, and/or the like. For example, exemplary user interface 500 of FIG. 5 illustrates an duck avatar that has replaced the image of a user in the live video stream. Selecting a contextual feature for incorporation into the video communication may further include masking and/or modifying a live audio feed of a user by modulating the user's voice with a phaser, a compressor, a flanger, a delay, a reverb, a pitch shifter, a filter, and/or the like. Selecting a contextual feature for incorporation into the video communication may also include changing, modifying, and/or augmenting a background image of the live video feed with a pattern with an image of a particular setting or location (e.g., a beach setting, a skyscraper skyline, a rainforest, and/or the like), and/or the like. Typically, selecting a contextual feature includes transforming the visual and/or auditory appearance of a user and may be selected and/or determined to be relevant based on an identified environment of a user, a determined location of a user, and/or the like.”) In addition, the same motivation to combine set forth in the rejection of claim 1 applies.
Regarding claim 31, Peevers and Nauseef teach the method of claim 28, wherein said image representing or corresponding to said at least one other device is animated (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
 Regarding claim 32, Peevers and Nauseef teach the method of claim 31, wherein said image is animated, at least in part, by manipulation and/or movement of the at least one other device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
Regarding claim 42, Peevers and Nauseef teach the method of claim 1, wherein rendering a version of the second user image in (B)(1)(iii) comprises: animating at least a portion of the second user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0172 “As another example, facial detection algorithm 1602c identifies specific details associated with a face, shown generally here as regions 1610. Here, the eyes, the nose, and the mouth are separately located and identified from one another. As in the case above, these features can be superimposed on one or more images contained within a story, such as replacing the eyes nose and mouth of a cartoon character within the story. Alternately or additionally, these features can be monitored over time to identify gestures, such as a wink, a kiss, a sneeze, whistling, talking, yelling, blinking, a head nod, a head shake, and so forth. In turn, the identified gestures can drive animation of a cartoon character within the story. 
For example, in some embodiments, detecting a wink within the video can, in turn, can cause an associated cartoon character to wink. While discussed in the context of facial detection, it is to be appreciated and understood that any suitable gesture can be monitored and/or detected without departing from the scope of the claimed subject matter.”)
Regarding claim 43, Peevers and Nauseef teach the method of claim 42, wherein the portion of the second user image comprises the second user's face (¶0172 “As another example, facial detection algorithm 1602c identifies specific details associated with a face, shown generally here as regions 1610. Here, the eyes, the nose, and the mouth are separately located and identified from one another. As in the case above, these features can be superimposed on one or more images contained within a story, such as replacing the eyes nose and mouth of a cartoon character within the story. Alternately or additionally, these features can be monitored over time to identify gestures, such as a wink, a kiss, a sneeze, whistling, talking, yelling, blinking, a head nod, a head shake, and so forth. In turn, the identified gestures can drive animation of a cartoon character within the story. For example, in some embodiments, detecting a wink within the video can, in turn, can cause an associated cartoon character to wink. While discussed in the context of facial detection, it is to be appreciated and understood that any suitable gesture can be monitored and/or detected without departing from the scope of the claimed subject matter.”; ¶0186 “Assume that two people, “Billy” and “Uncle Joe”, are remotely reading an electronic book. The book is an illustrated version of the familiar children's song “The Wheels on the Bus Go Round and Round”. The book is open to a page showing a school bus, the bus driver, doors, wheels, and windshield wipers. When Billy initiates an augmentation effect, either by touching the driver's face, or some embedded control, face detection and rotoscoping are applied to cause Uncle Joe's face to be manipulated into a cartoon version and overlaid onto the bus driver's head. 
As various actions are indicated in the story as through tracking by ASR, object interactions, receiving user interface input, and the like, they are enacted in the digital story display (e.g., wipers swish, doors open and shut, babies cry, and the like). Both Uncle Joe and Billy see these effects on their devices as they are applied.”)
Regarding claim 44, Peevers and Nauseef teach the method of claim 1, further comprising: recognizing the second user's face in the second user image (¶0056 “In the illustrated and described embodiment, one or more readers who are remote from one another can read an interactive story, such as one appearing in an electronic or digital book, and can have their speech modified or morphed as the story is read. In at least some embodiments, readers participating in a remotely read interactive story share a common view of the digital story content. This common view can be, and typically is rendered on a display of the reader's computing device, such as one or more of the computing devices as described above. In these instances, the readers are connected by video communication provided by a video camera that captures at least each reader's face so that the faces can be displayed to the other readers. In addition, a microphone captures the audio, i.e., the reader's voice, at each reader's location. Thus, input such as video, audio, and/or interaction with a shared digital story, that is sensed at each reader's computing device can be shared with the other participating readers.”; ¶0178 “In addition to incorporating augmented video 1904, enhanced interactive story 1902 includes a still image associated with a face of video capture image 1506 superimposed upon image 1906. As discussed above, the face can be extracted using automatic and/or manual face detection processes. Here, the facial features are simply cut and pasted into image 1906. However, in other embodiments, other augmentation filters can be applied, such as the alpha blending algorithm described above.”)
Regarding claim 45, Peevers and Nauseef teach the method of claim 44, further comprising: tracking the second user's face in real-time (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”)
Regarding claim 46, Peevers and Nauseef teach the method of claim 1, wherein the version of the captured audio is rendered as if spoken by the particular augmented reality (AR) character in the story (¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. 
It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
2.	Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Peevers et al., U.S. Patent Application Publication No. 2014/0192140 (“Peevers”) in view of Nauseef et al., U.S. Patent Application Publication No. 2016/0191958 (“Nauseef”), further in view of Billinghurst, Mark, Hirokazu Kato, and Ivan Poupyrev, "The MagicBook - Moving Seamlessly between Reality and Virtuality," IEEE Computer Graphics and Applications 21.3 (2001): 6-8 (“Billinghurst”).
Regarding claim 36, Peevers and Nauseef teach the method of claim 1, wherein the scene captured in (A)(1) comprises a unified space (¶017 of Peevers “Some embodiments augment and/or modify video capture data as part of a shared story experience. A reader and/or participant can upload video and incorporate a modified version of the video capture data into the story. In some cases, one or more filters can be applied to the video to modify its appearance, such as a high-pass filter, a low-pass filter (to blur an image), edge-enhancement techniques, colorized filters (e.g. index an arbitrary RGB table using a luminance channel of the source image), distortion filters (ripple, lens, vertical waves, horizontal waves, and so forth), sepia tone filtering, and so forth. For example, a “rotoscoping” filter can modify the appearance of a “real world” image to a “cartoon world” image. Rotoscoping can be achieved using a combination of several filters (for example, applying contrast enhancement, then converting from RGB color space to HSV color space, then quantizing the V coordinate very coarsely). One stage of professional rotoscoping typically involves rendering an outline around each face to be rotoscoped and then applying a rotoscoping algorithm. Alternately or additionally, the visual background of the story might be personalized into something familiar to the participants. For example, the background may be a picture of a participant's bedroom, house or neighborhood. Thus, images and/or objects within the story can be combined with at least part of a video capture and/or still image. For instance, an electronic story may include an image and/or object that displays a cartoon character sitting in a bedroom. In some embodiments, an image of a separate bedroom can be uploaded and combined with the cartoon character such that a resulting image and/or objects displays the cartoon character sitting in the separate bedroom. 
Further, in at least some embodiments, a reader's body motions can be captured, similar to Kinect-type scenarios, and used to drive the animation of a character in the story.”; ¶0105 of Nauseef “The content management unit 312 and/or the features unit 322 may then identify one or more contextual features relevant to the determined location of the user (e.g., relevant to the identified locational cues and/or objects of interest). For example, via analysis of live video feed of a user, the location determination unit 314, the facial/vocal recognition unit 318, the gesture analysis unit 320, and/or the features unit 322 may identify a recognizable landmark, such as the Big Ben clock tower in London, in the background of the live video feed (see exemplary user interface 600 of FIG. 6).”; ¶0168 of Peevers “Video capture 1504 represents video images that have been received by end user terminal 102. In this example, video capture 1504 is generated by camera 1502 and stored locally on end user terminal 102. However, it is to be appreciated that video capture 1504 can also be stored remotely from end user terminal 102 without departing from the scope of the claimed subject matter. Thus, end user terminal 102 can acquire video capture in any suitable manner, such as through a camera directly connected to end user terminal 102 (as illustrated here), or through remote connections. In some embodiments, video capture can include images of one or more persons, such as the one or more participants and/or readers of the shared story experience. Here, video capture image 1506 represents one of a plurality of still images which comprise video capture 1504. For simplicity, discussions will be made with reference to video capture image 1506. However, it is to be appreciated that functionality described with reference to video capture image 1506 is equally applicable to video capture 1504 and/or the plurality of images.”) In addition, the same motivation to combine set forth in the rejection of claim 1 applies.
Peevers and Nauseef are understood to be silent on the remaining limitations of claim 36.
In the same field of endeavor, Billinghurst teaches wherein the scene captured comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space (see section MagicBook interface, paragraphs three and four: “Real books often serve as the focus for face-to-face collaboration and in a similar way multiple people can use the MagicBook interface at the same time. Several readers can look at the same book and share the story together. If they’re using the augmented reality displays, they can each see the virtual models from their own viewpoint. Since they can see each other at the same time as the virtual models, they can easily communicate using normal face-to-face conversation cues. Multiple users can immerse in the same virtual scene where they’ll see each other represented as virtual characters (Figure 3a). More interestingly, one or more people may immerse themselves in the virtual world while others view the content as an augmented reality scene. In this case, those viewing the augmented reality scene will see a miniature avatar of the immersive user in the virtual world (Figure 3b). In the immersive world, people viewing the augmented reality scene appear as large, virtual heads looking down from the sky. This way, people are always aware of the other users of the interface and where they are looking.”)
Therefore, given the combination of Peevers and Nauseef, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Peevers, in which two or more remote participants communicate and interact with story-based, shared, interactive content in real time, so that multiple users look at the same book and each see the virtual models from their own viewpoint, as taught by Billinghurst, because this modification would allow users to see virtual objects appearing on the pages of the book from their own viewpoint (see section MagicBook interface, paragraph six of Billinghurst).
Thus, the combination of Peevers, Nauseef and Billinghurst teaches wherein the scene captured in (A)(1) comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842. The examiner can normally be reached Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached on (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SARAH LE/Primary Examiner, Art Unit 2619