Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Election/Restrictions
Applicant’s election without traverse of Group I, drawn to claims 1-36, in the reply filed on 01/29/2022 is acknowledged.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 34 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 34 depends from claim 27 and recites the limitation “wherein said transitioning in (B)(2)” in lines 1-2. However, claim 27 does not recite “transitioning in (B)(2)”. It is therefore unclear whether claim 34 is intended to depend from claim 26 or from claim 33. For purposes of examination, Examiner interprets claim 34 as depending from claim 33.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

1.	Claims 1-4, 8-11, 26, and 33 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kamhi et al., U.S. Patent Application No. 20160065860 (“Kamhi”).

Regarding independent claim 1, Kamhi teaches a method, with a device having at least one camera and a display (Fig. 1), the method comprising: 
(A) capturing a scene with said at least one camera, the scene comprising a live view of a real-world physical environment (¶0028 “In some embodiments, plane calculation module 110 may receive real-time video feed 202 and may calculate one or more planes contained within real-time video feed 202 (e.g., the plane created by table 206). This may be accomplished through any conventional process, for example, by utilizing depth and color information contained within real-time video feed and captured by a camera (e.g., camera 104 of FIG. 1).”); and (B) for a story comprising a plurality of events (¶0020 “In embodiments, the portion of textual content may be associated with a digital scene (e.g., the digital scene depicted in augmentation video scene 114). This association of the portion of textual content with the digital scene may take any suitable form. For instance, the association may be contained in metadata associated with either or both of the portion of textual content or the digital scene; the association may be made via a relational database that relates the portion of textual content to the digital scene; the association may be made by packaging the digital scene and the portion of textual content into a single file; or any other suitable manner of association. In embodiments where the portion of textual content is associated with the digital scene by being packaged into a single file, the single file may contain additional portions of textual content along with additional digital scenes, respectively associated with the additional portions of textual content. For example, if the textual content is a digital book, then the portions of textual content may correspond with chapters, pages, or passages of the digital book and each of the chapters, pages, or passages may be individually associated with respective digital scenes which may all be contained within a single file. 
The digital scene may include static images and/or animated images to augment the portion of textual content.” where the portions of textual content of a digital book (chapters, pages, or passages), each associated with a respective digital scene, are considered the plurality of events of a story), (B)(1) rendering a particular event of said plurality of events on said display, wherein said rendering of said particular event augments the scene captured in (A) by said at least one camera (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.” where boat 312 of the digital scene is incorporated with the real-time video feed capturing table 314.)
Regarding claim 2, Kamhi teaches the method of claim 1, further comprising: (B)(2) transitioning to a next event of said plurality of events (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”); and, (B)(3) in response to said transitioning in (B)(2), rendering said next event of said plurality of events on said display (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”).
Regarding claim 3, Kamhi teaches the method of claim 2, wherein said particular event includes event transition information (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.” where the user interaction to navigate to the second portion of textual content is considered event transition information), and wherein said transitioning in (B)(2) occurs in accordance with said event transition information (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”).
Regarding claim 4, Kamhi teaches the method of claim 1, wherein a transition is based on one or more of: (a) a period of time; and/or (b) a user interaction; and/or (c) a user gesture (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.” where the user input to navigate to the second portion of textual content is considered a user interaction)
Regarding claim 8, Kamhi teaches the method of claim 1, wherein said particular event comprises one or more of: (i) audio information; (ii) textual information; and (iii) augmented reality (AR) information, and wherein rendering of said particular event in (B)(1) comprises rendering one or more of: (x) audio information associated with said event; (y) textual information associated with said event; and (z) AR information associated with said event (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.” where boat 312 of the digital scene is considered AR information associated with said event)
Regarding claim 9, Kamhi teaches the method of claim 1, further comprising: repeating act (B)(1) for multiple events in said story (¶0044-0045 “The process may begin at block 702 where content augmentation environment may receive input to navigate to another portion of textual content associated with another digital scene. In embodiments, the input may be received in response to a user of the computing device interacting with navigational input portion of a display device, such as 402 of FIG. 4. In other embodiments the input may be received in response to a user selecting another portion of textual content from a list of various portions of textual content such as a table of contents, index, etc. [0045] Once the input to navigate to another portion of textual content has been received by the content augmentation environment, the process may proceed to block 704 where content augmentation environment may receive a real-time video feed captured by a camera coupled with the content augmentation environment. At block 706 content augmentation environment may dynamically adapt, as discussed elsewhere herein, a portion of the another digital scene based on the real-time video feed for rendering on a display device coupled with the content augmentation environment” where the user navigating to another portion of textual content, with the associated digital scene rendered in turn, is considered repeating act (B)(1) for multiple events in said story)
Regarding claim 10, Kamhi teaches the method of claim 1, wherein the at least one camera and the display are integrated in the device (Fig. 1; ¶0017 “FIG. 1 illustrates a computing environment 100 in accordance with various embodiments of the present disclosure. Computing environment 100 may include computing device 102 which may include content augmentation environment 122. Content augmentation environment 122 may include a digital content module 106, an augmentation module 108, and a plane calculation module 110. Each of these modules is discussed further below. Computing environment 100 may further include a camera 104 coupled with computing device 102. While depicted herein as being integrated into computing device 102, camera 104 may, in some embodiments, be peripherally attached to computing device 102. In embodiments where camera 104 is peripherally attached, camera 104 may be communicatively coupled with computing device 104 via any wired or wireless connection suitable for transmitting data captured by camera 104.”)
Regarding claim 11, Kamhi teaches the method of claim 1, wherein the device is a mobile phone or a tablet device (¶0024 “While computing device 102 is depicted herein as a tablet, it will be appreciated that this is merely for illustrative purposes. Computing device 102 may take the form of any type of portable or stationary computing device, such as, but not limited to, a smart phone, tablet, laptop, desktop, kiosk, or wearable computing devices such as, for example, Google Glass. Any computing device capable of carrying out the processes described herein is contemplated by this disclosure.”)
Regarding independent claim 26, Kamhi teaches a method comprising:
(A) capturing a scene from a first camera associated with a first device having a first display, the scene comprising a live view of a real-world physical environment (¶0028 “In some embodiments, plane calculation module 110 may receive real-time video feed 202 and may calculate one or more planes contained within real-time video feed 202 (e.g., the plane created by table 206). This may be accomplished through any conventional process, for example, by utilizing depth and color information contained within real-time video feed and captured by a camera (e.g., camera 104 of FIG. 1).”); (B) for a story comprising a plurality of events (¶0020 “In embodiments, the portion of textual content may be associated with a digital scene (e.g., the digital scene depicted in augmentation video scene 114). This association of the portion of textual content with the digital scene may take any suitable form. For instance, the association may be contained in metadata associated with either or both of the portion of textual content or the digital scene; the association may be made via a relational database that relates the portion of textual content to the digital scene; the association may be made by packaging the digital scene and the portion of textual content into a single file; or any other suitable manner of association. In embodiments where the portion of textual content is associated with the digital scene by being packaged into a single file, the single file may contain additional portions of textual content along with additional digital scenes, respectively associated with the additional portions of textual content. For example, if the textual content is a digital book, then the portions of textual content may correspond with chapters, pages, or passages of the digital book and each of the chapters, pages, or passages may be individually associated with respective digital scenes which may all be contained within a single file. 
The digital scene may include static images and/or animated images to augment the portion of textual content.” where the portions of textual content of a digital book (chapters, pages, or passages), each associated with a respective digital scene, are considered the plurality of events of a story), (B)(1) rendering a particular event of said plurality of events on said first display, wherein said rendering of said event augments the scene captured by said first camera (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.” where boat 312 of the digital scene is incorporated with the real-time video feed capturing table 314.); and (B)(2) transitioning to a next event of said plurality of events (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”). 
Regarding claim 33, Kamhi teaches the method of claim 26, wherein said particular event includes event transition information (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.” where the user interaction to navigate to the second portion of textual content is considered event transition information), and wherein said transitioning in (B)(2) occurs in accordance with said event transition information (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”). 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1.	Claims 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application No. 20160065860 (“Kamhi”) in view of Morrison, U.S. Patent Application No. 20160253746 (“Morrison”).
Regarding claim 5, Kamhi teaches the method of claim 4. Kamhi teaches that a transition is based on user input but does not teach the user gesture.
In the same field of endeavor, Morrison teaches wherein the user gesture is determined based on one or more of: (i) an image obtained by said device; and (ii) on movement and/or orientation of said device (¶0043 “Since the infrastructure 200 may support gesture input, input devices 20, such as cameras, may be trained on the user 202 to detect gestures. Furthermore, in the case of cameras used to detect the user's reactions and facial expressions, the input devices 204 may thereby be used to collect input for a cognitive modeling based analysis to determine the user's positive or negative attitude towards the augmented reality environment and objects rendered”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the user input for navigating and interacting with the portions of textual content of Kamhi to include user gestures detected by a camera, as taught by Morrison, because this modification would provide gesture input as a form of user input (¶0043 of Morrison).
Thus, the combination of Kamhi and Morrison teaches the method of claim 4, wherein the user gesture is determined based on one or more of: (i) an image obtained by said device; and (ii) on movement and/or orientation of said device.
Regarding claim 6, Kamhi teaches the method of claim 4. Kamhi teaches that a transition is based on user input but does not teach the user gesture.
In the same field of endeavor, Morrison teaches wherein the user gesture comprises a facial gesture and/or a body gesture (¶0043 “Since the infrastructure 200 may support gesture input, input devices 20, such as cameras, may be trained on the user 202 to detect gestures. Furthermore, in the case of cameras used to detect the user's reactions and facial expressions, the input devices 204 may thereby be used to collect input for a cognitive modeling based analysis to determine the user's positive or negative attitude towards the augmented reality environment and objects rendered”; ¶0093 “FIG. 6 is an exemplary user experience process 600 for augmented reality e-commerce. At block 602, the web service 214 may receive user input from a user as captured by the input devices 204. The input devices 204 may include cameras, microphones, touch surfaces, wearable sensors, or other sensing devices that are configured to capture motions or actions of the user. For example, the user input may include specific gestures made by the user via the user's arms or hands, gestures made using the fingers (e.g., a pinch command), selection of virtual icons, and/or voice commands. In some embodiments, the input devices 204 may be configured to capture motions or actions that are performed by persons with disabilities. Such motions or actions may include movement of facial features (e.g., winking of eyes, twitching of facial muscles), blowing of air through a person's mouth in specific patterns, movement of limbs or head in particular ways, and/or so forth”). In addition, the same motivation as in the rejection of claim 5 applies.
Thus, the combination of Kamhi and Morrison teaches the method of claim 4, wherein the user gesture comprises a facial gesture and/or a body gesture.
Regarding claim 7, Kamhi teaches the method of claim 4. Kamhi is understood to be silent on the remaining limitations of claim 7.
However, Morrison teaches wherein the user interaction comprises one or more of: a user voice command; and a user touching a screen or button on said device (¶0094 “At decision block 604, the web service 214 may determine whether the user input includes a command to perform a query for a particular 3D product representation. In various embodiments, the web service 214 may include command processing logic that recognizes query commands. For example, the input devices 204 may translate a touch gesture or a voice command regarding a particular product into a command. In turn, the command may be recognized by the command processing logic of the web service 214 as a query command Thus, if the user input is recognized at decision block 604 by the web service 214 as a query command, the process 600 may proceed to 606.”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the user input for navigating and interacting with the portions of textual content of Kamhi to include a touch gesture or a voice command, as taught by Morrison, because this modification would allow such inputs to be translated into commands (¶0093 of Morrison).
2.	Claims 12, 23, 27-28, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application No. 20160065860 (“Kamhi”) in view of Thankavel, U.S. Patent Application No. 20160217699 (“Thankavel”).
Regarding claim 12, Kamhi teaches the method of claim 1, further comprising: (D) rendering, on said display, the particular event of said plurality of events in (B)(1) (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.” where boat 312 of the digital scene is incorporated with the real-time video feed capturing table 314). Kamhi does not teach (C) obtaining a user image from at least one second camera, or rendering a version of the user image with the particular event in (D).
In the same field of endeavor, Thankavel teaches (C) obtaining a user image from at least one second camera (¶0011 “The present invention has different control mean: (a) a scanner to scan a graphics pattern to activate and use the picture book application, using commercially available devices (smart phone, tablet, PC, laptop); (b) an input device to use a commercial smart device (smart phone, tablet, PC, laptop) to take a picture of participants' faces, select a caricature of the participants' faces or use actual faces of participants and select an avatar to use with participants' faces or caricatures of participants' faces and to interact with the AR-Book application; (c) a computer to generate 3D graphics or a video that immerses participants into the picture book story line; and (d) an interface to display the video and/or graphics in response to the participants' picture book and a computer.” ¶0038 “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b).”) and (D) rendering, on said display, a version of the user image with the particular event of said plurality of events in (B)(1) (¶0049 “FIG. 15 (a and b) shows a fourteenth perspective view of the AR-Book application 12, which allows the user 3 to select and save the selected avatar 29 or select a different avatar 28, 22, 27. The process repeats the steps of displaying avatars 21 for the user 3 to select an avatar for each participant (FIG. 11), allowing the user 3 to select the desired avatar 22 for each participant (FIG. 12), attaching participant's face (either real or caricature) to the head of the avatar and displaying 26 (FIG. 13), displaying the text to ask the users 3 if the they want to keep the selected avatar 27 or re-select the face 28a (FIG. 14), allowing the user to select and save the selected avatar 29 or select a different avatar 28, 22, 27 (FIG. 15) until an avatar for each participant is accepted and saved by the user.” ¶0052 “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.” where displaying the participants' avatars, with their faces attached, in the story line of each page is considered rendering a version of the user image with the particular event).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Kamhi, which adapts a digital scene based at least in part on a real-time video feed for rendering on one or more display devices to augment textual content such as a page of an electronic book, to display an avatar bearing the participant's face in the story line of each page, as taught by Thankavel, because this modification would allow participants to be immersed in the picture book as one of its characters, using their real face on the selected character (avatar) (Abstract of Thankavel).
Thus, the combination of Kamhi and Thankavel teaches the method of claim 1, further comprising: (C) obtaining a user image from at least one second camera; and (D) rendering, on said display, a version of the user image with the particular event of said plurality of events in (B)(1).
Regarding claim 23, Kamhi and Thankavel teach the method of claim 12, wherein the at least one second camera is associated with another device, distinct from said device (¶0018 of Kamhi “Camera 104 may be disposed in computing environment 102 and configured to capture a real-time video feed of a physical scene (e.g., physical scene 116).” ¶0038 of Thankavel “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b).” where taking a picture or uploading the participant's face is considered obtaining the user image from at least one second camera associated with another device, distinct from said device). In addition, the same motivation as in the rejection of claim 12 applies.
Regarding claim 27, Kamhi teaches the method of claim 26, wherein said rendering of said event also augments the scene (Fig. 4). Kamhi is understood to be silent on the remaining limitations of claim 27.
In the same field of endeavor, Thankavel teaches wherein said rendering of said event also augments the scene with information associated with at least one other device (¶0052 “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”, where displaying the participants' avatars in the story line of each page is considered to augment the scene with information associated with at least one other device.) In addition, the same motivation is used as the rejection for claim 12.
Thus, the combination of Kamhi and Thankavel teaches wherein said rendering of said event also augments the scene with information associated with at least one other device.
Regarding claim 28, Kamhi and Thankavel teach the method of claim 27, wherein said information associated with said at least one other device corresponds to one or more of: (i) an image captured by said at least one other device; and/or (ii) an image representing or corresponding to said at least one other device; and/or (iii) audio from said at least one other device (¶0038 of Thankavel “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b).”; ¶0049 of Thankavel “FIG. 15 (a and b) shows a fourteenth perspective view of the AR-Book application 12, which allows the user 3 to select and save the selected avatar 29 or select a different avatar 28, 22, 27. The process repeats the steps of displaying avatars 21 for the user 3 to select an avatar for each participant (FIG. 11), allowing the user 3 to select the desired avatar 22 for each participant (FIG. 12), attaching participant's face (either real or caricature) to the head of the avatar and displaying 26 (FIG. 13), displaying the text to ask the users 3 if the they want to keep the selected avatar 27 or re-select the face 28a (FIG. 14), allowing the user to select and save the selected avatar 29 or select a different avatar 28, 22, 27 (FIG. 15) until an avatar for each participant is accepted and saved by the user.”; ¶0052 of Thankavel “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”) In addition, the same motivation is used as the rejection for claim 12.
Regarding claim 30, Kamhi and Thankavel teach the method of claim 28, wherein said image representing or corresponding to said at least one other device comprises an avatar (¶0052 of Thankavel “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”) In addition, the same motivation is used as the rejection for claim 12.
3.	Claims 13-19, 31-32, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application No. 20160065860 (“Kamhi”), in view of Thankavel, U.S. Patent Application No. 20160217699 (“Thankavel”), and further in view of Peevers et al., U.S. Patent Application No. 20140192140 (“Peevers”).
Regarding claim 13, Kamhi and Thankavel teach the method of claim 12. Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 13, namely wherein rendering a version of the user image in (D) comprises animating at least a portion of the user image.
In the same field of endeavor, Peevers teaches animating at least a portion of the user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. 
In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. 
It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
Therefore, in the combination of Kamhi and Thankavel, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify Thankavel's display of avatars bearing the participants' real faces in the story line of each page so that the character is animated according to the facial expressions of the reader, as taught by Peevers, because this modification would allow the story experience to be personalized by all participants associated with the story (¶0179 of Peevers).
Thus, the combination of Kamhi, Thankavel and Peevers teaches wherein rendering a version of the user image in (D) comprises: animating at least a portion of the user image.
Regarding claim 14, Kamhi, Thankavel and Peevers teach the method of claim 13, wherein the portion of the image comprises the user's face (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”) In addition, the same motivation is used as the rejection for claim 13.
Regarding claim 15, Kamhi and Thankavel teach the method of claim 12. Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 15.
In the same field of endeavor, Peevers teaches further comprising: recognizing the user's face in the user image (¶0170 “In some embodiments, face detection algorithms can automatically detect a face and/or regions of a face in the video capture. These algorithms can identify facial features within a video and/or still image, while ignoring and/or disregarding other objects within the image. For instance, consider FIG. 16, which depicts aspects of facial detection algorithms 1602a, 1602b, and 1602c applied to video capture image 1506 of FIG. 15. Facial detection algorithm 1602a represents an algorithm that generally detects a face and marks a location of the face using a box. In this example, a rectangular box is used to define region 1604 to identify where detected face is located. Any suitable size and shape can be used, such as a square box, an oval box, a circular box, and so forth. Alternately or additionally, the size of the region may change based upon how much of an image contains the detected face. In some cases, this general identification might be suitable in environments where there are less processing capabilities available.”)
Therefore, in the combination of Kamhi and Thankavel, it would have been obvious to a person of ordinary skill in the art at the time of invention to modify Thankavel's display of avatars bearing the participants' real faces in the story line of each page to use the face detection algorithms taught by Peevers, because this modification would automatically detect a face and/or regions of a face in the video capture (¶0170 of Peevers).
Regarding claim 16, Kamhi, Thankavel and Peevers teach the method of claim 14, further comprising: tracking the user's face in real-time (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”)
Regarding claim 17, Kamhi and Thankavel teach the method of claim 12. Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Peevers teaches wherein the rendering in (C) is based on real time tracking of the user's face in the user image (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”) In addition, the same motivation is used as the rejection for claim 15.
Regarding claim 18, Kamhi, Thankavel and Peevers teach the method of claim 13, wherein said at least one second camera is associated with a second device (¶0011 of Thankavel “The present invention has different control mean: (a) a scanner to scan a graphics pattern to activate and use the picture book application, using commercially available devices (smart phone, tablet, PC, laptop); (b) an input device to use a commercial smart device (smart phone, tablet, PC, laptop) to take a picture of participants' faces, select a caricature of the participants' faces or use actual faces of participants and select an avatar to use with participants' faces or caricatures of participants' faces and to interact with the AR-Book application; (c) a computer to generate 3D graphics or a video that immerses participants into the picture book story line; and (d) an interface to display the video and/or graphics in response to the participants' picture book and a computer.”; ¶0175 of Peevers “Some embodiments augment and/or modify video capture data as part of a shared story experience. A reader and/or participant can upload video and incorporate a modified version of the video capture data into the story.”), and wherein said animating is based, at least in part, on manipulation and/or movement of the second device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. 
For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. 
Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”) In addition, the same motivation is used as the rejection for claim 13.
Regarding claim 19, Kamhi, Thankavel and Peevers teach the method of claim 18, wherein the second device comprises a mobile phone or a tablet device (¶0011 of Thankavel “The present invention has different control mean: (a) a scanner to scan a graphics pattern to activate and use the picture book application, using commercially available devices (smart phone, tablet, PC, laptop); (b) an input device to use a commercial smart device (smart phone, tablet, PC, laptop) to take a picture of participants' faces, select a caricature of the participants' faces or use actual faces of participants and select an avatar to use with participants' faces or caricatures of participants' faces and to interact with the AR-Book application; (c) a computer to generate 3D graphics or a video that immerses participants into the picture book story line; and (d) an interface to display the video and/or graphics in response to the participants' picture book and a computer.”) In addition, the same motivation is used as the rejection for claim 13.
Regarding claim 31, Kamhi and Thankavel teach the method of claim 28. Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 31.
In the same field of endeavor, Peevers teaches wherein said image representing or corresponding to said at least one other device is animated (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. 
In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. 
It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”) In addition, the same motivation is used as the rejection for claim 13.
Regarding claim 32, Kamhi, Thankavel and Peevers teach the method of claim 31, wherein said image is animated, at least in part, by manipulation and/or movement of the at least one other device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”). The motivation to combine is the same as that set forth in the rejection of claim 13.
Regarding independent claim 40, Kamhi teaches a method, with a first device having at least one camera and a display (Fig. 1), the method comprising: (A) capturing a scene with said at least one camera, the scene comprising a live view of a real-world physical environment (¶0028 “In some embodiments, plane calculation module 110 may receive real-time video feed 202 and may calculate one or more planes contained within real-time video feed 202 (e.g., the plane created by table 206). This may be accomplished through any conventional process, for example, by utilizing depth and color information contained within real-time video feed and captured by a camera (e.g., camera 104 of FIG. 1).”); and (B) for a story comprising a plurality of events (¶0020 “In embodiments, the portion of textual content may be associated with a digital scene (e.g., the digital scene depicted in augmentation video scene 114). This association of the portion of textual content with the digital scene may take any suitable form. For instance, the association may be contained in metadata associated with either or both of the portion of textual content or the digital scene; the association may be made via a relational database that relates the portion of textual content to the digital scene; the association may be made by packaging the digital scene and the portion of textual content into a single file; or any other suitable manner of association. In embodiments where the portion of textual content is associated with the digital scene by being packaged into a single file, the single file may contain additional portions of textual content along with additional digital scenes, respectively associated with the additional portions of textual content. 
For example, if the textual content is a digital book, then the portions of textual content may correspond with chapters, pages, or passages of the digital book and each of the chapters, pages, or passages may be individually associated with respective digital scenes which may all be contained within a single file. The digital scene may include static images and/or animated images to augment the portion of textual content.”, where the textual content of the digital book corresponds to pages, chapters, or passages, each associated with a respective digital scene), (B)(1) rendering a particular event of said plurality of events on said display, wherein said rendering of said particular event augments the scene captured in (A) by said at least one camera (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.”, where boat 312 of the digital scene is incorporated with the real-time video feed, which captures table 314); (B)(2) transitioning to a next event of said plurality of events (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. 
In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”); and, (B)(3) in response to said transitioning in (B)(2), rendering said next event of said plurality of events on said display (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”); wherein said particular event includes event transition information (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. 
In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”, where the user interacts with the display to navigate to the second portion of textual content), and wherein said transitioning in (B)(2) occurs in accordance with said event transition information (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”), wherein a transition is based on one or more of: (a) a period of time; and/or (b) a user interaction; and/or (c) a user gesture, the user gesture being determined based on one or more of: (i) an image obtained by said device; and (ii) on movement and/or orientation of said device (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. 
In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”, where the user's navigation to the second portion of textual content, i.e., the user input, is a user interaction). Kamhi is understood to be silent on the remaining limitations of claim 40.
In the same field of endeavor, Thankavel teaches (C) obtaining a user image from at least one second camera (¶0011 “The present invention has different control mean: (a) a scanner to scan a graphics pattern to activate and use the picture book application, using commercially available devices (smart phone, tablet, PC, laptop); (b) an input device to use a commercial smart device (smart phone, tablet, PC, laptop) to take a picture of participants' faces, select a caricature of the participants' faces or use actual faces of participants and select an avatar to use with participants' faces or caricatures of participants' faces and to interact with the AR-Book application; (c) a computer to generate 3D graphics or a video that immerses participants into the picture book story line; and (d) an interface to display the video and/or graphics in response to the participants' picture book and a computer.”; ¶0038 “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b)”); (D) rendering, on said display, a version of the user image with the particular event of said plurality of events in (B)(1) (¶0049 “FIG. 15 (a and b) shows a fourteenth perspective view of the AR-Book application 12, which allows the user 3 to select and save the selected avatar 29 or select a different avatar 28, 22, 27. The process repeats the steps of displaying avatars 21 for the user 3 to select an avatar for each participant (FIG. 11), allowing the user 3 to select the desired avatar 22 for each participant (FIG. 12), attaching participant's face (either real or caricature) to the head of the avatar and displaying 26 (FIG. 13), displaying the text to ask the users 3 if the they want to keep the selected avatar 27 or re-select the face 28a (FIG. 14), allowing the user to select and save the selected avatar 29 or select a different avatar 28, 22, 27 (FIG. 15) until an avatar for each participant is accepted and saved by the user.”; ¶0052 “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”, where the avatar, with the participant's face attached, is displayed in the story line of each page), wherein said at least one second camera is associated with a second device distinct from the first device (¶0038 “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b)”, where taking a picture or uploading the participant's face is considered to show that the at least one second camera is associated with a second device distinct from the first device).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Kamhi, which adapts a digital scene, based at least in part on a real-time video feed, for rendering on one or more display devices to augment textual content such as a page of an electronic book, to display an avatar bearing the participant's face in the story line of each page, as taught by Thankavel, because this modification would allow participants to be immersed in the picture book as one of its characters by applying their real faces to the selected character (avatar) (Abstract of Thankavel). Both Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 40.
In the same field of endeavor, Peevers teaches wherein rendering a version of the user image in (D) comprises animating at least a portion of the user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”), wherein the portion of the image comprises the user's face (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. 
In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”), wherein the rendering in (D) is based on real time tracking of the user's face in the user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. 
This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”); (E) capturing audio data from said device (¶0063 “With respect to augmentation that takes place at the sender's or reader's computing device, consider the following. 
When the reader's voice is captured, the augmentation effect module 112 processes the audio data that is received from associated microphone in order to impart some type of different characteristic to it, examples of which are provided above”); and (F) rendering a version of the captured audio with the particular event of said plurality of events in (B)(1) on at least one speaker associated with said device (¶0148 “FIG. 14 illustrates aspects of an implementation of a device 1400 in accordance with one or more embodiments. Device 1400 includes a microphone, camera, and speaker as illustrated”; ¶0057 “Speech or audio morphing refers to the manipulation of the voice of a reader or call participant in various ways to deliberately sound like someone or something else. In one or more embodiments, the intention is that these manipulations or morphings should be amusing and entertaining in various ways. For example, during the reading of an electronic story, the reader's voice could be morphed to sound like a chipmunk, a monster, or some other type of character in the story. Any suitable type of audio morphing software can be utilized to achieve the intended effects. Some audio morphing software is designed to manipulate the spoken voice, while other software is designed to manipulate the sound of human singing.”; ¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. 
For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”), wherein the audio rendered in (F) is manipulated and/or augmented before being rendered (¶0060 of Peevers “The specific use of voice manipulation or morphing in the present context, as noted above, is intended for manipulation of a reader's voice as they read a shared story to a remote person.”; ¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”), and wherein rendering a version of the user image in (D) comprises: animating at least a portion of the user image (¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. 
For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. 
In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. 
It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”), wherein said at least one second camera is associated with a second device distinct from the first device (¶0175 of Peevers “Some embodiments augment and/or modify video capture data as part of a shared story experience. A reader and/or participant can upload video and incorporate a modified version of the video capture data into the story.”), and wherein said animating is based, at least in part, on manipulation and/or movement of the second device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. 
In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. 
In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”)
Therefore, in view of the combination of Kamhi and Thankavel, it would have been obvious to a person of ordinary skill in the art at the time of the invention to modify the display of avatars bearing the participants' real faces in the story line of each page of Thankavel such that the character is animated in accordance with the facial expressions of the reader, as seen in Peevers, because this modification would allow the story experience to be personalized by all participants associated with the story (¶0179 of Peevers).
Thus, the combination of Kamhi, Thankavel and Peevers teaches a method, with a first device having at least one camera and a display, the method comprising: (A) capturing a scene with said at least one camera, the scene comprising a live view of a real-world physical environment; and (B) for a story comprising a plurality of events, (B)(1) rendering a particular event of said plurality of events on said display, wherein said rendering of said particular event augments the scene captured in (A) by said at least one camera; (B)(2) transitioning to a next event of said plurality of events; and, (B)(3) in response to said transitioning in (B)(2), rendering said next event of said plurality of events on said display; and (C) obtaining a user image from at least one second camera; (D) rendering, on said display, a version of the user image with the particular event of said plurality of events in (B)(1), wherein rendering a version of the user image in (D) comprises animating at least a portion of the user image, wherein the portion of the image comprises the user's face, wherein the rendering in (D) is based on real time tracking of the user's face in the user image; (E) capturing audio data from said device; and (F) rendering a version of the captured audio with the particular event of said plurality of events in (B)(1) on at least one speaker associated with said device, wherein the audio rendered in (F) is manipulated and/or augmented before being rendered, and wherein rendering a version of the user image in (D) comprises: animating at least a portion of the user image, wherein said at least one second camera is associated with a second device distinct from the first device, and wherein said animating is based, at least in part, on manipulation and/or movement of the second device, wherein said particular event includes event transition information, and wherein said transitioning in (B)(2) occurs in accordance with said event transition information, wherein a transition is based on one or more of: (a) a period of time; and/or (b) a user interaction; and/or (c) a user gesture, the user gesture being determined based on one or more of: (i) an image obtained by said device; and (ii) on movement and/or orientation of said device.
4.	Claims 20-21, 24-25 and 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application Publication No. 20160065860 (“Kamhi”) in view of Peevers et al., U.S. Patent Application Publication No. 20140192140 (“Peevers”).
Regarding claim 20, Kamhi teaches the method of claim 1, further comprising: (F) rendering the particular event of said plurality of events in (B)(1) associated with said device (Fig. 4). Kamhi is understood to be silent on the remaining limitations of claim 20.
In the same field of endeavor, Peevers teaches the method further comprising:
  (E) capturing audio data from said device (¶0063 “With respect to augmentation that takes place at the sender's or reader's computing device, consider the following. When the reader's voice is captured, the augmentation effect module 112 processes the audio data that is received from associated microphone in order to impart some type of different characteristic to it, examples of which are provided above”); and 
(F) rendering a version of the captured audio with the particular event of said plurality of events in (B)(1) on at least one speaker associated with said device (¶0148 “FIG. 14 illustrates aspects of an implementation of a device 1400 in accordance with one or more embodiments. Device 1400 includes a microphone, camera, and speaker as illustrated”; ¶0057 “Speech or audio morphing refers to the manipulation of the voice of a reader or call participant in various ways to deliberately sound like someone or something else. In one or more embodiments, the intention is that these manipulations or morphings should be amusing and entertaining in various ways. For example, during the reading of an electronic story, the reader's voice could be morphed to sound like a chipmunk, a monster, or some other type of character in the story. Any suitable type of audio morphing software can be utilized to achieve the intended effects. Some audio morphing software is designed to manipulate the spoken voice, while other software is designed to manipulate the sound of human singing.”; ¶0107 “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. 
This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”)
Therefore, it would have been obvious to a person of ordinary skill in the art at the time of the invention to modify Kamhi's method of adapting a digital scene, based at least in part on a real-time video feed, to be rendered on one or more display devices to augment textual content such as a page of an electronic book, so as to include voice morphing as a story is read and/or augmentation of audio story content as the story is read, as seen in Peevers, because this modification would allow the reader's voice to be modified in a manner chosen by each participant (¶0065 of Peevers).
Regarding claim 21, Kamhi and Peevers teach the method of claim 20, wherein the audio rendered in (F) is manipulated and/or augmented before being rendered (¶0060 of Peevers “The specific use of voice manipulation or morphing in the present context, as noted above, is intended for manipulation of a reader's voice as they read a shared story to a remote person.”; ¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”). In addition, the same motivation as used in the rejection of claim 20 applies.
Regarding claim 24, Kamhi teaches the method of claim 2, wherein said transitioning in (B)(2) is based on an action associated with the device (¶0037 of Kamhi; Fig. 4). Kamhi is understood to be silent on the remaining limitations of claim 24.
In the same field of endeavor, Peevers teaches wherein said transitioning in (B)(2) is based on an action associated with another device (¶0160 of Peevers “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”; i.e., participants (remote and local users) can change the page of the story).
Therefore, it would have been obvious to a person of ordinary skill in the art at the time of the invention to modify Kamhi's method of adapting a digital scene, based at least in part on a real-time video feed, to be rendered on one or more display devices to augment textual content such as a page of an electronic book, such that when a participant interacts with the book, control information corresponding to this interaction is transmitted to all other participants, as seen in Peevers, because this modification would initiate the same action on the corresponding devices (¶0158 of Peevers).
Thus, the combination of Kamhi and Peevers teaches wherein said transitioning in (B)(2) is based on an action associated with another device.
Regarding claim 25, Kamhi and Peevers teach the method of claim 24, wherein said transitioning in (B)(2) is triggered by said action associated with said other device (¶0037 of Kamhi “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”; ¶0160 of Peevers “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”). In addition, the same motivation as used in the rejection of claim 24 applies.
Regarding claim 34, Kamhi teaches the method of claim 27 (interpreted as claim 33), wherein said transitioning in (B)(2) occurs based on an action associated with the device (¶0037 of Kamhi; Fig. 4). Kamhi is understood to be silent on the remaining limitations of claim 34.
In the same field of endeavor, Peevers teaches wherein said transitioning in (B)(2) occurs based on an action associated with said at least one other device (¶0160 of Peevers “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”; i.e., participants (remote and local users) can change the page of the story). In addition, the same motivation as used in the rejection of claim 24 applies.
Thus, the combination of Kamhi and Peevers teaches wherein said transitioning in (B)(2) occurs based on an action associated with said at least one other device.
Regarding claim 35, Kamhi and Peevers teach the method of claim 34, wherein said transitioning in (B)(2) is triggered by said action associated with said other device (¶0037 of Kamhi “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”; ¶0160 of Peevers “Some of the above actions (for example, NEXTPAGE) might be initiated by any of the participants. A filtering/interlock mechanism precludes the various users' devices from getting out of synchrony. When a page change is requested locally, the command is immediately broadcast to all other participants. When a remote device receives this command, it will temporarily lock out any locally (to that device) generated page-change requests until it receives a PAGECHANGECOMPLETE message from the initiating device. The remote devices then enacts the command (e.g. turn to the next page), and then sends an acknowledgement (PAGECHANGEACKNOWLEDGE) message back to the initiating device. The page on the local (initiating) device is not changed until all remote devices have acknowledged receipt of the page-turn command. The local page is turned, and a PAGECHANGECOMPLETE message is broadcast. When remote devices receive this message, they are again free to respond to locally generated commands.”). In addition, the same motivation as used in the rejection of claim 24 applies.
5.	Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application Publication No. 20160065860 (“Kamhi”) in view of Thankavel, U.S. Patent Application Publication No. 20160217699 (“Thankavel”) and further in view of Swaminathan et al., U.S. Patent Application Publication No. 20140168056 (“Swaminathan”).
Regarding claim 22, Kamhi and Thankavel teach the method of claim 12, wherein the at least one camera is associated with said device (¶0024 of Kamhi “While computing device 102 is depicted herein as a tablet, it will be appreciated that this is merely for illustrative purposes. Computing device 102 may take the form of any type of portable or stationary computing device, such as, but not limited to, a smart phone, tablet, laptop, desktop, kiosk, or wearable computing devices such as, for example, Google Glass. Any computing device capable of carrying out the processes described herein is contemplated by this disclosure.”). Neither Kamhi nor Thankavel teaches that the device has two cameras.
However, Swaminathan teaches wherein the at least one second camera is associated with said device (¶0038 “In recent times, many mobile devices have multiple cameras--a front facing camera, back-facing camera, etc. and many more cameras may be included in mobile devices going into the future. In most augmented reality applications, while the front facing camera is looking at the image target, a back-facing camera may point to the user who is operating the mobile device. For example, the back-facing camera can capture images of the user's eyes, which can be used to determine a location on the display screen that is the current object under the user's gaze. This functionality is generally referred to as eye gaze tracking. Eye gaze tracking can be used to evaluate a user's interest in an image on the display screen.”)
Therefore, in view of the combination of Kamhi and Thankavel, it would have been obvious to a person of ordinary skill in the art at the time of the invention to modify Kamhi's computing device for adapting a digital scene, based at least in part on a real-time video feed, to be rendered on one or more display devices to augment textual content such as a page of an electronic book, so as to include multiple cameras, as seen in Swaminathan, because this modification would allow the front-facing camera to look at the image target while a back-facing camera points at the user who is operating the mobile device (¶0038 of Swaminathan).
Thus, the combination of Kamhi, Thankavel and Swaminathan teaches wherein the at least one second camera is associated with said device.
6.	Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application Publication No. 20160065860 (“Kamhi”) in view of Billinghurst, Mark, Hirokazu Kato, and Ivan Poupyrev. "The MagicBook - moving seamlessly between reality and virtuality." IEEE Computer Graphics and Applications 21.3 (2001): 6-8 (“Billinghurst”).
Regarding claim 36, Kamhi teaches the method of claim 26, wherein the captured scene comprises a space, and wherein the particular event rendered in (B)(1) provides a view of the space (¶¶0035-0036 of Kamhi; Fig. 4: “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302. [0036] In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”; i.e., a scene of boat 312 on table 314 transitions to a scene of a dolphin jumping out of water on the table). Kamhi is understood to be silent on the remaining limitations of claim 36.
In the same field of endeavor, Billinghurst teaches wherein the captured scene comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space (see section MagicBook interface, paragraphs three and four: “Real books often serve as the focus for face-to-face collaboration and in a similar way multiple people can use the MagicBook interface at the same time. Several readers can look at the same book and share the story together. If they’re using the augmented reality displays, they can each see the virtual models from their own viewpoint. Since they can see each other at the same time as the virtual models, they can easily communicate using normal face-to-face conversation cues. Multiple users can immerse in the same virtual scene where they’ll see each other represented as virtual characters (Figure 3a). More interestingly, one or more people may immerse themselves in the virtual world while others view the content as an augmented reality scene. In this case, those viewing the augmented reality scene will see a miniature avatar of the immersive user in the virtual world (Figure 3b). In the immersive world, people viewing the augmented reality scene appear as large, virtual heads looking down from the sky. This way, people are always aware of the other users of the interface and where they are looking.”).
Therefore, it would have been obvious to a person of ordinary skill in the art at the time of the invention to modify Kamhi's computing device for adapting a digital scene, based at least in part on a real-time video feed, to be rendered on one or more display devices to augment textual content such as a page of an electronic book, such that multiple users can look at the same book and see the virtual models from their own viewpoints, as seen in Billinghurst, because this modification would allow users to see virtual objects appearing on the pages of the book from their own viewpoint (see section MagicBook interface, paragraph six of Billinghurst).
Thus, the combination of Kamhi and Billinghurst teaches wherein the captured scene comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space.
7.	Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Kamhi et al., U.S. Patent Application Publication No. 20160065860 (“Kamhi”) in view of Thankavel, U.S. Patent Application Publication No. 20160217699 (“Thankavel”), further in view of Peevers et al., U.S. Patent Application Publication No. 20140192140 (“Peevers”), and further in view of Billinghurst, Mark, Hirokazu Kato, and Ivan Poupyrev. "The MagicBook - moving seamlessly between reality and virtuality." IEEE Computer Graphics and Applications 21.3 (2001): 6-8 (“Billinghurst”).
Regarding claim 41, Kamhi teaches a method comprising: (A) capturing a scene from a first camera associated with a first device having a first display, the scene comprising a live view of a real-world physical environment (¶0028 “In some embodiments, plane calculation module 110 may receive real-time video feed 202 and may calculate one or more planes contained within real-time video feed 202 (e.g., the plane created by table 206). This may be accomplished through any conventional process, for example, by utilizing depth and color information contained within real-time video feed and captured by a camera (e.g., camera 104 of FIG. 1).”); (B) for a story comprising a plurality of events (¶0020 “In embodiments, the portion of textual content may be associated with a digital scene (e.g., the digital scene depicted in augmentation video scene 114). This association of the portion of textual content with the digital scene may take any suitable form. For instance, the association may be contained in metadata associated with either or both of the portion of textual content or the digital scene; the association may be made via a relational database that relates the portion of textual content to the digital scene; the association may be made by packaging the digital scene and the portion of textual content into a single file; or any other suitable manner of association. In embodiments where the portion of textual content is associated with the digital scene by being packaged into a single file, the single file may contain additional portions of textual content along with additional digital scenes, respectively associated with the additional portions of textual content. 
For example, if the textual content is a digital book, then the portions of textual content may correspond with chapters, pages, or passages of the digital book and each of the chapters, pages, or passages may be individually associated with respective digital scenes which may all be contained within a single file. The digital scene may include static images and/or animated images to augment the portion of textual content.”; i.e., the digital book has textual content corresponding to pages, chapters, or passages, each associated with a digital scene), (B)(1) rendering a particular event of said plurality of events on said first display, wherein said rendering of said event augments the scene captured by said first camera (¶0035 “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.”; i.e., boat 312 of the digital scene is incorporated with table 314 captured in the real-time video feed); and (B)(2) transitioning to a next event of said plurality of events (¶0036 “In embodiments, content augmentation environment, or a module therein, may be configured to accept input from a user of computing device 302 to navigate to a second portion of textual content 404. 
In such embodiments, the user may navigate to the second portion of textual content 404 by, for example, interacting with a portion of the display device of computing device 302, such as portion 402; through the use of a table of contents, index, or the like where the user may select the second portion of textual content 404 from a list of various portions of the textual content; or in any other suitable manner.”), wherein said rendering of said event also augments the scene with information associated with the device (¶0037 “Once content augmentation environment has received such input from the user, the content augmentation environment may cause the second portion of textual content to be rendered on the display device of computing device 302 and may also cause a new digital scene associated with the second portion of textual content to be incorporated with the real-time video feed into a new augmentation video feed 406. As depicted, the real-time video feed may not change unless there is a change to the orientation of the camera capturing the video feed. As such, augmentation video feed 406 includes table 314 from the real-time video feed incorporated with the new digital scene, depicted here as dolphin 408 jumping out of water 410.”), wherein the captured scene comprises a space, and wherein the particular event rendered in (B)(1) provides a view of the space (¶0035; Fig. 4: “FIG. 4 is an illustrative depiction of navigation from a first portion of textual content 304 to a second portion of textual content 402. FIG. 4 continues from FIG. 3 and as a result some of the same reference numbers are utilized therein. As depicted, computing device 302 may begin with a rendering of a first portion of textual content 304 along with an augmentation video feed 306 depicting a digital scene, associated with the first portion of textual content 304, incorporated with a real-time video feed capture by the camera integrated with computing device 302.”; i.e., a scene of boat 312 on table 314 transitions to a scene of a dolphin jumping out of water on the table). Kamhi is understood to be silent on the remaining limitations of claim 41.
In the same field of endeavor, Thankavel teaches wherein said rendering of said event also augments the scene with information associated with at least one other device (¶0052 “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”; the display of the participants' avatars in the story line of each page is considered to be information associated with at least one other device), and wherein said information associated with said at least one other device corresponds to one or more of: (i) an image captured by said at least one other device; and/or (ii) an image representing or corresponding to said at least one other device, and/or (iii) audio from said at least one other device (¶0038 of Thankavel “FIG. 4 (a and b) shows a third perspective view of the AR-Book application displaying the face silhouette 14. The AR-Book application database 12 displays a participant's name and a silhouette of a face 14 and asks the user 3 to put their face into the silhouette to take a picture or upload their face (FIG. 4b).”; ¶0049 of Thankavel “FIG. 15 (a and b) shows a fourteenth perspective view of the AR-Book application 12, which allows the user 3 to select and save the selected avatar 29 or select a different avatar 28, 22, 27. The process repeats the steps of displaying avatars 21 for the user 3 to select an avatar for each participant (FIG. 11), allowing the user 3 to select the desired avatar 22 for each participant (FIG. 12), attaching participant's face (either real or caricature) to the head of the avatar and displaying 26 (FIG. 13), displaying the text to ask the users 3 if the they want to keep the selected avatar 27 or re-select the face 28a (FIG. 14), allowing the user to select and save the selected avatar 29 or select a different avatar 28, 22, 27 (FIG. 15) until an avatar for each participant is accepted and saved by the user.”; ¶0052 of Thankavel “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”), and wherein said image representing or corresponding to said at least one other device comprises an avatar (¶0052 of Thankavel “FIG. 17 is a sixteenth perspective view of the AR-Book application activating a video, 2D or a 3D graphics 35 based on each page's content. The AR-Book application 12 activates a 3D graphic or video 35 based on each page's content, from the first page until the last page, displaying the participant's avatars into the story line of each page for total immersion into the story line. The video or 3D interactive animated objects appear on the page in the form of a short clip played in that page only. This applies to a respective page, which has Augmented Reality application and every page may or may not have Augmented Reality application and these pages are decided when the book is designed. When the user activates the video mode, it is supported by an audio to play in real-time and the user interactions including hand, facial and eye movements are interactive with the audio.”). In addition, the same motivation as used in the rejection of claim 40 applies. Both Kamhi and Thankavel are understood to be silent on the remaining limitations of claim 41.
In the same field of endeavor, Peevers teaches wherein said image representing or corresponding to said at least one other device is animated, at least in part, by manipulation and/or movement of the at least one other device (¶0107 of Peevers “In addition to augmentation effects to a reader's voice, touching on a particular object may cause the object to be modified in some manner. For example, if the reader touches on a particular actor in a story, not only would the reader's voice be morphed to sound like the actor, but the actor could also be animated so that their mouth and face move mirroring that of the reader's. This can be accomplished by processing the video signal of the reader as captured by an associated video camera to create a model that can be used to drive the actor's presentation in the electronic book. For example, a three-dimensional mesh can be algorithmically fit to a reader's face to track their facial features and position in real-time. This information can then be used as a model to drive the actor's presentation in electronic book. This approach can be the same as or similar to that used in Microsoft's Kinect for Windows.”; ¶0177 of Peevers “As previously described, detection of various events can cue the user when aspects of the story can be personalized, modified, and/or customized. Responsive to these cues, a user can personalize the story through, among other things, modifying video capture and embedding the modified video into the story. In some cases, the video capture can be automatically analyzed and/or manually marked for various features and/or gestures related to telling the story. For instance, consider FIG. 19, which illustrates enhanced interactive story 1902. In this example, video capture image 1506 is augmented and embedded into enhanced interactive story 1902 in two separate ways. Augmented video 1904 represents a rotoscoped image associated with video capture image 1506. 
Here, video capture image 1506 has been filtered with a rotoscope filter effect to transfer the associated face into the "cartoon world" as described above. In addition to applying the rotoscope filter as an augmentation process, the modified image is superimposed upon a cartoon body of a flower. In some embodiments, augmented video 1904 can be a still image associated with the video, while in other embodiments augmented video 1904 can be a series of images. Alternately or additionally, facial features detected in video capture image 1506 can drive facial changes associated with a cartoon contained within the story”; ¶0179 of Peevers “A user can choose to incorporate video into a story experience in several ways. Some embodiments notify and/or cue the user of potential opportunities for video insertion and/or augmentation before, during, or after the reading process, examples of which are provided above. In some cases, the user may select a character from a list of available characters within the story to supplement, augment, or replace with video capture. This can also be done automatically. For example, any time the reader reads a quote from Elmo, the reader's voice is morphed to sound like Elmo, and the picture of Elmo in the electronic story is animated accordingly to the facial expressions of the reader. Alternately or additionally, selecting a character or cue notification by the user can activate a camera and/or the video capture process. In addition to notifying a user of potential augmentation opportunities, some embodiments enable the user to select how the video capture is processed, filtered, analyzed, and so forth. In other embodiments, when opportunities for video insertion and/or augmentation are detected, the video insertion and/or augmentation can occur automatically. 
For example, using the above example of Elmo, when Elmo's voice is detected as being read, video capture can be analyzed for gestures, which can be subsequently used to automatically animate an image of Elmo in the electronic story. In this manner, the story experience can be personalized by all participants associated with the story. It can additionally be noted that the video processing and/or augmentation can occur at any suitable device within the system, such as a device associated with capturing the video, a server device configured to store a composite story experience, and/or a receiving device”). The same motivation to combine applies as in the rejection of claim 40. Kamhi, Thankavel and Peevers are understood to be silent on the remaining limitations of claim 41.
In the same field of endeavor, Billinghurst teaches wherein the captured scene comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space (see section “MagicBook interface,” paragraphs three and four: “Real books often serve as the focus for face-to-face collaboration and in a similar way multiple people can use the MagicBook interface at the same time. Several readers can look at the same book and share the story together. If they’re using the augmented reality displays, they can each see the virtual models from their own viewpoint. Since they can see each other at the same time as the virtual models, they can easily communicate using normal face-to-face conversation cues. Multiple users can immerse in the same virtual scene where they’ll see each other represented as virtual characters (Figure 3a). More interestingly, one or more people may immerse themselves in the virtual world while others view the content as an augmented reality scene. In this case, those viewing the augmented reality scene will see a miniature avatar of the immersive user in the virtual world (Figure 3b). In the immersive world, people viewing the augmented reality scene appear as large, virtual heads looking down from the sky. This way, people are always aware of the other users of the interface and where they are looking.”)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Kamhi, Thankavel and Peevers, in which a computing device adapts a digital scene, based at least in part on a real-time video feed, to be rendered on one or more display devices to augment textual content such as a page of an electronic book (Kamhi), so that multiple users can look at the same book and see the virtual models from their own viewpoints, as taught by Billinghurst, because this modification would allow users to see virtual objects appearing on the pages of the book from their own viewpoints (see section “MagicBook interface,” paragraph six of Billinghurst).
Thus, the combination of Kamhi, Thankavel, Peevers and Billinghurst teaches a method comprising: (A) capturing a scene from a first camera associated with a first device having a first display, the scene comprising a live view of a real-world physical environment; (B) for a story comprising a plurality of events, (B)(1) rendering a particular event of said plurality of events on said first display, wherein said rendering of said event augments the scene captured by said first camera; and (B)(2) transitioning to a next event of said plurality of events, wherein said rendering of said event also augments the scene with information associated with at least one other device, and wherein said information associated with said at least one other device corresponds to on one or more of: (i) an image captured by said at least one other device; and/or (ii) an image representing or corresponding to said at least one other device, and/or (iii) audio from said at least one other device, and wherein said image representing or corresponding to said at least one other device comprises an avatar, and wherein said image representing or corresponding to said at least one other device is animated, at least in part, by manipulation and/or movement of the at least one other device, wherein the captured scene comprises a unified space, and wherein the particular event rendered in (B)(1) provides a view of the unified space.


Contact

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842. The examiner can normally be reached Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SARAH LE/
Primary Examiner, Art Unit 2619