DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Amendment
The Applicant’s response filed on 05/11/2021 has been acknowledged and entered for consideration. Claims 3-7, 9, 12, 15-19 have been cancelled. New claims 21-28 have been added. Claims 1-2, 8, 10-11, 13-14, 20-28 are pending in the current application. The Applicant’s amendments are in response to the Non-Final Office Action mailed on 11/13/2020.

Claim Objections
Claim 11 is objected to because of the following informalities: 
Claim 11 recites “wherein said displaying said least one caption to the user”. The Examiner suggests replacing this with “wherein said displaying said at least one caption to the user”. 
Appropriate correction is required.


Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 8, 10-11, 13-14, 20-28 are rejected under 35 U.S.C. 103 as being unpatentable over Kondo et al. (US PGPub 2009/0310021 A1) in view of Meira et al. (“Video Annotation for Immersive Journalism using Masking Techniques”) (See attached NPL1.pdf for mapping).

Regarding claim 1 (Currently Amended), Kondo et al. teach a method of adding caption information to a multi-view three-dimensional video displayed on a head-mounted display (HMD) being worn by a user ([0082]; [0123]; Fig. 6 shows the caption adding process for three-dimensional video rendering. See also [0069], [0083] for multi-view program viewing in a virtual environment), wherein the video has a duration ([0070]; Fig. 9 shows the different durations of different video segments), and wherein, at a given time in the duration of the video, the user may view multiple distinct scenes of the video ([0069]-[0070]; Fig. 9; Fig. 3 shows the multiple views and the corresponding captions that a viewer may view), and wherein the video comprises a real-time-rendered ([0070]; Fig. 3 shows the captions C0, C1, C2, etc. associated with the corresponding video images Im0, Im1, Im2, etc., which means the captions are displayed in real-time along with the display of the corresponding video images) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space, the method comprising: 
(A) associating multiple distinct viewing directions or regions with the multi-view three-dimensional video (Fig. 3, Fig. 23; These drawings show multiple distinct viewing regions, e.g., Im0, Im1, Im2. In Fig. 24, it shows two distinct regions of two different programs 191, 192 and described in [0190]-[0191]); 
(B) using at least said HMD, determining, in real time, an orientation and/or movement of the user relative to the VR or AR space ([0082]-[0083]; Fig. 5 shows the block diagram of associating different captions with different images at different viewpoints at different regions of the virtual space. See also Fig. 20 where it shows the association of captions with corresponding images and their durations during the video sequence as described in [0167]-[0168]); and
(C) based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions, wherein said first one or more captions are associated with a first viewing direction or region in the VR or AR space, and wherein said second one or more captions are associated with a second viewing direction or region in said VR or AR space ([0082]-[0083]; Fig. 5 shows the block diagram of associating different captions with different images at different viewpoints at different regions of the virtual space. See also Fig. 20 where it shows the association of captions with corresponding images and their durations during the video sequence as described in [0167]-[0168]), said first viewing direction or region being distinct from said second viewing direction or region (Figs. 3, 23, 24 show that the regions for Im0, Im1, Im2 or regions 191, 192 are different).
Although Kondo et al. teach associating different captions with different images at different viewpoints at different regions of the virtual space, they do not explicitly teach that the multi-view 3-D video is displayed on a head-mounted display (HMD) being worn by a user, the video comprises a real-time-rendered 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space and based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions, wherein said first one or more captions are associated with a first viewing direction or region in the VR or AR space, and wherein said second one or more captions are associated with a second viewing direction or region in said VR or AR space, wherein said first viewing direction or region being distinct from said second viewing direction or region.
However, Meira et al. teach a system in the same field of endeavor (Abstract), where it teaches rendering of multi-view 3-D video on a head-mounted display (HMD) being worn by a user (Meira et al.; Page 2, Col. 1, 1st Paragraph of Section A on Virtual Reality), wherein the video comprises a real-time-rendered (Meira et al.; Page 3, Col. 2, Paragraph prior to Section IV) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space (Meira et al.; Page 2, Col. 1, Paragraph prior to Section A on Virtual Reality) and based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, (Meira et al.; Page 2, Col. 2, Paragraph prior to Section C; It teaches viewport motion control technique like gaze-directed steering. Page 3, Col. 1, 2nd Paragraph under Section III; it teaches dynamic information which follows the user’s visual field) selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions (See Figs. 1, 2), wherein said first one or more captions are associated with a first viewing direction or region in the VR or AR space (Meira et al.; Page 3, Col. 2, Billboards. In Fig. 1 it shows different captions or added texts on top of the VR display associated with the first FOV of the user), and wherein said second one or more captions are associated with a second viewing direction or region in said VR or AR space (Meira et al.; In Fig. 2 it shows a different caption or added text on top of the VR display associated with the second FOV of the user), wherein said first viewing direction or region being distinct from said second viewing direction or region (Meira et al.; Fig. 1 and Fig. 2 FOVs are distinct from each other as Fig. 1 FOV contains the Dubai fountain and the city skyscrapers, wherein Fig. 2 FOV shows the Burj Khalifa tower).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine Kondo et al.'s invention of time sequence generation of video images along with associated captions to include Meira et al.'s usage of 360° VR/AR-capable HMD, because innovations in both ends of the chain (creation and consumption) will foster better, more effective, more engaging and more economically attractive applications (Meira et al.; Page 1, Col. 2, Last Paragraph) as well as it is possible to mitigate ambiguities regarding location information in the space of a 360° video, as well as improving general comprehension of the video content by the user (Meira et al.; Page 3, Col. 1, 2nd Paragraph under Section III).

Regarding claim 2 (Currently Amended), Kondo et al. and Meira et al. teach the method of claim 1 further comprising: 
repeating acts (B) and (C) multiple times during the video (Kondo et al.; It is seen from Figs. 7, 9, 20 that the associating the captioning with the corresponding images are repeated for different time durations. In addition, Fig. 15 shows the repetition of the steps of (B) in steps S33-S37. Meira et al. also show that the transition from Fig. 1 to Fig. 2 requires the execution of the steps of annotating (.srt file update) and rendering (displaying) for different FOVs as explained in Page 3, Col. 2, 2nd Paragraph under Section IV).

Regarding claim 8 (Currently Amended), Kondo et al. teach a method comprising: 
(A) displaying a multi-view three-dimensional video to a user ([0069]-[0070]; Fig. 9; Fig. 3 shows the multiple views and the corresponding captions that a viewer may view) on a head-mounted display (HMD) being worn by the user, wherein the video comprises a real-time-rendered ([0070]; Fig. 3 shows the captions C0, C1, C2, etc. associated with the corresponding video images Im0, Im1, Im2, etc., which means the captions are displayed in real-time along with the display of the corresponding video images) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space, and
wherein scenes from the video are displayed to the user based on a direction the user is viewing in the VR space or AR space (Fig. 25; [0200]; It discloses changing the orientation of viewing around one’s surrounding area to view a different image sequence with caption from a different viewing lane); and 
(B) selectively displaying, on the HMD and in real time, at least one caption to the user (Fig. 3 shows the display of at least one caption C0 associated with the video image Im0 at a particular time as described in [0070]), wherein the at least one caption is displayed to the user based on the direction the user is viewing in the VR space or AR space (Fig. 25; [0200]; It discloses changing the orientation of viewing around one’s surrounding area to view a different image sequence with caption from a different viewing lane).
Although Kondo et al. teach associating different captions with different images at different viewpoints at different regions of the virtual space, they do not explicitly teach that the video is displayed on a head-mounted display (HMD) being worn by the user, wherein the video comprises a real-time-rendered 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space and wherein the direction the user is viewing was determined using at least the head-mounted display and based on an orientation of the user's head and on prior movements of the user relative to the VR or AR space.
However, Meira et al. teach a system in the same field of endeavor (Abstract), where it teaches rendering of multi-view 3-D video on a head-mounted display (HMD) being worn by the user (Meira et al.; Page 2, Col. 1, 1st Paragraph of Section A on Virtual Reality), wherein the video comprises a real-time-rendered (Meira et al.; Page 3, Col. 2, Paragraph prior to Section IV) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space (Meira et al.; Page 2, Col. 1, Paragraph prior to Section A on Virtual Reality) and wherein the direction the user is viewing was determined using at least the head-mounted display and based on an orientation of the user's head and on prior movements of the user relative to the VR or AR space (Meira et al.; Page 2, Col. 2, Paragraph prior to Section C; It teaches viewport motion control technique like gaze-directed steering. Page 3, Col. 1, 2nd Paragraph under Section III; it teaches dynamic information which follows the user’s visual field).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine Kondo et al.'s invention of time sequence generation of video images along with associated captions to include Meira et al.'s usage of 360° VR/AR-capable HMD, because innovations in both ends of the chain (creation and consumption) will foster better, more effective, more engaging and more economically attractive applications (Meira et al.; Page 1, Col. 2, Last Paragraph) as well as it is possible to mitigate ambiguities regarding location information in the space of a 360° video, as well as improving general comprehension of the video content by the user (Meira et al.; Page 3, Col. 1, 2nd Paragraph under Section III).

Regarding claim 10 (Currently Amended), Kondo et al. and Meira et al. teach the method of claim 8, wherein, in response to the user changing their viewing orientation in the VR or AR space, displaying a second at least one caption to the user (Kondo et al.; Fig. 25; [0200]; It teaches changing the orientation of viewing around one’s surrounding area to view a different image sequence with caption from a different viewing lane. Meira et al. also teach the same limitation that the transition from Fig. 1 to Fig. 2 requires the execution of the steps of annotating (.srt file update) and rendering (displaying) for different FOVs as explained in Page 3, Col. 2, 2nd Paragraph under Section IV).

Regarding claim 11 (Currently Amended), Kondo et al. and Meira et al. teach the method of claim 8, wherein said displaying said least one caption to the user is based on a speed and/or rate at which the user changes their viewing orientation in the VR or AR space (Kondo et al.; [0246]; Fig. 32; It shows the scrolling speed of the captioned images which is analogous to the speed of changing the viewing lane. Meira et al. in Page 2, Col. 2, Paragraph prior to Section C, teaches viewport motion control technique like gaze-directed steering).  

Regarding claim 13 (Currently Amended), Kondo et al. teach an article of manufacture comprising a non-transitory computer-readable medium having program instructions stored thereon ([0348]-[0349]), the program instructions (Fig. 49, reference numeral 602, 603), operable on a computer system (Fig. 49, reference numeral 600), wherein execution of the program instructions by one or more processors of said computer system (Fig. 49, reference numeral 601) causes the one or more processors to carry out the acts of: 
(A) associating multiple distinct viewing directions or regions with a multi-view three-dimensional video (Fig. 3, Fig. 23; These drawings show multiple distinct viewing regions, e.g., Im0, Im1, Im2. In Fig. 24, it shows two distinct regions of two different programs 191, 192 and described in [0190]-[0191]) displayed on a head-mounted display (HMD) being worn by a user, wherein the video comprises a real-time-rendered ([0070]; Fig. 3 shows the captions C0, C1, C2, etc. associated with the corresponding video images Im0, Im1, Im2, etc., which means the captions are displayed in real-time along with the display of the corresponding video images) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space; 
(B) using at least said HMD being worn by the user, determining, in real time ([0070]; Fig. 3 shows the captions C0, C1, C2, etc. associated with the corresponding video images Im0, Im1, Im2, etc., which means the captions are displayed in real-time along with the display of the corresponding video images), an orientation and/or movement of the user relative to the VR or AR space ([0082]-[0083]; Fig. 5 shows the block diagram of associating different captions with different images at different viewpoints at different regions of the virtual space. See also Fig. 20 where it shows the association of captions with corresponding images and their durations during the video sequence as described in [0167]-[0168]); and
(C) based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions, wherein said first one or more captions are associated with a first viewing direction or region and wherein said second one or more captions are associated with a second viewing direction or region ([0082]-[0083]; Fig. 5 shows the block diagram of associating different captions with different images at different viewpoints at different regions of the virtual space. See also Fig. 20 where it shows the association of captions with corresponding images and their durations during the video sequence as described in [0167]-[0168]), said first viewing direction or region being distinct from said second viewing direction or region (Figs. 3, 23, 24 show that the regions for Im0, Im1, Im2 or regions 191, 192 are different).
Although Kondo et al. teach associating different captions with different images at different viewpoints at different regions of the virtual space, they do not explicitly teach that the multi-view 3-D video is displayed on a head-mounted display (HMD) being worn by a user, wherein the video comprises a real-time-rendered 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space and using at least said HMD being worn by the user, determining, in real time, an orientation and/or movement of the user relative to the VR or AR space, and (C) based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions, wherein said first one or more captions are associated with a first viewing direction or region and wherein said second one or more captions are associated with a second viewing direction or region.
However, Meira et al. teach a system in the same field of endeavor (Abstract), where it teaches rendering of multi-view 3-D video on a head-mounted display (HMD) being worn by a user (Meira et al.; Page 2, Col. 1, 1st Paragraph of Section A on Virtual Reality), wherein the video comprises a real-time-rendered (Meira et al.; Page 3, Col. 2, Paragraph prior to Section IV) 360-degree video in a virtual reality (VR) space or an augmented reality (AR) space (Meira et al.; Page 2, Col. 1, Paragraph prior to Section A on Virtual Reality) and using at least said HMD being worn by the user, determining, in real time, an orientation and/or movement of the user relative to the VR or AR space (Meira et al.; Page 2, Col. 2, Paragraph prior to Section C; It teaches viewport motion control technique like gaze-directed steering. Page 3, Col. 1, 2nd Paragraph under Section III; it teaches dynamic information which follows the user’s visual field), and (C) based on an orientation and/or movement of the user relative to the VR or AR space, as determined in (B), in real time, selectively displaying, on the HMD, (i) a first one or more captions or (ii) a second one or more captions (See Figs. 1, 2), wherein said first one or more captions are associated with a first viewing direction or region (Meira et al.; Page 3, Col. 2, Billboards. In Fig. 1 it shows different captions or added texts on top of the VR display associated with the first FOV of the user) and wherein said second one or more captions are associated with a second viewing direction or region (Meira et al.; In Fig. 2 it shows a different caption or added text on top of the VR display associated with the second FOV of the user).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine Kondo et al.'s invention of time sequence generation of video images along with associated captions to include Meira et al.'s usage of 360° VR/AR-capable HMD, because innovations in both ends of the chain (creation and consumption) will foster better, more effective, more engaging and more economically attractive applications (Meira et al.; Page 1, Col. 2, Last Paragraph) as well as it is possible to mitigate ambiguities regarding location information in the space of a 360° video, as well as improving general comprehension of the video content by the user (Meira et al.; Page 3, Col. 1, 2nd Paragraph under Section III).

Regarding claim 14 (Currently Amended), Kondo et al. and Meira et al. teach the article of manufacture of claim 13 wherein execution of the program instructions by one or more processors of said computer system causes the one or more processors to carry out the acts of: 
repeating acts (B) and (C) multiple times during the video (Kondo et al.; It is seen from Figs. 7, 9, 20 that the associating the captioning with the corresponding images are repeated for different time durations. In addition, Fig. 15 shows the repetition of the steps of (B) in steps S33-S37. Meira et al. also show that the transition from Fig. 1 to Fig. 2 requires the execution of the steps of annotating (.srt file update) and rendering (displaying) for different FOVs as explained in Page 3, Col. 2, 2nd Paragraph under Section IV).  

Regarding claim 20 (Original), Kondo et al. and Meira et al. teach a device constructed and adapted to perform the method of claim 1 (Kondo et al.; [0071]; Fig. 4).

Regarding claim 21 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein at least some of the captions comprise graphical information (Kondo et al.; Fig. 23, reference numeral 180. Meira et al.; Figs. 1, 2 show dots and arrows as graphics in addition to texts. Also, see Page 3, Col. 2, Paragraph 1).

Regarding claim 22 (NEW), Kondo et al. and Meira et al. teach the method of claim 1, wherein the determining in (B) determines orientation and/or movement of the user's head in the VR or AR space (Meira et al.; Page 2, Col. 2, Paragraph prior to Section C, teaches viewport motion control technique like gaze-directed steering. Page 1, Col. 2, Paragraph 1 teaches taking into consideration of user’s attention focus (eye-tracking) and field of view (head motion)).  

Regarding claim 23 (NEW), Kondo et al. and Meira et al. teach the method of claim 1, wherein the displaying in (C) is based at least on a speed and/or rate of movement of the user relative to the VR or AR space (Kondo et al.; [0246]; Fig. 32; It shows the scrolling speed of the captioned images which is analogous to the speed of changing the viewing lane. Meira et al.; Page 2, Col. 2, Paragraph prior to Section C; It teaches viewport motion control technique like gaze-directed steering. Page 1, Col. 2, Paragraph 1 teaches taking into consideration of user’s attention focus (eye-tracking) and field of view (head motion)).  

Regarding claim 24 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein the multiple distinct viewing regions correspond to multiple distinct viewing directions in the VR or AR space (Kondo et al.; Fig. 25; [0200]; It discloses changing the orientation of viewing around one’s surrounding area to view a different image sequence with caption from a different viewing lane. Meira et al.; Fig. 1 and Fig. 2 FOVs are distinct from each other based on the user’s FOV as Fig. 1 FOV region contains the Dubai fountain and the city skyscrapers, wherein Fig. 2 FOV region shows the Burj Khalifa tower).  

Regarding claim 25 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein the captions are determined at viewing time (Meira et al.; Page 3, Col. 2, Paragraph prior to Section IV; It teaches real-time annotation meaning during the viewing time. See also Page 2, Col. 2, Paragraph 4 under section B).  

Regarding claim 26 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein at least some of the captions are independent of any objects in the VR or AR space (Meira et al.; Figs. 1, 2 show a text rendering called “FOV” which is independent of any objects in the VR/AR space).

Regarding claim 27 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein the one or more first captions are distinct from the one or more second captions (Kondo et al.; Figs. 3, 23, 24 show that the regions for Im0, Im1, Im2 or regions 191, 192 are different. Meira et al.; Fig. 1 and Fig. 2 FOVs are distinct from each other as Fig. 1 FOV contains the Dubai fountain and the city skyscrapers, wherein Fig. 2 FOV shows the Burj Khalifa tower).  

Regarding claim 28 (New), Kondo et al. and Meira et al. teach the method of claim 1, wherein a direction the user is viewing in the VR or AR space was determined using at least the head-mounted display and based on an orientation of the user's head and on prior movements of the user relative to the VR or AR space (Meira et al.; Page 2, Col. 2, Paragraph prior to Section C; It teaches viewport motion control technique like gaze-directed steering. Page 3, Col. 1, 2nd Paragraph under Section III; it teaches dynamic information which follows the user’s visual field).

Response to Arguments
Applicant's arguments filed on 05/11/2021 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

The Examiner, in response to the Applicant’s request to enter the prior art made of record and not relied upon, hereby includes all such prior art in the PTO-892 form, although the Examiner follows the practice of entering all relevant prior art, whether relied upon or not, if the application is in condition for allowance.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
1. “Enhanced Natural Visual Perception for Augmented Reality-Workstations by Simulation of Perspective” - Rafael Radkowski and James Oliver; JOURNAL OF DISPLAY TECHNOLOGY, VOL. 10, NO. 5, MAY 2014.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JAY PATEL can be reached on (571)272-2988.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Mainul Hasan/
Primary Examiner, Art Unit 2485