DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 9/19/2022 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Huang (US 2021/0076081 A1) in view of Wang (US 2020/0382796 A1), hereinafter referred to as Huang and Wang.
Regarding claims 1, 14 and 20, Huang teaches a decoding and encoding method for decoding video data for immersive media (page 2 paragraph (0035)), the method comprising: accessing immersive media data (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprising: a set of tracks, wherein: each track of the set of tracks comprises associated to-be-decoded immersive media data that corresponds to an associated spatial portion of immersive media content that is different than the associated spatial portions of other tracks in the set of tracks (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))); an elementary data track comprising first immersive media elementary data, wherein at least one track of the set of tracks references the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))); grouping data that specifies a spatial relationship among the tracks in the set of tracks in the immersive media content (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); region metadata comprising data that specifies a spatial relationship between a viewing region in the immersive media content and a subset of tracks of the set of tracks (if the sphere region of the recommended viewport video playing viewport viewport#2 in the VR video is different from the sphere region of the current video playing viewport viewport#1, the client requests a media segment, i.e. viewport#2 in one or more video files of the sphere region covered by the video content corresponding to the sphere region of the recommended viewport viewport#2 according to the playing time information of the recommended viewport viewport#2 from the server (page 6 paragraph (0111))), wherein each track in the subset of tracks contributes at least a portion of the visual content of the region (figures 2-3); and performing a decoding operation based on the set of tracks (each motion-constrained tile set sequence serves as a subset of a tile track covering a VR video spatial region, and may be independently decoded and encapsulated into a video file for a streaming transmission (page 2 paragraph (0040))), the elementary data track, the grouping data, and the region metadata to generate decoded immersive media data (the encapsulator encapsulates an original audio and video elementary stream into multiple media segment files with fixed time intervals. In addition, the encapsulator is also responsible for providing index information of the media segment files, such as a media presentation description (MPD) in a dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH) (page 2 paragraph (0041))).
However, Huang is silent as to decoding video data for immersive media at a client device. Wang teaches (page 5 paragraph (0067)) that the decoder is a device on a user’s location that is configured to reverse the coding process to reconstruct the sub-picture video streams from the encoded bitstreams (also see page 3 paragraph (0050)). Wang further teaches a set of tracks at the client device (page 4 paragraph (0058) and page 3 paragraph (0050)).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Huang to include the teachings of Wang for decoding video data for immersive media at a client device. A useful combination is found in Wang (page 1 paragraph (0002)), which states that the present disclosure is generally related to virtual reality (VR), also referred to as omnidirectional media, immersive media, and 360 degree video, and is specifically related to immersive media metrics for virtual reality content with multiple viewpoints.

Regarding claims 2 and 15, Huang and Wang teach the decoding and encoding method of claims 1 and 14. Huang teaches accessing the immersive media data comprises: accessing an immersive media bit-stream (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprising: a set of patch tracks, wherein each patch track corresponds to an associated track in the set of tracks (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); and the elementary data track, wherein each patch track in the set of patch tracks references the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))). 
Regarding claims 3 and 16, Huang and Wang teach the decoding and encoding method of claims 1 and 14. Huang teaches accessing the immersive media data comprises: accessing a set of immersive media bitstreams (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))), wherein each immersive media bitstream comprises: a track from the set of tracks (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); and an associated elementary data track, wherein the track references the associated elementary data track, such that an immersive media bitstream from the set of immersive media bitstreams comprises the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))). 
Regarding claim 4, Huang and Wang teach the decoding method of claim 1. Huang teaches the region comprises a sub-portion of the viewable immersive media data that is less than a full viewable portion of the immersive media data (figures 2-3 show the sub-portion of viewports 1 and 2 is less than the full viewable portion of the available viewing area). 
Regarding claim 5, Huang and Wang teach the decoding method of claim 1. Huang teaches the region comprises a viewport (viewports 1 and 2 figures 2-3).
Regarding claims 6 and 17, Huang and Wang teach the decoding and encoding method of claims 1 and 14. Huang teaches accessing the region metadata comprises accessing a track grouping box in each track in the set of tracks (each motion-constrained tile set sequence serves as a subset of a tile track covering a VR video spatial region, and may be independently decoded and encapsulated into a video file for a streaming transmission (page 2 paragraph (0040))).
Regarding claims 7 and 18, Huang and Wang teach the decoding and encoding method of claims 1 and 14. Huang teaches accessing the region metadata comprises accessing a timed metadata track that references the subset of tracks (the encapsulator encapsulates an original audio and video elementary stream into multiple media segment files with fixed time intervals. In addition, the encapsulator is also responsible for providing index information of the media segment files, such as a media presentation description (MPD) in a dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH) (page 2 paragraph (0041))).
Regarding claims 8 and 19, Huang and Wang teach the decoding and encoding method of claims 1 and 14. Huang teaches accessing the immersive media data comprises accessing a streaming manifest file that comprises a track representation for each track in the set of tracks (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 9, Huang and Wang teach the decoding method of claim 8. Huang teaches each track representation is associated with a set of component track representations (figures 2 and 3 show each track representation and its association with a set of component track representations).
Regarding claim 10, Huang and Wang teach the decoding method of claim 8. Huang teaches the streaming manifest file comprises a descriptor that specifies the region metadata (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 11, Huang and Wang teach the decoding method of claim 8. Huang teaches the streaming manifest file comprises a timed metadata representation for a timed metadata track comprising the region metadata (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 12, Huang and Wang teach the decoding method of claim 1. Huang teaches the immersive media content comprises point cloud multimedia (a recommended viewport for playing a virtual reality (VR) video is determined; and one or more video files corresponding to the recommended viewport are requested from a server (page 1 paragraph (0008))). 
Regarding claim 13, Huang and Wang teach the decoding method of claim 1. Huang teaches the elementary data track comprises: at least one geometry track comprising geometry data of the immersive media (geometric track of figures 2 and 3); at least one attribute track comprising attribute data of the immersive media (page 4 paragraph (0080)); and an occupancy track comprising occupancy map data of the immersive media (figures 2 and 3 show a mapping of multiple areas in relation to the field of vision or the virtual reality VR video); accessing the immersive media data (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprises accessing: the geometry data in the at least one geometry track (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))); the attribute data in the at least one attribute track (page 4 paragraph (0080)); and the occupancy map data of the occupancy track (page 2 paragraph (0040)); and performing the decoding operation comprises performing the decoding operation using the geometry data (page 4 paragraph (0074)), the attribute data, and the occupancy map data, to generate the decoded immersive media data (page 3 paragraph (0043)).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANKLIN S ANDRAMUNO whose telephone number is (571)270-3004. The examiner can normally be reached Mon - Fri, 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jefferey Harold, can be reached at (571) 272-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANKLIN S ANDRAMUNO/ Examiner, Art Unit 2424
/JEFFEREY F HAROLD/ Supervisory Patent Examiner, Art Unit 2424