DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 03/24/2022 have been fully considered but they are not persuasive. Applicant argues, on page 9, first paragraph: “Huang does not describe any data indicating relationships between regions and tracks, and the tracks in Huang are at the server, and are thus separate from the recommended viewport information in Huang. It therefore follows that Huang also fails to disclose, ‘performing a decoding operation based on the set of tracks, the elementary data track, the grouping data, and the region metadata to generate decoded immersive media data.’” Examiner respectfully disagrees. Nothing in the claims specifies whether the tracks of the invention should be at the server or at the customer’s device. Furthermore, Huang teaches in figures 2 and 3 the relationship between regions and the tracks of each tile video. Moreover, the standard track would be each video tile as shown in figure 3; however, the system gives the user not only this option but also the independent viewports #1 and #2. In this case, these generate an independent set of tracks, grouping data and region to generate decoded immersive media data. In addition, Huang teaches (page 2 paragraph (0040)) “each motion-constrained tile set sequence serves as a subset of a tile track covering a VR video spatial region, and may be independently decoded and encapsulated into a video file for a streaming transmission.” Huang further teaches (page 2 paragraph (0041)) “the encapsulator encapsulates an original audio and video elementary stream into multiple media segment files with fixed time intervals. In addition, the encapsulator is also responsible for providing index information of the media segment files, such as a media presentation description (MPD) in a dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH).” As a result, Huang broadly teaches decoding based on a set of tracks, grouping data and the region metadata to generate decoded media data.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Huang (US 2021/0076081 A1), hereinafter referred to as Huang.
Regarding claims 1, 14 and 20, Huang teaches a decoding and encoding method for decoding video data for immersive media (page 2 paragraph (0035)), the method comprising: accessing immersive media data (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprising: a set of tracks, wherein: each track of the set of tracks comprises associated to-be-decoded immersive media data that corresponds to an associated spatial portion of immersive media content that is different than the associated spatial portions of other tracks in the set of tracks (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))); an elementary data track comprising first immersive media elementary data, wherein at least one track of the set of tracks references the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))); grouping data that specifies a spatial relationship among the tracks in the set of tracks in the immersive media content (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); region metadata comprising data that specifies a spatial relationship between a viewing region in the immersive media content and a subset of tracks of the set of tracks (if the sphere region of the recommended viewport video playing viewport viewport#2 in the VR video is different from the sphere region of the current video playing viewport viewport#1, the client requests a media segment, i.e. viewport#2 in one or more video files of the sphere region covered by the video content corresponding to the sphere region of the recommended viewport viewport#2 according to the playing time information of the recommended viewport viewport#2 from the server (page 6 paragraph (0111))), wherein each track in the subset of tracks contributes at least a portion of the visual content of the region (figures 2-3); and performing a decoding operation based on the set of tracks (each motion-constrained tile set sequence serves as a subset of a tile track covering a VR video spatial region, and may be independently decoded and encapsulated into a video file for a streaming transmission (page 2 paragraph (0040))), the elementary data track, the grouping data, and the region metadata to generate decoded immersive media data (the encapsulator encapsulates an original audio and video elementary stream into multiple media segment files with fixed time intervals. In addition, the encapsulator is also responsible for providing index information of the media segment files, such as a media presentation description (MPD) in a dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH) (page 2 paragraph (0041))).
Regarding claims 2 and 15, Huang teaches the decoding and encoding method of claims 1 and 14, wherein accessing the immersive media data comprises: accessing an immersive media bit-stream (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprising: a set of patch tracks, wherein each patch track corresponds to an associated track in the set of tracks (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); and the elementary data track, wherein each patch track in the set of patch tracks references the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))). 
Regarding claims 3 and 16, Huang teaches the decoding and encoding method of claims 1 and 14, wherein accessing the immersive media data comprises: accessing a set of immersive media bitstreams (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))), wherein each immersive media bitstream comprises: a track from the set of tracks (a VR video projection frame may be segmented into a sub-image sequence or a motion-constrained tile set before being encoded, so that video transmission bandwidth requirements may be reduced or video decoding complexity may be reduced unto a condition of providing same video resolution/quality for the user (page 2 paragraph (0039))); and an associated elementary data track, wherein the track references the associated elementary data track, such that an immersive media bitstream from the set of immersive media bitstreams comprises the elementary data track (a VR video image is projected on a unit sphere, an original point of a global coordinate axis is same as a center point of an audio/video acquisition device and a position of an observer’s head in a three-dimensional space. […] As shown in fig. 2, the position of the center point of a user viewpoint (page 2 paragraph (0038))). 
Regarding claim 4, Huang teaches the decoding method of claim 1, wherein the region comprises a sub-portion of the viewable immersive media data that is less than a full viewable portion of the immersive media data (figures 2-3 show the sub-portion of viewports 1 and 2 is less than the full viewable portion of the available viewing area). 
Regarding claim 5, Huang teaches the decoding method of claim 1, wherein the region comprises a viewport (viewports 1 and 2, figures 2-3).
Regarding claims 6 and 17, Huang teaches the decoding and encoding method of claims 1 and 14, wherein accessing the region metadata comprises accessing a track grouping box in each track in the set of tracks (each motion-constrained tile set sequence serves as a subset of a tile track covering a VR video spatial region, and may be independently decoded and encapsulated into a video file for a streaming transmission (page 2 paragraph (0040))).
Regarding claims 7 and 18, Huang teaches the decoding and encoding method of claims 1 and 14, wherein accessing the region metadata comprises accessing a timed metadata track that references the subset of tracks (the encapsulator encapsulates an original audio and video elementary stream into multiple media segment files with fixed time intervals. In addition, the encapsulator is also responsible for providing index information of the media segment files, such as a media presentation description (MPD) in a dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH) (page 2 paragraph (0041))).
Regarding claims 8 and 19, Huang teaches the decoding and encoding method of claims 1 and 14, wherein accessing the immersive media data comprises accessing a streaming manifest file that comprises a track representation for each track in the set of tracks (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 9, Huang teaches the decoding method of claim 8, wherein each track representation is associated with a set of component track representations (figures 2 and 3 show each track representation and its association with a set of component track representations).
Regarding claim 10, Huang teaches the decoding method of claim 8, wherein the streaming manifest file comprises a descriptor that specifies the region metadata (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 11, Huang teaches the decoding method of claim 8, wherein the streaming manifest file comprises a timed metadata representation for a timed metadata track comprising the region metadata (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))).
Regarding claim 12, Huang teaches the decoding method of claim 1, wherein the immersive media content comprises point cloud multimedia (a recommended viewport for playing a virtual reality (VR) video is determined; and one or more video files corresponding to the recommended viewport are requested from a server (page 1 paragraph (0008))). 
Regarding claim 13, Huang teaches the decoding method of claim 1, wherein the elementary data track comprises: at least one geometry track comprising geometry data of the immersive media (geometric track of figures 2 and 3); at least one attribute track comprising attribute data of the immersive media (page 4 paragraph (0080)); and an occupancy track comprising occupancy map data of the immersive media (figures 2 and 3 show a mapping of multiple areas in relation to the field of vision or the virtual reality VR video); accessing the immersive media data (the terminal may establish a network connection with the streaming media server through any suitable type of access network and request access to the media segment files (page 3 paragraph (0042))) comprises accessing: the geometry data in the at least one geometry track (the terminal selects a resolution or quality of a VR video file to be transmitted based on metadata such as a viewpoint orientation, a viewport, and the like. As shown in figure 3, the user viewport at a certain moment is viewport #1, and the resolutions or qualities of video tracks of the tile1 and tile4 corresponding to the viewport Viewport#1 requested by the terminal should be higher than those of video tracks of other invisible regions. But when the user viewport is switched to viewport#2, the terminal is requested to acquire video tracks of the tile3 and tile6 with higher resolutions or qualities (page 3 paragraph (0044))); the attribute data in the at least one attribute track (page 4 paragraph (0080)); and the occupancy map data of the occupancy track (page 2 paragraph (0040)); and performing the decoding operation comprises performing the decoding operation using the geometry data (page 4 paragraph (0074)), the attribute data, and the occupancy map data, to generate the decoded immersive media data (page 3 paragraph (0043)).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANKLIN S ANDRAMUNO whose telephone number is (571)270-3004. The examiner can normally be reached Mon - Fri, 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jefferey Harold, can be reached at (571) 272-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANKLIN S ANDRAMUNO/ Examiner, Art Unit 2424
/JEFFEREY F HAROLD/ Supervisory Patent Examiner, Art Unit 2424