DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Instant Application No. 16/921,434 is rejected on the ground of nonstatutory double patenting as being unpatentable over U.S. Patent No. 11,032,570 (claims dated 11/06/2020), as further noted in the map below.
The claims of the instant application are comprehensively comparable to the patented claims and are therefore obvious variants thereof. Although the conflicting claims are not identical, they are not patentably distinct from each other.
Map:
Instant Application No. 16/921,434
Claim no.
U.S. Patent No. 11,032,570 (claims dated 11/06/2020)
Claim no.
1. A media data processing method, comprising: obtaining metadata information, wherein the metadata information includes property information that describes media data, and wherein the metadata information comprises viewpoint identification information; and processing the media data based on the viewpoint identification information.
1. (Currently amended) A method for presenting media data, comprising: receiving, by a media processing device, a plurality of media data tracks, wherein each media data track comprises media data recorded at a viewpoint, and viewpoint identification information of the viewpoint; obtaining, by the media processing device, viewpoint position information of viewpoints associated with the media data tracks; and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; wherein obtaining viewpoint position information of a viewpoint comprises: obtaining a plurality of samples in a timed metadata track associated with the viewpoint, on condition that the position of the viewpoint is dynamic, wherein each sample in the timed metadata track comprises a set of viewpoint position information, and each set of viewpoint position information indicates a position of the viewpoint; or obtaining only one set of viewpoint position information from a media data track associated with the viewpoint, on condition that the position of the viewpoint is static, wherein the set of viewpoint position information indicates a position of the viewpoint.
2. The method according to claim 1, wherein the method further comprises: obtaining viewpoint selection information; and wherein the processing the media data based on the viewpoint identification information comprises: determining a first viewpoint based on the viewpoint selection information and the viewpoint identification information; and processing media data corresponding to the first viewpoint.

1. (Currently amended) A method for presenting media data, comprising: receiving, by a media processing device, a plurality of media data tracks, wherein each media data track comprises media data recorded at a viewpoint, and viewpoint identification information of the viewpoint; obtaining, by the media processing device, viewpoint position information of viewpoints associated with the media data tracks; and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; wherein obtaining viewpoint position information of a viewpoint comprises: obtaining a plurality of samples in a timed metadata track associated with the viewpoint, on condition that the position of the viewpoint is dynamic, wherein each sample in the timed metadata track comprises a set of viewpoint position information, and each set of viewpoint position information indicates a position of the viewpoint; or obtaining only one set of viewpoint position information from a media data track associated with the viewpoint, on condition that the position of the viewpoint is static, wherein the set of viewpoint position information indicates a position of the viewpoint.


1. (Currently amended) A method for presenting media data, comprising: … and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; …
4. The method according to claim 2, wherein before the processing the media data corresponding to the first viewpoint, the method further comprises: obtaining, based on the viewpoint identification information and the metadata information, the media data corresponding to the first viewpoint.

2. (Previously presented) The method according to claim 1, wherein before displaying the media data of the first viewpoint, the method further comprises: obtaining viewpoint selection information; and determining the first viewpoint based on the viewpoint selection information and the viewpoint identification information of the first viewpoint; wherein the viewpoint selection information comprises: a default viewpoint; or a viewpoint selected by a user of the media processing device.
5. The method according to claim 1, wherein the metadata information further comprises viewpoint position information, and wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.

6. (Previously presented) The method according to claim 1, wherein the viewpoint position information of a viewpoint indicates a position of the viewpoint in a spherical coordinate system or a three-dimensional spatial coordinate system.
7. The method according to claim 1, wherein the metadata information includes a metadata track.

9. (Previously presented) The method according to claim 1, wherein the method further comprises: receiving information of a recommended viewport; and wherein displaying the media data of the first viewpoint comprises: displaying the media data of the first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint, and based on the information of the recommended viewport.

10. (Previously presented) The method according to claim 9, wherein the information of the recommended viewport is carried in the timed metadata track.
10. The method according to claim 1, wherein the metadata information includes a metadata track, and wherein the metadata track further comprises director viewport information; and wherein the processing the media data based on the viewpoint identification information comprises: processing the media data based on the viewpoint identification information and the director viewport information.

9. (Previously presented) The method according to claim 1, wherein the method further comprises: receiving information of a recommended viewport; and wherein displaying the media data of the first viewpoint comprises: displaying the media data of the first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint, and based on the information of the recommended viewport.




1. (Currently amended) A method for presenting media data, comprising: receiving, by a media processing device, a plurality of media data tracks, wherein each media data track comprises media data recorded at a viewpoint, and viewpoint identification information of the viewpoint; obtaining, by the media processing device, viewpoint position information of viewpoints associated with the media data tracks; and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; wherein obtaining viewpoint position information of a viewpoint comprises: obtaining a plurality of samples in a timed metadata track associated with the viewpoint, on condition that the position of the viewpoint is dynamic, wherein each sample in the timed metadata track comprises a set of viewpoint position information, and each set of viewpoint position information indicates a position of the viewpoint; or obtaining only one set of viewpoint position information from a media data track associated with the viewpoint, on condition that the position of the viewpoint is static, wherein the set of viewpoint position information indicates a position of the viewpoint.
12. The apparatus according to claim 11, wherein the obtaining module is further configured to obtain viewpoint selection information; and wherein the processing module is specifically configured to: determine a first viewpoint based on the viewpoint selection information and the viewpoint identification information; and process media data corresponding to the first viewpoint.

1. (Currently amended) A method for presenting media data, comprising: receiving, by a media processing device, a plurality of media data tracks, wherein each media data track comprises media data recorded at a viewpoint, and viewpoint identification information of the viewpoint; obtaining, by the media processing device, viewpoint position information of viewpoints associated with the media data tracks; and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; wherein obtaining viewpoint position information of a viewpoint comprises: obtaining a plurality of samples in a timed metadata track associated with the viewpoint, on condition that the position of the viewpoint is dynamic, wherein each sample in the timed metadata track comprises a set of viewpoint position information, and each set of viewpoint position information indicates a position of the viewpoint; or obtaining only one set of viewpoint position information from a media data track associated with the viewpoint, on condition that the position of the viewpoint is static, wherein the set of viewpoint position information indicates a position of the viewpoint.


1. (Currently amended) A method for presenting media data, comprising: … and displaying, by the media processing device, media data of a first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint; …
14. The apparatus according to claim 12, wherein before the processing module processes the media data corresponding to the first viewpoint, the processing module is further configured to: obtain, based on the viewpoint identification information and the metadata information, the media data corresponding to the first viewpoint.

2. (Previously presented) The method according to claim 1, wherein before displaying the media data of the first viewpoint, the method further comprises: obtaining viewpoint selection information; and determining the first viewpoint based on the viewpoint selection information and the viewpoint identification information of the first viewpoint; wherein the viewpoint selection information comprises: a default viewpoint; or a viewpoint selected by a user of the media processing device.
15. The apparatus according to claim 11, wherein the metadata information further comprises viewpoint position information, and wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.

6. (Previously presented) The method according to claim 1, wherein the viewpoint position information of a viewpoint indicates a position of the viewpoint in a spherical coordinate system or a three-dimensional spatial coordinate system.
17. The apparatus according to claim 11, wherein the metadata information includes a metadata track.

9. (Previously presented) The method according to claim 1, wherein the method further comprises: receiving information of a recommended viewport; and wherein displaying the media data of the first viewpoint comprises: displaying the media data of the first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint, and based on the information of the recommended viewport.

10. (Previously presented) The method according to claim 9, wherein the information of the recommended viewport is carried in the timed metadata track.
20. The apparatus according to claim 11, wherein the metadata information includes a metadata track, and the metadata track further comprises director viewport information; and wherein the processing module is specifically configured to process the media data based on the viewpoint identification information and the director viewport information.
9. (Previously presented) The method according to claim 1, wherein the method further comprises: receiving information of a recommended viewport; and wherein displaying the media data of the first viewpoint comprises: displaying the media data of the first viewpoint based on the viewpoint identification information and viewpoint position information of the first viewpoint, and based on the information of the recommended viewport.

10. (Previously presented) The method according to claim 9, wherein the information of the recommended viewport is carried in the timed metadata track.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


Claim limitations “obtaining module” and “processing module” have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because they use a generic placeholder (“module”) coupled with functional language (“configured to obtain” and “configured to process”) without reciting sufficient structure to achieve the function. Furthermore, the generic placeholder is not preceded by a structural modifier.
Since the claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claim 18 has been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof. Examiner notes that there is no corresponding structure described in the specification.
A review of the specification shows that there appears to be no corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitations.

If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 

For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim 11 recites “an obtaining module configured to” and “a processing module configured to”.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 11 and all of its dependents are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Regarding claim 11, the claim recites “an obtaining module configured to” and “a processing module configured to”. A review of the specification shows that there appears to be no corresponding structure described in the specification for these limitations.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4 and 11-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yoshikawa et al. (US 2020/0195997).

Regarding claim 1, Yoshikawa teaches a media data processing method, comprising: 
obtaining metadata information, 
(In Yoshikawa ¶0165 each of image display apparatuses receives an integrated video and arrangement information (metadata information) transmitted by an image distribution apparatus.)

wherein the metadata information includes property information that describes media data, and 
(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image (property information) in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating the viewpoint position, coordinates of the viewpoint, or time information about the image.)

wherein the metadata information comprises viewpoint identification information; and 


processing the media data based on the viewpoint identification information.
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information.)

Regarding claim 2, Yoshikawa teaches the method according to claim 1, wherein the method further comprises: 
obtaining viewpoint selection information; and 
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information.)

wherein the processing the media data based on the viewpoint identification information comprises: 
determining a first viewpoint based on the viewpoint selection information and the viewpoint identification information; and 
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video (first viewpoint) by extracting a video in the area having an ID corresponding to the viewpoint (viewpoint identification information) indicated in the selected-viewpoint information (selection information).)

processing media data corresponding to the first viewpoint.
(In Yoshikawa ¶0202 according to the acquired arrangement information and the acquired selected-viewpoint information, the viewpoint video selector acquires a corresponding viewpoint video from the integrated video.)

Regarding claim 3, Yoshikawa teaches the method according to claim 2, wherein the processing media data corresponding to the first viewpoint comprises: presenting the media data corresponding to the first viewpoint.


Regarding claim 4, Yoshikawa teaches the method according to claim 2, wherein before the processing the media data corresponding to the first viewpoint, the method further comprises: obtaining, based on the viewpoint identification information and the metadata information, the media data corresponding to the first viewpoint.
(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating an ID (identification) of the viewpoint. In ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information. In ¶0201 the viewpoint video selector acquires, from the UI controller, the selected-viewpoint information for determining the viewpoint for display.)

Regarding claim 11, Yoshikawa teaches a media data processing apparatus, comprising: 
an obtaining module, configured to obtain metadata information, 
(In Yoshikawa ¶0165 each of image display apparatuses receives an integrated video and arrangement information (metadata information) transmitted by an image distribution apparatus.)

wherein the metadata information includes property information that describes media data, and 
(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image (property information) in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating the viewpoint position, coordinates of the viewpoint, or time information about the image.)

wherein the metadata information comprises viewpoint identification information; and 


a processing module, configured to process the media data based on the viewpoint identification information.
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information.)

Regarding claim 12, Yoshikawa teaches the apparatus according to claim 11, 
wherein the obtaining module is further configured to obtain viewpoint selection information; and 
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information.)

wherein the processing module is specifically configured to: 
determine a first viewpoint based on the viewpoint selection information and the viewpoint identification information; and
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video (first viewpoint) by extracting a video in the area having an ID corresponding to the viewpoint (viewpoint identification information) indicated in the selected-viewpoint information (selection information).)

process media data corresponding to the first viewpoint.
(In Yoshikawa ¶0202 according to the acquired arrangement information and the acquired selected-viewpoint information, the viewpoint video selector acquires a corresponding viewpoint video from the integrated video.)

Regarding claim 13, Yoshikawa teaches the apparatus according to claim 12, wherein the processing module is specifically configured to: present the media data corresponding to the first viewpoint.


Regarding claim 14, Yoshikawa teaches the apparatus according to claim 12, wherein before the processing module processes the media data corresponding to the first viewpoint, the processing module is further configured to: obtain, based on the viewpoint identification information and the metadata information, the media data corresponding to the first viewpoint.
(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating an ID (identification) of the viewpoint. In ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information. In ¶0201 the viewpoint video selector acquires, from the UI controller, the selected-viewpoint information for determining the viewpoint for display.)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Yoshikawa et al. (US 2020/0195997) in view of Hannuksela (US 2017/0347026).

Regarding claim 5, Yoshikawa teaches the method according to claim 1, wherein the metadata information further comprises viewpoint position information.
(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating the viewpoint position, coordinates of the viewpoint, or time information about the image.)

Yoshikawa does not explicitly teach wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.

However, Hannuksela teaches wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.
(In Hannuksela Fig 6 and ¶0288 the position of the viewport on a sphere is indicated using two angles of a spherical coordinate system indicating a specific point of the viewport, such as the center point or a particular corner point of the viewport.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Hannuksela so that the coordinates of the viewpoints in Yoshikawa explicitly represent positions within a spherical region, as taught by Hannuksela. This modification would allow the media content to be decoded correctly, resulting in a continuous sequence of decoded pictures with no gaps in a 360-degree video environment, as taught by Hannuksela ¶0005 and ¶0266, and would thereby provide an enhanced image and a better user experience.

Regarding claim 15, Yoshikawa teaches the apparatus according to claim 11, wherein the metadata information further comprises viewpoint position information.
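(In Yoshikawa ¶0151 the arrangement information (metadata information) is information that defines information about each viewpoint image in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating the viewpoint position, coordinates of the viewpoint, or time information about the image.)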

Yoshikawa does not explicitly teach wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.

However, Hannuksela teaches wherein the viewpoint position information is used to indicate a position of a viewpoint in a spherical coordinate system.
(In Hannuksela Fig 6 and ¶0288 the position of the viewport on a sphere is indicated using two angles of a spherical coordinate system indicating a specific point of the viewport, such as the center point or a particular corner point of the viewport.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Hannuksela so that the coordinates of the viewpoints in Yoshikawa explicitly represent positions within a spherical region, as taught by Hannuksela. This modification would allow the media content to be decoded correctly, resulting in a continuous sequence of decoded pictures with no gaps in a 360-degree video environment, as taught by Hannuksela ¶0005 and ¶0266, and would thereby provide an enhanced image and a better user experience.

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yoshikawa et al. (US 2020/0195997) in view of Hannuksela (US 2017/0347026) and further in view of Filippini et al. (US 2018/0336929).

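Regarding claim 6, the combination of Yoshikawa and Hannuksela teaches the method according to claim 5, wherein the metadata information comprises box information.
(In Hannuksela ¶0369, Hannuksela’s embodiments have been described in relation to ISOBMFF and/or formats derived from ISOBMFF. In ¶0069 a building block in the ISOBMFF is called a box.)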

The combination of Yoshikawa and Hannuksela does not explicitly teach wherein the box information comprises the viewpoint position information.

However, Filippini teaches wherein the box information comprises the viewpoint position information.
(Filippini ¶0052 teaches a spherical coordinate system in which a position is identified by two angles, as defined by the International Organization for Standardization (ISO). Using the coordinate system, a location can be specified for a virtual light source relative to a fixed position in the viewing room (e.g., the user's optimal viewpoint) as a set of coordinates. The coordinates for virtual light sources are included as part of the environmental effects metadata in a metadata track. Each set of coordinates defining a location of a virtual light source represents part of the metadata defining an environmental event that is associated with a time in the metadata track.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa, Hannuksela, and Filippini so that the metadata provided in the box of Hannuksela explicitly includes a complete media dataset comprising media content data and environmental effects metadata defining a set of environmental events, as taught by Filippini. This modification would adjust parameters to more accurately reproduce the requested ambient effects of the content, as taught in the abstract and ¶0045 of Filippini, providing an enhanced experience for the user.

Regarding claim 16, the combination of Yoshikawa and Hannuksela teaches the apparatus according to claim 15, wherein the metadata information comprises box information.
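(In Hannuksela ¶0369, Hannuksela’s embodiments have been described in relation to ISOBMFF and/or formats derived from ISOBMFF. In ¶0069 a building block in the ISOBMFF is called a box.)

The combination of Yoshikawa and Hannuksela does not explicitly teach wherein the box information comprises the viewpoint position information.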


However, Filippini teaches wherein the box information comprises the viewpoint position information.
(Filippini ¶0052 teaches a spherical coordinate system in which a position is identified by two angles, as defined by the International Organization for Standardization (ISO). Using the coordinate system, a location can be specified for a virtual light source relative to a fixed position in the viewing room (e.g., the user's optimal viewpoint) as a set of coordinates. The coordinates for virtual light sources are included as part of the environmental effects metadata in a metadata track. Each set of coordinates defining a location of a virtual light source represents part of the metadata defining an environmental event that is associated with a time in the metadata track.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa, Hannuksela, and Filippini so that the metadata provided in the box of Hannuksela can explicitly include a complete media dataset including media content data and environmental effects metadata defining a set of environmental events, as taught by Filippini. This modification adjusts parameters accordingly to more accurately reproduce the requested ambient effects of the content, as taught in Filippini’s abstract and ¶0045, indicating an enhanced experience for the user.

Claims 7-10 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yoshikawa et al. (US 2020/0195997) in view of Wang (US 2018/0276890).

Regarding claim 7, Yoshikawa teaches the method according to claim 1.

Yoshikawa does not explicitly teach wherein the metadata information includes a metadata track.

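However, Wang teaches wherein the metadata information includes a metadata track.
(In Wang Fig 2 and ¶0135 the movie box includes one or more track boxes. The track box includes the information for a track in the presentation.)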

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.

Regarding claim 8, Yoshikawa teaches the method according to claim 1.

Yoshikawa does not explicitly teach wherein the metadata information includes a media presentation description.

However, Wang teaches wherein the metadata information includes a media presentation description.
(In Wang Fig 2 and ¶0135 the movie box includes a movie header box. The movie header box includes information that describes the presentation as a whole.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.

Regarding claim 9, Yoshikawa teaches the method according to claim 1.

Yoshikawa does not explicitly teach wherein the metadata information includes supplemental enhancement information.

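However, Wang teaches wherein the metadata information includes supplemental enhancement information.
(In Wang ¶0069, Supplemental Enhancement Information (SEI) messages are included in video bitstreams.)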

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can incorporate SEI messages, as taught in Wang, which are used to carry information/metadata that is not essential in order to decode the bitstream. This information is useful in improving the display or processing of the decoded output, where such information could be used by decoder-side entities to improve the viewability of the content, as taught by Wang ¶0069.

Regarding claim 10, Yoshikawa teaches the method according to claim 1.

Yoshikawa alone does not explicitly teach wherein the metadata information includes a metadata track, and wherein the metadata track further comprises director viewport information; and wherein the processing the media data based on the viewpoint identification information comprises: processing the media data based on the viewpoint identification information and the director viewport information.

However, Yoshikawa in view of Wang teaches
wherein the metadata information includes a metadata track, and wherein the metadata track further comprises director viewport information; and 
(In Wang ¶0150-¶0152 a recommended viewport timed metadata track indicates the viewport that should be displayed when the user does not have control of the viewing orientation or has released control of the viewing orientation. In ¶0151 the recommended viewport timed metadata track may be used for indicating a recommended viewport based on a director's cut.)

wherein the processing the media data based on the viewpoint identification information comprises: 

processing the media data based on the viewpoint identification information and the director viewport information.
(In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information. Note that, per ¶0208, a viewpoint can be selected automatically.)
(In Wang ¶0150-¶0152 a recommended viewport timed metadata track indicates the viewport that should be displayed when the user does not have control of the viewing orientation or has released control of the viewing orientation. In ¶0151 the recommended viewport timed metadata track may be used for indicating a recommended viewport based on a director's cut.)
Note: Yoshikawa teaches that each viewpoint image includes viewpoint information indicating an ID/identification of the viewpoint; therefore, Wang’s director’s cut viewport would include ID/identification information. The combination teaches selecting a selected viewpoint having ID/identification information and acquiring the desired video, according to Yoshikawa, where the selected viewpoint is Wang’s recommended viewport based on a director’s cut.
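For illustration only, the combined selection logic described in the note above can be sketched as follows. This is a minimal sketch under stated assumptions, not a characterization of either reference's actual implementation: the names `ViewpointArea` and `select_area`, the rectangle fields, and the use of `None` to signal that the user has released control are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ViewpointArea:
    """Hypothetical record: one viewpoint image inside an integrated image."""
    viewpoint_id: int
    x: int
    y: int
    width: int
    height: int


def select_area(
    areas: Sequence[ViewpointArea],
    user_viewpoint_id: Optional[int],
    recommended_viewpoint_id: int,
) -> ViewpointArea:
    """Return the area whose ID matches the selected viewpoint.

    Assumption: when the user has no control, or has released control
    (modeled here as None), the recommended (director's cut) viewpoint
    ID is used as the selected viewpoint.
    """
    wanted = (
        recommended_viewpoint_id
        if user_viewpoint_id is None
        else user_viewpoint_id
    )
    for area in areas:
        if area.viewpoint_id == wanted:
            return area
    raise KeyError(f"no area with viewpoint ID {wanted}")


# Hypothetical arrangement of two viewpoint images in one integrated image.
areas = [
    ViewpointArea(viewpoint_id=1, x=0, y=0, width=960, height=540),
    ViewpointArea(viewpoint_id=2, x=960, y=0, width=960, height=540),
]
```

In this sketch, `select_area(areas, None, 2)` returns the area with ID 2 (the recommended viewpoint), while `select_area(areas, 1, 2)` returns the area with ID 1 (the user's own choice).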

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.

Regarding claim 17, Yoshikawa teaches the apparatus according to claim 11.

Yoshikawa does not explicitly teach wherein the metadata information includes a metadata track.

However, Wang teaches wherein the metadata information includes a metadata track.
(In Wang Fig 2 and ¶0135 the movie box includes one or more track boxes. The track box includes the information for a track in the presentation.)
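For illustration only, the box structure referred to above (a movie box containing one or more track boxes) can be sketched as follows. This is a minimal sketch assuming the general ISOBMFF box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signaling a 64-bit size and size 0 signaling "extends to the end"). The helper names `iter_boxes`, `count_tracks`, and `make_box` are hypothetical, and the sample bytes are synthetic rather than a valid media file.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size


def count_tracks(data):
    """Count 'trak' boxes directly inside the first top-level 'moov' box."""
    for box_type, payload_start, box_end in iter_boxes(data):
        if box_type == b"moov":
            children = iter_boxes(data, payload_start, box_end)
            return sum(1 for child_type, _, _ in children if child_type == b"trak")
    return 0


def make_box(box_type, payload=b""):
    """Build a synthetic box with a 32-bit size header (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic sample: a movie box holding a header box and two track boxes.
sample = make_box(b"ftyp", b"isom") + make_box(
    b"moov",
    make_box(b"mvhd", b"\x00" * 4) + make_box(b"trak") + make_box(b"trak"),
)
```

On the synthetic sample, `count_tracks(sample)` finds the two `trak` boxes nested inside `moov`.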

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.

Regarding claim 18, Yoshikawa teaches the apparatus according to claim 11.

Yoshikawa does not explicitly teach wherein the metadata information includes a media presentation description.

However, Wang teaches wherein the metadata information includes a media presentation description.
(In Wang Fig 2 and ¶0135 the movie box includes a movie header box. The movie header box includes information that describes the presentation as a whole.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.



Regarding claim 19, Yoshikawa teaches the apparatus according to claim 11.

Yoshikawa does not explicitly teach wherein the metadata information includes supplemental enhancement information.

However, Wang teaches wherein the metadata information includes supplemental enhancement information.
(In Wang ¶0069, Supplemental Enhancement Information (SEI) messages are included in video bitstreams.)

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can incorporate SEI messages, as taught in Wang, which are used to carry information/metadata that is not essential in order to decode the bitstream. This information is useful in improving the display or processing of the decoded output, where such information could be used by decoder-side entities to improve the viewability of the content, as taught by Wang ¶0069.

Regarding claim 20, Yoshikawa teaches the apparatus according to claim 11.

Yoshikawa alone does not explicitly teach wherein the metadata information includes a metadata track, and the metadata track further comprises director viewport information; and wherein the processing module is specifically configured to process the media data based on the viewpoint identification information and the director viewport information.

However, Yoshikawa in view of Wang teaches 
wherein the metadata information includes a metadata track, and the metadata track further comprises director viewport information; and 
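(In Wang ¶0150-¶0152 a recommended viewport timed metadata track indicates the viewport that should be displayed when the user does not have control of the viewing orientation or has released control of the viewing orientation. In ¶0151 the recommended viewport timed metadata track may be used for indicating a recommended viewport based on a director's cut.)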

wherein the processing module is specifically configured to process the media data based on the viewpoint identification information and the director viewport information.
(In Yoshikawa ¶0151 the arrangement information is information that defines information about each viewpoint image in the integrated image. In ¶0152 the information about each viewpoint image includes viewpoint information indicating an ID/identification of the viewpoint. In Yoshikawa ¶0203 a viewpoint video selector acquires the desired video by extracting a video in the area having an ID corresponding to the viewpoint indicated in the selected-viewpoint information. Note that, per ¶0208, a viewpoint can be selected automatically.)
(In Wang ¶0150-¶0152 a recommended viewport timed metadata track indicates the viewport that should be displayed when the user does not have control of the viewing orientation or has released control of the viewing orientation. In ¶0151 the recommended viewport timed metadata track may be used for indicating a recommended viewport based on a director's cut.)
Note: Yoshikawa teaches that each viewpoint image includes viewpoint information indicating an ID/identification of the viewpoint; therefore, Wang’s director’s cut viewport would include ID/identification information. The combination teaches selecting a selected viewpoint having ID/identification information and acquiring the desired video, according to Yoshikawa, where the selected viewpoint is Wang’s recommended viewport based on a director’s cut.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the instant application, to combine Yoshikawa and Wang so that the information of the viewports in Yoshikawa can be explicitly represented via an ISO base media file format (ISOBMFF) file, as taught by Wang. This modification would indicate that a file formatted according to the ISOBMFF includes virtual reality content, so that video player devices can properly render the virtual reality content, as taught by Wang’s abstract and ¶0004.

Conclusion

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached at 571-272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/TERRIKA PETERSON/Examiner, Art Unit 2426



/NASSER M GOODARZI/Supervisory Patent Examiner, Art Unit 2426