DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant has filed a second RCE on 28 September 2021 along with a Reply that includes amendments and arguments.

Response to Amendment
In the Reply filed 28 September 2021 Applicant:
Broadened the scope of the independent claims 1, 19 and 20 by removing a substantial portion of these claims that was originally added in the Reply filed 03 May 2021 but since subjected to a Final Rejection mailed 28 June 2021;
Removed the conditional claim language thereby overcoming the conditional claim language interpretation of claims 1, 4, 5, 6-12, 14-18;
Removed the “is to be used” language from claims 1, 19, and 20 thereby overcoming the 35 USC 112(b) rejection of these claims and their dependent claims 4-12, 14-18, and 21-23;
Added a generating step similar to cancelled claim 8 but also reciting “without the use of the predicted viewport” which is disclosed by Varerkar as explained below; and
Added new claim 24 with language parallel to claim 11.


Response to Arguments
Applicant's arguments filed 28 September 2021 have been fully considered but they are not persuasive. 
Applicant mischaracterizes Varerkar’s Fig. 25 by stating, on page 9, that Fig. 25 “presents yet another technique for encoding 3D information also based on a viewport, in which both a low resolution version of a 3D point cloud scene within a viewport and a high resolution version of key features of the scene within the viewport (as 3D a point cloud) are compressed and sent to client” (emphasis added to highlight mischaracterizations).  The “low resolution version of a 3D point cloud scene” is actually a three-dimensional reduced resolution version of the frame of the volumetric video and is not “within a viewport” but instead encompasses the full scene, as evidenced by Fig. 25 showing the “full resolution, full scene point cloud” in 2501 being downscaled/sampled in 2503 to generate “low-resolution full scene point cloud” frames.
Furthermore and in direct rebuttal to the main theme of the arguments on pages 10-11 and the amendments to independent claims 1, 19, and 20, Varerkar’s three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport as clearly shown in Fig. 25 in which two parallel, independent processing pathways are used to 1) downscale/sample 2503 the full resolution, full scene point cloud without reference to or use of the predicted viewport and independently 2) select key features which include a predicted viewport via metadata to generate high resolution key feature point cloud.  In other words, the process in 1) is unaffected by the server’s prediction of the user’s viewing perspective such that the client device can potentially apply a viewing perspective different from the
As to Fig. 25, Applicant argues (page 11) that the user position and field of view are fed as a metadata bitstream and that the statement in [0256] that “the bitstream is then used with the contents of the scene to provide for downscaling/sampling at block 2502 and key feature selection at block 2507” somehow shows use of a viewport in the formation of the 3D reduced resolution version of the frame.  But this quotation is curiously incomplete and ignores the rest of the sentence, which states “wherein downscaling/sampling at block 2503 and key feature selection at block 2507 lead to low-resolution full scene point cloud at block 2505 and high resolution key feature point cloud at block 2509, respectively” (emphasis added to highlight clear teachings conveniently ignored by Applicant).
It is noted that the “key features” used in 2507 specifically include user’s position, FOV and occlusions as well as other features such as distance of objects from cameras and objects of interest as per [0255]-[0258].  Thus, the high resolution key feature point cloud includes the predicted viewport.
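For illustration only, the two independent pathways of Fig. 25 discussed above can be sketched as follows. The sketch assumes points are (x, y, z) tuples, uses a hypothetical voxel-grid sampler for pathway 1, and uses a hypothetical axis-aligned box as a stand-in for the viewport-derived key feature region of pathway 2. None of these names or techniques is taken from Varerkar; the only point shown is that pathway 1 takes no viewport input.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]

def downsample_full_scene(points: Iterable[Point], voxel: float) -> List[Point]:
    """Pathway 1 (hypothetical): keep one point per voxel; note there is no viewport argument."""
    kept = {}
    for p in points:
        key = tuple(int(c // voxel) for c in p)
        kept.setdefault(key, p)  # the first point seen in each voxel survives
    return list(kept.values())

def select_key_features(points: Iterable[Point], box_min: Point, box_max: Point) -> List[Point]:
    """Pathway 2 (hypothetical): keep full-resolution points inside a viewport-derived region."""
    return [p for p in points
            if all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))]

scene = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1), (3.0, 3.0, 3.0)]
low_res = downsample_full_scene(scene, voxel=1.0)                 # full scene, fewer points
key = select_key_features(scene, (0.0, 0.0, 0.0), (2.0, 1.0, 1.0))  # region only, all points
```

Changing the box passed to select_key_features changes only the second output; the first output depends solely on the scene and the voxel size.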
Applicant also argues, pages 12-13, that the teachings of Chu call for modifying the teachings of Varerkar away from the amended claims and that the teachings of Chu applied to Varerkar mean that the combined system must send only rendered 2D, not 3D.  In response, the claims do not specify rendering but instead “two-dimensional subframes”.  Furthermore, Chu focusses on predictive rendering, at the server, of the predicted viewport per [0036], [0038] and transmission of the rendered two-dimensional 
Even further, Chu renders not just the viewport but a slightly wider field of view 302 as illustrated in Fig. 3A and [0057] to enable fast processing at the client side in response to viewport mispredictions.  But if the viewport misprediction is large enough, such as when the user has a large head movement or completely turns around, the rendered wider field of view would not be sufficient to continue displaying the viewport with such a large change in view.  As such, the combined system would benefit from also having a three-dimensional reduced resolution version of the (whole) frame of volumetric video such that the client could continue displaying the viewport with a large change of view by rendering the three-dimensional version, wherein the reduced resolution disclosed by Varerkar would reduce the impact on transmission bandwidth.
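The benefit described above can be sketched as a client-side decision. This is a minimal sketch, assuming a hypothetical yaw-only viewport and a hypothetical margin (in degrees) by which the server-rendered field of view exceeds the viewport; neither Chu nor Varerkar is asserted to implement the decision this way.

```python
def angular_error(predicted_yaw: float, actual_yaw: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    diff = abs(predicted_yaw - actual_yaw) % 360.0
    return min(diff, 360.0 - diff)

def choose_source(predicted_yaw: float, actual_yaw: float, margin_deg: float) -> str:
    """Return which received data the client would display from (hypothetical policy)."""
    if angular_error(predicted_yaw, actual_yaw) <= margin_deg:
        return "2d_subframe"  # the server-rendered wider view still covers the viewport
    return "3d_reduced"       # large misprediction: render locally from the 3D version
```

For example, a small head turn within the margin keeps the two-dimensional subframe, while a complete turn-around falls back to the three-dimensional reduced resolution version.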
Claim Interpretation
The claims extensively refer to a “reduced resolution”. For example, each of the independent claims 1, 19 and 20 refers to transmitting “a reduced resolution version of the frame”.  The ordinary meaning of the term “resolution” is the number of pixels comprising an image (e.g. 1920 x 1080 signifies a resolution by specifying the number of pixels in the horizontal and vertical dimensions of an image).  See Merriam-Webster defining resolution as “a measure of the sharpness of an image or of the fineness with which a device (such as a video display, printer, or scanner) can produce or record such an image usually expressed as the total number or density of pixels in the image”.

	In the Reply filed 03 May 2021, Applicant amended the independent claims 1, 19, and 20 to recite “wherein the three-dimensional reduced resolution version of the frame of volumetric video is of a reduced video quality compared with the frame of the volumetric video”.  This claim language excludes lossless image compression because lossless compression does not inherently reduce the video quality as now claimed but the amended claim language includes both voxel subsampling and compression such as octree-based compression, 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 7, 12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar (US 2020-0045285 A1) and Chu (US-20170257609).
Claim 1
In regards to claim 1, Varerkar discloses a method comprising:
predicting, by a server including at least one processor, a viewport needed by a client device for a volumetric video, the viewport comprising a viewing perspective for use in rendering two-dimensional content based on three-dimensional content {initially it is noted that “for use in rendering” may be interpreted as mere intended use; although the “obtaining” step appears to be performing rendering to “obtain” a two-dimensional subframe particularly in view of the “as such frame appears” language implying a rendered (2D/pixel-based) image as per [0019] of the instant specification, there is no claim language specifying actual rendering.  For evidence regarding the claimed “predicting” step see server-based viewport prediction 2329 in Fig. 23B illustrating servers 2301 performing viewport prediction of a viewport needed by client side including wearable device 2303 providing feedback 2327 to viewport predictor 2329 as 
obtaining,

{see Fig. 25, downscaling/sampling 2503 in which the full resolution version is used to generate the three-dimensional reduced resolution version of the frame while noting that the downscaling/sampling process removes pixels thereby degrading/reducing the quality compared with the full resolution version of the frame.  See also [0227], [0231], [0256].  See also [0237]-[0231] and the adaptive resolution citation below for the low resolution (reduced resolution) version of the frame, which is necessarily of a reduced video quality particularly given Applicant’s expansive definition of “reduced resolution” as per the claim interpretation section above.
In regards to “without use of the predicted viewport”, note that Varerkar’s three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport as clearly shown in Fig. 25 in which two parallel, independent processing pathways are used to 1) downscale/sample 2503 the full resolution, full scene point cloud without reference or use of the predicted viewport and independently 2) select key features which include a predicted viewport via metadata to generate high resolution key feature point cloud.  In other words, the process in 1) is unaffected by the server’s predicted viewing perspective by the user such that the client device can potentially apply a viewing perspective different from the full scene point cloud”) as illustrated in Fig. 25 and the citations above.}; and
transmitting, by the server to the client device, 

[Figure: media_image1.png]

For more on client-side use (e.g. rendering), see [0227]-[0231] indicating that lower resolution data (e.g. Fig. 25, step 2505) outside the predicted viewport is transmitted for use in rendering the reduced resolution version of the frame while noting that the two-dimensional subframe is rendered when the two-dimensional subframe matches a current viewport and the reduced resolution version is used for rendering when outside the viewport as per.  See also the Adaptive Resolution in [0204]-[0216] including foveated rendering for viewports, prediction and relevance mechanism 2010 and classification as high-fidelity or low-fidelity regions.  For more regarding client-side rendering, see rendering logic 2167 at client device 2150, which renders these 2-D patches/subframes per [0238], [0249], [0263], and see [0211], which states that the rendered info includes “immersive media such as 3DoF+ video, 6DoF+video, etc”}

Chu is an analogous reference from the same field of viewport prediction for a head mounted display in a client-server environment.  See abstract, Figs. 1, 3a-c, 14 and cites below.
Chu provides conclusive evidence that content rendering is advantageously performed at the server side.  See [0002] discussing that server-side rendering offers several advantages including client devices leveraging the high-end graphics provided by server GPUs allowing users to enjoy high-end graphics on less powerful client computing devices while developers can target their software for datacenter servers alleviating platform compatibility problems and increasing efficiency in platform performance tuning while also increasing the ease of bug fixing, software updates and hardware updates.  Although Chu may not be processing volumetric video, Chu clearly operates on 3D space as per [0058]-[0059], Fig. 23 and predicts viewports for rendering at the client per [0004] via rendering module 48 per [0044]-[0045].  Moreover, the advantages detailed by Chu for server-side rendering apply with even greater force to volumetric video sources due to the extremely large amount of information being processed to render two-dimensional subframes.

Claim 6
In regards to claim 6, Varerkar discloses [the method of claim 1, further comprising]: obtaining the frame of the volumetric video {see Figs. 23B, 23C and paragraphs [0222]-[0226] discussing various input media obtained by the server including point cloud videos, 6DoF video and video-frame based processing of such input media.  See also the selection 2353 of 2-D patches/subframes (from such 
Claim 7
In regards to claim 7, Varerkar discloses [the method of claim 6,] wherein the frame of the volumetric video is obtained as part of at least a portion of the volumetric video comprising a plurality of encoded frames {see cites for claim 6 above while encoding is shown in Fig. 23B and discussed in [0235] using encoding logic 2109}, wherein the method further comprises:
decoding the frame of the volumetric video from the plurality of encoded frames {see Fig. 23B and [0235] decoding logic 2163}.
Claim 12
In regards to claim 12, Varerkar discloses [the method of claim 1,] wherein the frame of the volumetric video comprises:
a point cloud; or a three-dimensional mesh {point cloud is discussed in the above citations for claim 1}.
Claim 17
In regards to claim 17, Varerkar discloses [the method of claim 1], wherein the viewport comprises a position and an orientation {the viewport is described as having 6DoF (degrees of freedom) that encompass position and orientation as discussed in paragraphs [0004]-[0005], [0220], [0241], [0245], [0250]-[0251]}.
Claim 18
In regards to claim 18, Varerkar discloses [the method of claim 17], wherein the orientation comprises a yaw, a pitch, and a roll {the viewport is described as having 
Independent Claims 19 and 20
The rejection of method claim 1 above applies mutatis mutandis to the corresponding limitations of server claim 19 and non-transitory computer readable medium claim 20.  Further in regards to claim 19’s additional limitations of a processor and a computer-readable medium storing instructions which, when executed by the processor, cause the processor to perform operations [of claim 1], see the citations above in claim 1 to Varerkar that include servers.  See also [0145]-[0148] of Varerkar discussing computer readable media and processor-based embodiments.


Claims 9, 10, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and Chu as applied to claims 1/6/7/8 above, and further in view of Ainala {Khartik Ainala, Rufael N. Mekuria, Birendra Khathariya, Zhu Li, Ye-Kui Wang, Rajan Joshi, "An improved enhancement layer for octree based point cloud compression with plane projection approximation," Proc. SPIE 9971, Applications of Digital Image Processing XXXIX, 99710R (27 September 2016); doi: 10.1117/12.2237753}.
Claim 9
In regards to claim 9, Varerkar [the method of claim 8,] is not relied upon to disclose wherein the generating the three-dimensional reduced resolution version of the frame comprises: performing an interframe encoding between the three-dimensional 
Ainala is highly analogous because it is from the same field of endeavor as the instant invention (3D point cloud compression, see title, abstract and citations below).  Ainala also teaches transmitting two versions with different resolutions (both two-dimensional frames and a three-dimensional reduced resolution version of the frame of the volumetric video) {For the three-dimensional reduced resolution version of the frame of volumetric video, see the coarse octree coded point cloud mentioned in the abstract, which serves as a base layer to which an enhancement layer may be added.  See also Introduction Section 1; Fig. 1 (box 2 octree compression).  For “subframe” and “viewport” see Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering, which is further discussed in section 2, while noting that Varerkar is being relied upon to disclose predicting the viewport and obtaining a two-dimensional subframe of the viewport while Ainala clearly teaches the concept of sending two different versions with differing resolutions.  Note also that Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering, further discussed in section 2, are inputs to both the octree compression and the added box 3 for coding the enhancement layer based on plane projection.
For the two-dimensional frames see the enhancement layer mentioned in the Abstract and Introduction Section 1 and discussed further in Section 3 Plane Projection Approximation (PPA) Based Geometry Compression in which the Plane Projection Approximation coding mode is discussed.  For transmission see Introduction section 1 and Fig. 1}.

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar such that generating the three-dimensional reduced resolution version of the frame comprises performing an interframe encoding between the three-dimensional reduced resolution version of the frame and at least one additional three-dimensional reduced resolution version of at least one additional frame of the volumetric video as taught by Ainala because doing so increases the coding efficiency by further reducing the total data volume via inter-frame encoding and/or because doing so merely combines prior art elements according to known methods to yield predictable results.
Claim 10
In regards to claim 10, Varerkar is not relied upon to disclose but Ainala teaches [the method of claim 8,] wherein the generating the three-dimensional reduced resolution version of the frame comprises applying an octree-based compression {See Introduction Section 1; Fig. 1 (box 2 octree compression).  Note also that Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering which is further 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar to include wherein the generating the three-dimensional reduced resolution version of the frame comprises applying an octree-based compression as taught by Ainala because Ainala motivates using octree compression for 3D point clouds due to its efficiency in the abstract and Section 1 and/or because doing so merely combines prior art elements according to known methods to yield predictable results.
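As a generic illustration of what an octree-based coding of a point cloud involves, the sketch below emits one 8-bit occupancy code per occupied cube, depth first, and stops at a chosen depth so that a shallower depth yields a coarser (reduced resolution) representation. The function name, the traversal order and the bit layout are assumptions made for this sketch and are not asserted to be Ainala's coder.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def encode_octree(points: Sequence[Point], origin: Point, size: float, depth: int) -> List[int]:
    """Depth-first list of 8-bit occupancy codes for the cube at `origin` with edge `size`."""
    if depth == 0 or not points:
        return []
    half = size / 2.0
    children: List[List[Point]] = [[] for _ in range(8)]
    for p in points:
        # Bit a of the child index is set when the point lies in the upper half along axis a.
        idx = sum(1 << a for a in range(3) if p[a] >= origin[a] + half)
        children[idx].append(p)
    # Bit i of the occupancy code is set when child i contains at least one point.
    codes = [sum(1 << i for i, child in enumerate(children) if child)]
    for i, child in enumerate(children):
        if child:
            child_origin = tuple(origin[a] + (half if (i >> a) & 1 else 0.0) for a in range(3))
            codes.extend(encode_octree(child, child_origin, half, depth - 1))
    return codes

cloud = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
coarse = encode_octree(cloud, (0.0, 0.0, 0.0), 1.0, depth=1)
finer = encode_octree(cloud, (0.0, 0.0, 0.0), 1.0, depth=2)
```

Because point positions are quantized to cube locations at the final depth, truncating the depth discards detail rather than merely re-encoding it.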
Claim 21
In regards to claim 21, Varerkar is not relied upon to disclose but Ainala teaches wherein the generating the three-dimensional reduced resolution version of the frame comprises applying compression {see the coarse octree coded point cloud mentioned in the abstract which is also referred to and serves as a base layer to which an enhancement layer may be added.  See also Introduction Section 1; Fig. 1 (box 2 octree compression)}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention .

Claims 11, 14, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and Chu as applied to claim 1 above, and further in view of Bouazizi (US 20190114830 A1).
Claim 11
In regards to claim 11, the base combination renders obvious the generation of a three-dimensional reduced resolution version of the frame but not selecting this reduced resolution version based upon a throughput between the client device and the processing system.
Bouazizi is a highly analogous reference teaching a method comprising: obtaining, by a processing system including at least one processor, a viewport of a client device for a volumetric video {see Figs. 1-4 illustrating a client-server architecture that includes processor(s) 340 at the client device 300 that obtains a viewport of the client device using sensors as described in paragraphs [0060], [0066]. For volumetric video see paragraphs [0026]-[0030]}; obtaining, by the processing system, a two-
transmitting, by the processing system to the client device, the two-dimensional subframe {see transmit and receive circuitry 220, 305, 310, 325, 315 in the figures and specification enabling transmissions/receptions between the client and server including the two-dimensional subframe of a frame of the volumetric video obtained in the obtaining step.  See also paragraphs [0008]-[0009], [0036]-[0039], [0062], [0068], and [0083]-[0088]; claims 7, 19}.
Bouazizi also teaches wherein a three-dimensional reduced resolution version of the frame is selected based upon a throughput between the client device and the processing system {see paragraphs [0067]-[0071]}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination to include wherein a three-dimensional reduced resolution version of the frame is selected based upon a throughput between the client device and the processing system as taught by Bouazizi because doing so offers a flow control of the stream to maintain an appropriate bandwidth to thereby offer an improved streaming service to the user as explicitly motivated by Bouazizi in [0068].
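Selection based upon throughput, as discussed above, can be illustrated with a minimal sketch. The version names, the bitrates and the "highest bitrate that fits" policy are all hypothetical and are not drawn from Bouazizi.

```python
from typing import Dict

def select_version(bitrates_kbps: Dict[str, float], throughput_kbps: float) -> str:
    """Pick the highest-bitrate version that fits the measured throughput (hypothetical policy)."""
    fitting = {name: rate for name, rate in bitrates_kbps.items() if rate <= throughput_kbps}
    if not fitting:
        # Nothing fits the link: fall back to the smallest available version.
        return min(bitrates_kbps, key=bitrates_kbps.get)
    return max(fitting, key=fitting.get)

versions = {"low": 500.0, "medium": 2000.0, "high": 8000.0}
choice = select_version(versions, throughput_kbps=2500.0)
```

With a measured throughput of 2500 kbps the sketch selects the "medium" version, and it steps down to "low" when the link cannot carry any listed version.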
Claim 14
In regards to claim 14, the base combination is not relied upon to disclose but Bouazizi teaches [the method of claim 1], where the step of obtaining the two-
applying an interframe encoding to the plurality of two-dimensional subframes {the 2-D frames of the time-sequential frames of the video are compressed as per paragraphs [0046], [0062], [0066]-[0071], [0083]-[0088] while also noting that paragraph [0075] mentions various compression formats while paragraph [0090] lists a variety of video coding standards for compressing the 2-D frames such as HEVC, SVC, AVC that apply interframe coding to videos having respective time sequential frames}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination and particularly Varerkar’s encoding logic 2109 to include where the step of obtaining the two-dimensional subframe comprises generating a plurality of two-dimensional subframes, wherein each of the plurality of two-dimensional subframes is generated from a respective time sequential frame of the volumetric video, the method further comprising: applying an interframe encoding to the plurality of two-dimensional subframes as taught by Bouazizi because doing so merely combines prior art elements according to known methods to yield predictable results.
Claim 24
The rejection of method claim 11 above applies mutatis mutandis to the corresponding limitations of server claim 24. Further in regards to claim 19’s additional limitations of a processor and a computer-readable medium storing instructions which, .


Claims 4, 15, 16, 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and Chu as applied to claim 1 above, and further in view of Mendhekar (US 20170103577 A1).
Claim 4
In regards to claim 4, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with a machine learning model.
Mendhekar is a highly analogous reference that optimizes 360 virtual reality streaming by predicting the viewport to reduce the bandwidth of 3D immersive content for transmission to and processing by a thin client device {see paragraphs [0014], [0026]-[0031], [0036]-[0040]; as to the predicted viewport based upon prior viewpoints, see paragraph [0034]}.  In regards to claim 4, Mendhekar teaches “wherein the predicted viewport is predicted in accordance with a machine learning model” {see also paragraph [0034]}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport to be predicted in accordance 
Claims 15 and 16
In regards to claims 15 and 16, the base combination of Varerkar and Chu renders obvious generating the two-dimensional subframe and the three-dimensional reduced resolution version of the frame (see above) but is not relied upon to disclose “caching, by the server, at least one of” these two data.
Mendhekar teaches caching these data (claim 15) and (claim 16) “wherein the two-dimensional subframe is obtained from a cache of the processing system based upon the predicted viewport” {see paragraphs [0032]-[0035] directed to the cache}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination of Varerkar and Chu, which renders obvious generating the two-dimensional subframe and the three-dimensional reduced resolution version of the frame of the predicted viewpoint, by caching these data (or obtaining these data from a cache) as taught by Mendhekar because caching reduces data access time by storing frequently used data (e.g. the current viewport data that has been transcoded to 2D subframe) in a local cache thereby improving throughput and avoiding re-computing of the viewport frames as explicitly motivated in paragraph [0033] of Mendhekar.
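The caching rationale above (avoiding re-computation of viewport frames) can be illustrated with a small least-recently-used cache. The class name, the key layout (frame index plus viewport) and the render callback are hypothetical and are not taken from Mendhekar.

```python
from collections import OrderedDict
from typing import Callable, Hashable

class SubframeCache:
    """Tiny LRU cache keyed by (frame index, viewport); all names are hypothetical."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[tuple, object]" = OrderedDict()
        self.renders = 0  # counts how often the expensive render step actually ran

    def get(self, frame_idx: int, viewport: Hashable, render: Callable) -> object:
        key = (frame_idx, viewport)
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            return self._items[key]
        self.renders += 1
        value = render(frame_idx, viewport)   # cache miss: compute the subframe once
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
        return value

cache = SubframeCache(capacity=2)
render = lambda f, v: f"subframe-{f}-{v}"
first = cache.get(0, (0, 0), render)
second = cache.get(0, (0, 0), render)  # served from the cache, no second render
```

A repeated request for the same frame and viewport is served from the cache, so the render counter stays at one.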


Claims 22 and 23
.

Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and Chu as applied to claim 1 above, and further in view of Flare {Feng Qian, "Flare: Practical Viewport-Adaptive 360-Degree Video Streaming for Mobile Devices," MobiCom’18, October 29-November 2, 2018, New Delhi, India © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5903-0/18/10, https://doi.org/10.1145/3241539.3241565}.
Claim 4
In regards to claim 4, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with a machine learning model.
Flare is a highly analogous system and method for streaming 360 degree videos using a predictive viewport.  See abstract and Introduction.  Flare also teaches wherein the predicted viewport is predicted in accordance with a machine learning model {see Introduction section discussing performing viewport prediction (VP) using a wide spectrum of machine learning algorithms.  See also section 3.2 VP Method for Flare}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport such that the predicted viewport is predicted in accordance with a machine learning model as taught by Flare because 
Claim 5
In regards to claim 5, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with the machine learning model based upon at least one viewpoint received from the client device.
Flare teaches wherein the predicted viewport is predicted in accordance with the machine learning model based upon at least one viewpoint received from the client device {see Introduction section discussing performing viewport prediction (VP) using a wide spectrum of machine learning algorithms.  See also section 3.2 VP Method for Flare}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport such that the predicted viewport is predicted in accordance with a machine learning model based upon at least one viewpoint received from the client device as taught by Flare because machine learning would increase the accuracy of the viewport prediction and/or because doing so merely combines prior art elements according to known methods to yield predictable results.
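As a generic illustration of predicting a viewport from viewpoints received from the client device, the sketch below fits a straight line to past (time, yaw) samples by ordinary least squares and extrapolates it. Linear regression is used here only as a stand-in model; the function name and the yaw-only viewport are assumptions, yaw wraparound at 360 degrees is ignored, and Flare's actual VP method (its section 3.2) is not reproduced.

```python
from typing import List, Tuple

def predict_yaw(samples: List[Tuple[float, float]], t_future: float) -> float:
    """Fit yaw = a*t + b to (t, yaw) samples by least squares and extrapolate to t_future."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return mean_y  # all samples share one timestamp: no trend can be estimated
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    return mean_y + slope * (t_future - mean_t)

history = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0)]  # viewpoints reported by the client
predicted = predict_yaw(history, t_future=3.0)
```

For the sample history shown, the yaw grows by two degrees per time unit, so the extrapolated yaw at time 3.0 is 16 degrees.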
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Pio (US-20190200083 A1) discloses viewport prediction for 3D 360 degree videos.  See [0035], [0046]-[0059].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL ROBERT CAMMARATA whose telephone number is (571)272-0113. The examiner can normally be reached M-Th 7am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala can be reached on 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ROBERT CAMMARATA/Primary Examiner, Art Unit 2486