DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant filed a Reply on 13 July 2022 that:
Amended independent claims 1, 19, and 20 to recite generating, instead of obtaining, a two-dimensional subframe, an amendment that is nevertheless met by the applied art of record as further explained below;
Canceled claim 16; and
Added new claim 25, which has led to the discovery of new prior art and necessitates new grounds of rejection applying Moreno (Moreno, C., Chen, Y., Li, M.: A dynamic compression technique for streaming Kinect-based Point Cloud data. In: International Conference on Computing, Networking and Communications. IEEE, pp. 550–555 (2017)).
Response to Arguments
Applicant's arguments filed 13 July 2022 have been fully considered but they are not persuasive.
Applicant argues that generating, as opposed to the previously recited obtaining, a two-dimensional subframe from a frame of the volumetric video that is generated based upon the predicted viewport distinguishes over He.  Instead, He is characterized as the server selecting up to three fundamental 2D views for transmission to the client.
In response, He states “we project the point cloud frame into six 2D frames and generate videos with different bitrates … differential transmission can be achieved such that personalized contents like the current consumed viewpoint are transmitted via the interactive broadband channel” Abstract, emphasis added; “After point cloud preprocessing, multiple pairs of output video streams with different bitrates are generated and stored in the broadcast or broadband servers, each pair including six fundamental video streams”  (3.1, emphasis added).  From the client side “the receiving media content of each client is made up of an entire point cloud frame with low bit rate and specific segments of the user’s current viewpoint”.  Although such generation may involve a selection process it is clear that such processes are performed on the server side such that “generating, at the server side, …” is disclosed by He. 
Furthermore, these fundamental (2D) views are each two-dimensional subframes from a frame of the volumetric video, generated by the server, and comprising a viewing perspective for use in rendering two-dimensional content based on three-dimensional content.  Significantly, when the user’s predicted viewpoint exactly matches one of the six projection angles, only a single fundamental (2D) view that corresponds to this exactly matching viewpoint is generated by the server and transmitted to the client.  See Fig. 2(1) and Section 3.2.  In other words, the six fundamental projections are themselves predicted viewports that are generated on the server side and transmitted along with the three-dimensional reduced resolution version of the frame of the volumetric video as also shown in Fig. 1.

[Image: media_image1.png]

[Image: media_image2.png]

More generally, for a random viewpoint that does not exactly match one of the six fundamental views, up to three fundamental views are generated by the server and transmitted to the client.  See Fig. 2(2) and Section 3.2.  See also Sections 3.2 and 5 discussing view-dependent streaming in which feedback information is used to adaptively stream and personalize content generation and delivery of the current consumed viewpoint.
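For illustration only, the one-to-three view selection described above can be sketched as follows. The sketch assumes that the six fundamental views correspond to axis-aligned projection planes and that a view is needed whenever its plane faces the viewer; the face labels, the function name, and the dot-product test are hypothetical and are not taken from He.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical labels for six axis-aligned projection planes (an assumption,
# not He's notation). Each value is the outward normal of the plane.
FACE_NORMALS: Dict[str, Tuple[int, int, int]] = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def select_fundamental_views(view_dir: Sequence[float],
                             eps: float = 1e-9) -> List[str]:
    """Return the projection planes that face the viewer.

    view_dir points from the object toward the viewer. A plane is selected
    when its outward normal has a positive dot product with view_dir, so an
    axis-aligned direction yields one view and a general direction yields
    at most three.
    """
    selected = []
    for name, normal in FACE_NORMALS.items():
        dot = sum(n * v for n, v in zip(normal, view_dir))
        if dot > eps:
            selected.append(name)
    return selected


if __name__ == "__main__":
    print(select_fundamental_views((1, 0, 0)))    # exact match with one plane
    print(select_fundamental_views((1, -1, 1)))   # general viewpoint
```

Under these assumptions a viewpoint aligned with a projection axis selects a single view, while any other viewpoint selects two or three.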
Applicant also challenges He as not motivating a POSA to modify Varerkar in such a way that the claimed 2D subframe is generated at the server because He’s generation of the 2D frame occurs at the client device.  The above arguments and citations demonstrate otherwise, however.
Moreover, Applicant’s arguments do not address the full complement of motivations offered in the last office action and repeated here, which are: a) He motivates server-side generation/obtaining of 2-D subframes based on the viewport because doing so enables personalized media content delivery that increases system flexibility in 3.1 since the 2-D subframe views are generated/obtained with different bitrates as per sections 2 and 3.2; b) He motivates the three-dimensional reduced resolution version of the frame of the volumetric video being generated without use of the predicted viewport because “the entire frame with low bitrate is demanded by every node of the network terminal, and such common media content could be delivered by digital broadcasting to exploit its superiority for further increasing the transmission efficiency. Conversely, the content of current viewpoint of each user in a certain moment is probability different among the users, so that such personalized media contents, including the push-pull combination media distribution and interaction strategy, are more suitable to be delivered in the bidirectional broad band network to increase the system flexibility. Thus, the advantages of both broadcast and broadband have been utilized respectively to distribute the network bandwidth reasonably and optimize the streaming efficiency” in Section 3.1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 7, 12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar (US 2020-0045285 A1) and He (Lanyi He, Wenjie Zhu, Ke Zhang, and Yiling Xu. 2018. View-Dependent Streaming of Dynamic Point Cloud over Hybrid Networks. In Advances in Multimedia Information Processing – PCM 2018. Springer International Publishing, Cham, 50–58).

Claim 1
In regards to claim 1, Varerkar discloses a method comprising:
predicting, by a server including at least one processor, a viewport needed by a client device for a volumetric video, the predicted viewport comprising a viewing perspective for use in rendering two-dimensional content based on three-dimensional content {initially it is noted that “for use in rendering” may be interpreted as mere intended use; although the “generate” step appears to be performing rendering to “obtain” a two-dimensional subframe particularly in view of the “as such frame appears” language implying a rendered (2D/pixel-based) image as per [0019] of the instant specification, there is no claim language specifying actual rendering.  For evidence regarding the claimed “predicting” step see server-based viewport prediction 2329 in Fig. 23B illustrating servers 2301 performing viewport prediction of a viewport needed by client side including wearable device 2303 providing feedback 2327 to viewport predictor 2329 as further discussed in [0232]-[0234].  For volumetric video see the point clouds in [0230], [0233], [0239], [0251], and [0253]-[0254]};
generating, frame of the volumetric video as such frame appears from the viewing perspective of the predicted viewport {see Fig. 23C illustrating the selection 2353 of 2-D patches/subframes obtained from the point cloud (volumetric video) for transmission to the client as further discussed in paragraphs [0235]-[0243] in order to recreate the rendered texture for the selected/predicted viewport at the client.  After transmission, rendering logic 2167 at client device 2150 renders these 2-D patches/subframes per [0238], [0249], [0263], where [0211] states that the rendered info includes “immersive media such as 3DoF+ video, 6DoF+video, etc” and, according to [0216], renders for display on a mobile device display screen or desktop monitor, which are conventional displays displaying 2D rendered pixel images.  Thus, a user viewing the mobile device display screen or desktop monitor would view “a two-dimensional view of the frame of the volumetric video as such frame appears from the viewing perspective of the predicted viewport” as claimed. It is noted that the “key features” used in Fig. 25 2507 specifically include user’s position, FOV and occlusions as well as other features such as distance of objects from cameras and objects of interest as per [0255]-[0258].  Thus, the high-resolution key feature point cloud includes the predicted viewport.}; 
generating, by the server, a three-dimensional reduced resolution version of the frame of the volumetric video, wherein the three-dimensional reduced resolution version of the frame of the volumetric video is of a reduced video quality compared with the frame of volumetric video, and 
{see Fig. 25, downscaling/sampling 2503 in which the full resolution version is used to generate the three-dimensional reduced resolution version of the frame while noting that the downscaling/sampling process removes pixels, thereby degrading/reducing the quality compared with the full resolution version of the frame.  See also [0227], [0231], [0256].  See also [0237]-[0231] and the adaptive resolution citation below for the low resolution (reduced resolution) version of the frame, which is necessarily of a reduced video quality particularly given Applicant’s expansive definition of “reduced resolution” as per the claim interpretation section above.
Note that Varerkar’s three-dimensional reduced resolution version of the frame of the volumetric video is shown in Fig. 25 and described in [0256] as a “low-resolution full scene point cloud”, which implies that it is generated without the use of the predicted viewport because it is a full scene and not a partial scene that accounts for or otherwise omits the predicted viewport.  Nevertheless, Applicant’s arguments submitted 15 February 2022 raise some doubt on this point due to the use of the metadata bitstream in the downscaling/sampling block 2503 to generate the three-dimensional reduced resolution version of the frame.  As such, Varerkar is not relied upon for “wherein the three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport” as indicated above in strike-through font}; and
transmitting, by the server to the client device, 
It is noted that [0231] indicates that data outside the predicted viewport may be sent in low resolution such that both the predicted viewport data including the two-dimensional subframe and lower resolution data outside the viewport are transmitted.  This lower resolution data outside the viewport includes, in Fig. 25 (reproduced below), downscaling the full resolution volumetric video to generate a low (reduced) resolution of the frame of the volumetric video and key feature selection that generates a key feature point cloud sub-frame of the viewport.  As such both a three-dimensional reduced resolution version of the full frame of the volumetric video and a high-resolution key feature point cloud for the viewport are transmitted; however, the viewport rendering is performed at the client as detailed above such that the server processing system does not send the rendered (two-dimensional subframe) but instead a non-rendered version as indicated above using strikethrough font.

[Image: media_image3.png, Fig. 25 of Varerkar]

For more on client-side use (e.g. rendering), see [0227]-[0231] indicating that lower resolution data (e.g. Fig. 25, step 2505) outside the predicted viewport is transmitted for use in rendering the reduced resolution version of the frame while noting that the two-dimensional subframe is rendered when the two-dimensional subframe matches a current viewport and the reduced resolution version is used for rendering when outside the viewport.  See also the Adaptive Resolution in [0204]-[0216] including foveated rendering for viewports, prediction and relevance mechanism 2010 and classification as hi-fidelity or low-fidelity regions.  For more regarding client-side rendering, see rendering logic 2167 at client device 2150, which renders these 2-D patches/subframes per [0238], [0249], [0263], where [0211] states that the rendered info includes “immersive media such as 3DoF+ video, 6DoF+video, etc”}
Although Varerkar discloses transmitting both a three-dimensional reduced resolution version of the frame of the volumetric video and a high-resolution key feature point cloud (predicted viewport) for the viewport, the viewport rendering is performed at the client as detailed above such that the server processing system does not send the rendered (two-dimensional subframe) but instead a non-rendered version as indicated above using strikethrough font}.  Additionally, Varerkar is not relied upon to disclose wherein the three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport.
He is analogous art because it is from the same field of volumetric video (3D dynamic point cloud) transmission and solves the same problem of transmitting such massive data with limited bandwidth resources.
He teaches a method and system that performs view-dependent content streaming from a server to HMD clients as illustrated in Fig. 1.  
He also teaches 
generating, by the server, a two-dimensional subframe from a frame of the volumetric video, wherein the two-dimensional subframe that is generated based upon the predicted viewport of the client device and comprising a two-dimensional view of the frame of the volumetric video as such frame appears from the viewing perspective of the viewport 
{Section 2 Point Cloud Preprocessing, Fig. 1, teaches a server preprocessing the 3D point cloud to project the 3D points into 2D images to acquire six complete views of each point cloud frame.
Applicant argues that generating, as opposed to the previously recited obtaining, a two-dimensional subframe from a frame of the volumetric video that is generated based upon the predicted viewport distinguishes over He.  Instead, He is characterized as the server selecting up to three fundamental 2D views for transmission to the client.
In response, He states “we project the point cloud frame into six 2D frames and generate videos with different bitrates … differential transmission can be achieved such that personalized contents like the current consumed viewpoint are transmitted via the interactive broadband channel” Abstract, emphasis added; “After point cloud preprocessing, multiple pairs of output video streams with different bitrates are generated and stored in the broadcast or broadband servers, each pair including six fundamental video streams”  (3.1, emphasis added).  From the client side “the receiving media content of each client is made up of an entire point cloud frame with low bit rate and specific segments of the user’s current viewpoint”.  
Furthermore, these fundamental (2D) views are each two-dimensional subframes from a frame of the volumetric video, generated by the server, and comprising a viewing perspective for use in rendering two-dimensional content based on three-dimensional content.  Significantly, when the user’s predicted viewpoint exactly matches one of the six projection angles, only a single fundamental (2D) view that corresponds to this exactly matching viewpoint is generated by the server and transmitted to the client.  See Fig. 2(1) and Section 3.2.  In other words, the six fundamental projections are themselves predicted viewports that are generated on the server side and transmitted along with the three-dimensional reduced resolution version of the frame of the volumetric video as also shown in Fig. 1.

[Image: media_image1.png]

[Image: media_image2.png]

More generally, for a random viewpoint that does not exactly match one of the six fundamental views, up to three fundamental views are generated by the server and transmitted to the client.  See Fig. 2(2) and Section 3.2.  See also Sections 3.2 and 5 discussing view-dependent streaming in which feedback information is used to adaptively stream and personalize content generation and delivery of the current consumed viewpoint.
Applicant also challenges He as not motivating a POSA to modify Varerkar in such a way that the claimed 2D subframe is generated at the server because He’s generation of the 2D frame occurs at the client device.  The above arguments and citations demonstrate otherwise, however.
Moreover, Applicant’s arguments do not address the full complement of motivations offered in the last office action and repeated here, which are: a) He motivates server-side generation/obtaining of 2-D subframes based on the viewport because doing so enables personalized media content delivery that increases system flexibility in 3.1 since the 2-D subframe views are generated/obtained with different bitrates as per sections 2 and 3.2; b) He motivates the three-dimensional reduced resolution version of the frame of the volumetric video being generated without use of the predicted viewport because “the entire frame with low bitrate is demanded by every node of the network terminal, and such common media content could be delivered by digital broadcasting to exploit its superiority for further increasing the transmission efficiency. Conversely, the content of current viewpoint of each user in a certain moment is probability different among the users, so that such personalized media contents, including the push-pull combination media distribution and interaction strategy, are more suitable to be delivered in the bidirectional broad band network to increase the system flexibility. Thus, the advantages of both broadcast and broadband have been utilized respectively to distribute the network bandwidth reasonably and optimize the streaming efficiency” in Section 3.1}; and
generating, by the server, a three-dimensional reduced resolution version of the frame of the volumetric video, wherein the three-dimensional reduced resolution version of the frame of the volumetric video is of a reduced video quality compared with the frame of volumetric video, and wherein the three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport 
{see Fig. 1 lower right corner showing broadcast server generating and transmitting a low quality, three-dimensional reduced resolution version of the frame of the volumetric video.  See also sections 3.1 discussing that a low-resolution version of the entire point cloud frame is generated by the server.  Section 3.2 clarifies that “although it is unlikely to watch the other unconsumed points within a short time, we still deliver a low quality version of the entire point cloud frame to avoid the exceptional circumstance”.  See also Section 4 Experiments “The digital broadcasting takes charge of the basic layer, which contains the entire content of the point cloud frame, with low bitrate for visual quality assurance under the exceptional circumstance”}.
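For illustration only, generating a reduced resolution version of an entire point cloud frame without reference to any viewport can be sketched as follows. Voxel-grid averaging is used here as a stand-in technique; neither He nor Varerkar is cited as using this particular method, and the function name and parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: List[Point], voxel_size: float) -> List[Point]:
    """Replace all points falling in the same cubic voxel by their centroid.

    The whole frame is processed uniformly; no viewport or viewing direction
    is an input, so the output is a lower-resolution version of the full
    scene (an illustrative stand-in, not a cited reference's method).
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets: Dict[Tuple[int, int, int], List[Point]] = {}
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets.setdefault(key, []).append(p)
    # One centroid per occupied voxel, in first-seen voxel order.
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]


if __name__ == "__main__":
    frame = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.5, 0.0, 0.0)]
    print(voxel_downsample(frame, voxel_size=1.0))
```

Because the only inputs are the frame and a voxel size, every client would receive the same reduced resolution output regardless of its viewpoint.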
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s client-side rendering (obtaining) a two-dimensional subframe from a frame of the volumetric video, the two-dimensional subframe generated based upon the predicted viewport of the client device and comprising a two-dimensional view of the frame of the volumetric video as such frame appears from the viewing perspective of the predicted viewport such that such generating (rendering) is performed on the server-side instead of the client-side and such that the rendered result is transmitted, by the server processing system to the client device as taught by He
and wherein the three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport as also taught by He
 because a) He motivates server-side generation/obtaining of 2-D subframes based on the viewport because doing so enables personalized media content delivery that increases system flexibility in 3.1 since the 2-D subframe views are generated/obtained with different bitrates as per sections 2 and 3.2; b) He motivates the three-dimensional reduced resolution version of the frame of the volumetric video being generated without use of the predicted viewport because “the entire frame with low bitrate is demanded by every node of the network terminal, and such common media content could be delivered by digital broadcasting to exploit its superiority for further increasing the transmission efficiency. Conversely, the content of current viewpoint of each user in a certain moment is probability different among the users, so that such personalized media contents, including the push-pull combination media distribution and interaction strategy, are more suitable to be delivered in the bidirectional broad band network to increase the system flexibility. Thus, the advantages of both broadcast and broadband have been utilized respectively to distribute the network bandwidth reasonably and optimize the streaming efficiency” in Section 3.1.
Claim 6
In regards to claim 6, Varerkar discloses [the method of claim 1, further comprising]: obtaining the frame of the volumetric video {see Figs. 23B, 23C and paragraphs [0222]-[0226] discussing various input media obtained by the server including point cloud videos, 6DoF video and video-frame based processing of such input media.  See also the selection 2353 of 2-D patches/subframes (from such volumetric/point cloud video frames) and rendering frame by the server for transmission to the client as further discussed in paragraphs [0235]-[0243]}.
Claim 7
In regards to claim 7, Varerkar discloses [the method of claim 6,] wherein the frame of the volumetric video is obtained as part of at least a portion of the volumetric video comprising a plurality of encoded frames {see cites for claim 6 above while encoding is shown in Fig. 23B and discussed in [0235] using encoding logic 2109}, wherein the method further comprises:
decoding the frame of the volumetric video from the plurality of encoded frames {see Fig. 23B and [0235] decoding logic 2163}.

Claim 12
In regards to claim 12, Varerkar discloses [the method of claim 1,] wherein the frame of the volumetric video comprises:
a point cloud; or a three-dimensional mesh {point cloud is discussed in the above citations for claim 1}.
Claim 17
In regards to claim 17, Varerkar discloses [the method of claim 1], wherein the predicted viewport comprises a position and an orientation {the viewport is described as having 6DoF (degrees of freedom) that encompass position and orientation as discussed in paragraphs [0004]-[0005], [0220], [0241], [0245], [0250]-[0251]}.
Claim 18
In regards to claim 18, Varerkar discloses [the method of claim 17], wherein the orientation comprises a yaw, a pitch, and a roll {the viewport is described as having 6DoF (degrees of freedom) that encompass position and orientation as discussed in paragraphs [0004]-[0005], [0220], [0241], [0245], [0250]-[0251]}.
Independent Claims 19 and 20
The rejection of method claim 1 above applies mutatis mutandis to the corresponding limitations of server claim 19 and non-transitory computer readable medium claim 20.  Further in regards to claim 19’s additional limitations of a processor and a computer-readable medium storing instructions which, when executed by the processor, cause the processor to perform operations [of claim 1], see the citations above in claim 1 to Varerkar that include servers.  See also [0145]-[0148] of Varerkar discussing computer readable media and processor-based embodiments.
Claims 9, 10, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and He as applied to claims 1/6/7/8 above, and further in view of Ainala  {Khartik Ainala, Rufael N. Mekuria, Birendra Khathariya, Zhu Li, Ye-Kui Wang, Rajan Joshi, "An improved enhancement layer for octree based point cloud compression with plane projection approximation," Proc. SPIE 9971, Applications of Digital Image Processing XXXIX, 99710R (27 September 2016); doi: 10.1117/12.2237753}.
Claim 9
In regards to claim 9, Varerkar [the method of claim 8,] is not relied upon to disclose wherein the generating the three-dimensional reduced resolution version of the frame comprises: performing an interframe encoding between the three-dimensional reduced resolution version of the frame and at least one additional three-dimensional reduced resolution version of at least one additional frame of the volumetric video.
Ainala is highly analogous art because it is from the same field of endeavor as the instant invention (3D point cloud compression, see title, abstract and citations below).  Ainala also teaches transmitting two versions with different resolutions (both two-dimensional frames and a three-dimensional reduced resolution version of the frame of the volumetric video) {For the three-dimensional reduced resolution version of the frame of the volumetric video see the coarse octree coded point cloud mentioned in the abstract, which is also referred to as and serves as a base layer to which an enhancement layer may be added.  See also Introduction Section 1; Fig. 1 (box 2 octree compression).  For “subframe” and “viewport” see Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering, which is further discussed in section 2, while noting that Varerkar is being relied upon to disclose predicting the viewport and obtaining a two-dimensional subframe of the viewport while Ainala clearly teaches the concept of sending two different versions with differing resolutions.  Note also that Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering are further discussed in section 2 as being inputs to both the octree compression and the added box 3 for coding the enhancement layer based on plane projection.
For the two-dimensional frames see the enhancement layer mentioned in the Abstract and Introduction Section 1 and discussed further in Section 3 Plane Projection Approximation (PPA) Based Geometry Compression in which the Plane Projection Approximation coding mode is discussed.  For transmission see Introduction section 1 and Fig. 1}.
Ainala also teaches performing an interframe encoding between the three-dimensional reduced resolution version of the frame and at least one additional three-dimensional reduced resolution version of at least one additional frame of the volumetric video {see Section 2 discussing that interframe predictive coding is a conventional, known technique for compressing (generating a reduced resolution version of) three-dimensional video frames}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar to include generating the three-dimensional reduced resolution version of the frame comprises: performing an interframe encoding between the three-dimensional reduced resolution version of the frame and at least one additional three-dimensional reduced resolution version of at least one additional frame of the volumetric video as taught by Ainala because doing so increases the coding efficiency by further reducing the total data volume via inter-frame encoding and/or because doing so merely combines prior art elements according to known methods to yield predictable results.
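For illustration only, interframe encoding between two reduced resolution frames can be sketched as a difference code over occupied cells. This is a generic delta-coding sketch under the assumption that each reduced resolution frame is represented as a set of occupied integer grid cells; it is not Ainala's predictive coder, and all names are hypothetical.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int, int]


def encode_delta(prev_cells: Iterable[Cell],
                 curr_cells: Iterable[Cell]) -> Tuple[List[Cell], List[Cell]]:
    """Encode the current frame relative to the previous frame.

    Returns (added, removed): cells occupied only in the current frame and
    cells occupied only in the previous frame. Static regions cost nothing.
    """
    prev, curr = set(prev_cells), set(curr_cells)
    return sorted(curr - prev), sorted(prev - curr)


def decode_delta(prev_cells: Iterable[Cell],
                 added: Iterable[Cell],
                 removed: Iterable[Cell]) -> List[Cell]:
    """Reconstruct the current frame from the previous frame and the delta."""
    return sorted((set(prev_cells) - set(removed)) | set(added))


if __name__ == "__main__":
    frame0 = [(0, 0, 0), (1, 1, 1)]
    frame1 = [(1, 1, 1), (2, 2, 2)]
    added, removed = encode_delta(frame0, frame1)
    print(added, removed)
    print(decode_delta(frame0, added, removed))
```

The sketch shows why interframe coding reduces data volume: only the cells that change between time-sequential frames are transmitted.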
Claim 10
In regards to claim 10, Varerkar is not relied upon to disclose but Ainala teaches [the method of claim 8,] wherein the generating the three-dimensional reduced resolution version of the frame comprises applying an octree-based compression {See Introduction Section 1; Fig. 1 (box 2 octree compression).  Note also that Fig. 1 Input Point Cloud Frames and box 1 Bounding Box Normalization & filtering are further discussed in section 2 as being inputs to both the octree compression and the added box 3 for coding the enhancement layer based on plane projection. For the two-dimensional frames see the enhancement layer mentioned in the Abstract and Introduction Section 1 and discussed further in Section 3 Plane Projection Approximation (PPA) Based Geometry Compression in which the Plane Projection Approximation coding mode is discussed.}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar to include wherein the generating the three-dimensional reduced resolution version of the frame comprises applying an octree-based compression as taught by Ainala because a) Ainala motivates using octree compression for 3D point clouds due to its efficiency in the abstract and Section 1; b) He mentions that octree structures are used to deal with the irregular 3D geometry of the point cloud in section 2; and/or c) doing so merely combines prior art elements according to known methods to yield predictable results.
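For illustration only, a coarse octree-style coding of a point cloud can be sketched as quantization to the leaf cells of an octree of fixed depth. The sketch assumes the points have already been normalized to the unit cube (compare the bounding box normalization step noted above); it is not Ainala's codec, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int, int]


def encode_coarse_octree(points: Sequence[Sequence[float]],
                         depth: int) -> List[Cell]:
    """Return the occupied leaf cells of an octree of the given depth.

    Points are assumed to lie in the unit cube [0, 1]^3. Each axis is split
    into 2**depth intervals, so a shallower depth gives a coarser (lower
    resolution) representation with fewer cells.
    """
    n = 1 << depth
    cells = set()
    for p in points:
        # Clamp so that a coordinate of exactly 1.0 maps to the last cell.
        cells.add(tuple(min(int(c * n), n - 1) for c in p))
    return sorted(cells)


def decode_coarse_octree(cells: Sequence[Cell],
                         depth: int) -> List[Tuple[float, float, float]]:
    """Reconstruct one point per occupied cell, placed at the cell center."""
    n = 1 << depth
    return [tuple((i + 0.5) / n for i in cell) for cell in cells]


if __name__ == "__main__":
    cloud = [(0.10, 0.10, 0.10), (0.12, 0.10, 0.10), (0.90, 0.90, 0.90)]
    cells = encode_coarse_octree(cloud, depth=2)
    print(cells)
    print(decode_coarse_octree(cells, depth=2))
```

Nearby points collapse into a single occupied cell, which is the sense in which a coarse octree yields a reduced resolution base layer.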
Claim 21
In regards to claim 21, Varerkar is not relied upon to disclose but Ainala teaches wherein the generating the three-dimensional reduced resolution version of the frame comprises applying compression {see the coarse octree coded point cloud mentioned in the abstract which is also referred to and serves as a base layer to which an enhancement layer may be added.  See also Introduction Section 1; Fig. 1 (box 2 octree compression)}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar which clearly transmits the two-dimensional subframe and a three-dimensional reduced resolution version of the frame of the volumetric video wherein the generating the three-dimensional reduced resolution version of the frame comprises applying compression as taught by Ainala because the dual representation of 3D visual data with projections and octrees permits greater coding efficiency as explicitly motivated in Ainala’s abstract and Introduction section 1 and Section 3 on plane projection and/or because doing so merely combines prior art elements according to known methods to yield predictable results.


Claims 11, 14, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and He as applied to claim 1 above, and further in view of Bouazizi (US 20190114830 A1).
Claim 11
In regards to claim 11, the base combination renders obvious the generation of a three-dimensional reduced resolution version of the frame but not selecting this reduced resolution version based upon a throughput between the client device and the processing system.
Bouazizi is a highly analogous reference teaching a method comprising: obtaining, by a processing system including at least one processor, a viewport of a client device for a volumetric video {see Figs. 1-4 illustrating a client-server architecture that includes processor(s) 340 at the client device 300 that obtains a viewport of the client device using sensors as described in paragraphs [0060], [0066]. For volumetric video see paragraphs [0026]-[0030]}; obtaining, by the processing system, a two-dimensional subframe of a frame of the volumetric video, the two-dimensional subframe associated with the viewport of the client device {see the 2D frames generated by the server 200 of the viewport from the point cloud, which reduce the processing load of the client; paragraphs [0046], [0062], [0066]-[0071], [0079]-[0080], [0083]-[0088]}; and
transmitting, by the processing system to the client device, the two-dimensional subframe {see transmit and receive circuitry 220, 305, 310, 325, 315 in the figures and specification enabling transmissions/receptions between the client and server including the two-dimensional subframe of a frame of the volumetric video obtained in the obtaining step.  See also paragraphs [0008]-[0009], [0036]-[0039], [0062], [0068], and [0083]-[0088]; claims 7, 19}.
Bouazizi also teaches wherein a three-dimensional reduced resolution version of the frame is selected based upon a throughput between the client device and the processing system {see paragraphs [0067]-[0071]}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination to include wherein a three-dimensional reduced resolution version of the frame is selected based upon a throughput between the client device and the processing system as taught by Bouazizi because doing so offers a flow control of the stream to maintain an appropriate bandwidth to thereby offer an improved streaming service to the user as explicitly motivated by Bouazizi in [0068].
Claim 14
In regards to claim 14, the base combination is not relied upon to disclose but Bouazizi teaches [the method of claim 1], where the step of generating the two-dimensional subframe comprises generating a plurality of two-dimensional subframes, wherein each of the plurality of two-dimensional subframes is generated from a respective time sequential frame of the volumetric video, the method further comprising:
applying an interframe encoding to the plurality of two-dimensional subframes {the 2-D frames of the time-sequential frames of the video are compressed as per paragraphs [0046], [0062], [0066]-[0071], [0083]-[0088] while also noting that paragraph [0075] mentions various compression formats while paragraph [0090] lists a variety of video coding standards for compressing the 2-D frames such as HEVC, SVC, AVC that apply interframe coding to videos having respective time sequential frames}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination and particularly Varerkar’s encoding logic 2109 to include where the step of generating the two-dimensional subframe comprises generating a plurality of two-dimensional subframes, wherein each of the plurality of two-dimensional subframes is generated from a respective time sequential frame of the volumetric video, the method further comprising: applying an interframe encoding to the plurality of two-dimensional subframes as taught by Bouazizi because doing so merely combines prior art elements according to known methods to yield predictable results.
Claim 24
The rejection of method claim 11 above applies mutatis mutandis to the corresponding limitations of server claim 24. Further in regards to claim 19’s additional limitations of a processor and a computer-readable medium storing instructions which, when executed by the processor, cause the processor to perform operations [of claim 1], see the citations above in claim 1 to Varerkar that include servers.  See also [0145]-[0148] of Varerkar discussing computer readable media and processor-based embodiments.


Claims 4, 15, 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and He as applied to claim 1 above, and further in view of Mendhekar (US 20170103577 A1).
Claim 4
In regards to claim 4, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with a machine learning model.
Mendhekar is a highly analogous reference that optimizes 360 virtual reality streaming by predicting the viewport to reduce the bandwidth of 3D immersive content for transmission to and processing by a thin client device {see paragraphs [0014], [0026]-[0031], [0036]-[0040]; as to the predicted viewport based upon prior viewpoints, see paragraph [0034]}. In regards to claim 4, Mendhekar teaches “wherein the predicted viewport is predicted in accordance with a machine learning model” {see also paragraph [0034]}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport to be predicted in accordance with a machine learning model as taught by Mendhekar because doing so optimizes the predictive model as explicitly motivated by Mendhekar in [0034] and/or because doing so merely combines prior art elements according to known methods to yield predictable results.


Claim 15
In regards to claim 15, the base combination of Varerkar and He renders obvious generating the two-dimensional subframe and the three-dimensional reduced resolution version of the frame (see above) but is not relied upon to disclose “caching, by the server, at least one of” these two data.
Mendhekar teaches caching these data (claim 15) {see paragraphs [0032]-[0035] directed to the cache}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified the base combination which renders obvious generating the two-dimensional subframe and the three-dimensional reduced resolution version of the frame of the predicted viewpoint, by caching these data (or obtaining these data from a cache) as taught by Mendhekar because caching reduces data access time by storing frequently used data (e.g. the current viewport data that has been transcoded to 2D subframe) in a local cache thereby improving throughput and avoiding re-computing of the viewport frames as explicitly motivated in paragraph [0033] of Mendhekar.
Claims 22 and 23
The rejection of method claim 4 above applies mutatis mutandis to the corresponding limitations of server system claim 22 and non-transitory computer readable medium claim 23.


Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and He as applied to claim 1 above, and further in view of Flare {Feng Qian, "Flare: Practical Viewport-Adaptive 360-Degree Video Streaming for Mobile Devices," MobiCom’18, October 29-November 2, 2018, New Delhi, India © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5903-0/18/10, https://doi.org/10.1145/3241539.3241565}.
Claim 4
In regards to claim 4, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with a machine learning model.
Flare is a highly analogous system and method for streaming 360 degree videos using a predictive viewport.  See abstract and Introduction. Flare also teaches wherein the predicted viewport is predicted in accordance with a machine learning model {see Introduction section discussing performing viewport prediction (VP) using a wide spectrum of machine learning algorithms.  See also section 3.2 VP Method for Flare}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport such that the predicted viewport is predicted in accordance with a machine learning model as taught by Flare because machine learning would increase the accuracy of the viewport prediction and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Claim 5
In regards to claim 5, Varerkar’s system includes machine learning {see [0070], [0084], [0161], [0163]} but is not relied upon to disclose wherein the predicted viewport is predicted in accordance with the machine learning model based upon at least one viewpoint received from the client device.
Flare teaches wherein the predicted viewport is predicted in accordance with the machine learning model based upon at least one viewpoint received from the client device {see Introduction section discussing performing viewport prediction (VP) using a wide spectrum of machine learning algorithms.  See also section 3.2 VP Method for Flare}.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s predictive viewport such that the predicted viewport is predicted in accordance with a machine learning model based upon at least one viewpoint received from the client device as taught by Flare because machine learning would increase the accuracy of the viewport prediction and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Varerkar and He as applied to claim 20 above, and further in view of Moreno (Moreno, C., Chen, Y., Li, M.: A dynamic compression technique for streaming Kinect-based Point Cloud data. In: International Conference on Computing, NETWORKING and Communications. IEEE, pp. 550–555 (2017)).
Claim 25
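In regards to claim 25, Varerkar’s system discloses wherein the three-dimensional reduced resolution version of the frame is generated but is not relied upon to disclose that it is generated based upon a throughput between the client device and the server.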
Moreno is an analogous reference from the same field of volumetric video compression and transmission. See abstract, Introduction discussing generating, compressing and transmitting point cloud data (PCD) and that current PCD compression techniques do not take into account the network status (static).
Moreno also teaches wherein the three-dimensional reduced resolution version of the frame is generated based upon a throughput between the client device and the server.  See abstract, Introduction and Section IV Dynamic Compression Method teaching a dynamic compression technique that adjusts to network conditions using dynamically measured network throughput to determine compression ratio to deliver a satisfactory QoS.  See also Section V Experiment Setup for hardware implementations.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to have modified Varerkar’s static point cloud compression generating the three-dimensional reduced resolution version of the frame such that the generating is dynamic and based upon a throughput between the client device and the server as taught by Moreno because doing so ensures that a desired, satisfactory QoS is delivered despite network congestion or other problems causing reduced throughput as also specifically motivated by Moreno in Section VI(B) Discussion (the dynamic approach results in better overall performance than the static approach because the compression ratio is adjusted to better fit the constantly fluctuating network, thereby enabling more consistent performance).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Araki (US 20220084282 A1) discloses a volumetric streaming method that closely parallels claim 1. See Figs. 14 and 20 (images reproduced in the original Office action).
Chu (US-10750139 B2) has extensive disclosure on viewport prediction for 360 video.   See Figs. 14 and 20.
Pio (US-20190200083 A1) discloses viewport prediction for 3D 360 degree videos.  See [0035], [0046]-[0059].
Lungaro (P. Lungaro, R. Sjöberg, A. J. F. Valero, A. Mittal and K. Tollmar, "Gaze-Aware Streaming Solutions for the Next Generation of Mobile VR Experiences," in IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 4, pp. 1535-1544, April 2018, doi: 10.1109/TVCG.2018.2794119.) teaches generating, by the server, a reduced resolution version of the frame of the 360 degree video, wherein the 360 degree video reduced resolution version of the frame of the video is of a reduced video quality compared with the frame of volumetric video, and wherein the reduced resolution version of the frame of the 360 degree video is generated without use of the predicted viewport. See Section 5.6 and Fig. 8 (images reproduced in the original Office action).
Hannuksela (WO 2020141260 A1) also teaches generating, by the server, a three-dimensional reduced resolution version of the frame of the volumetric video, wherein the three-dimensional reduced resolution version of the frame of the volumetric video is of a reduced video quality compared with the frame of volumetric video, and wherein the three-dimensional reduced resolution version of the frame of the volumetric video is generated without use of the predicted viewport.  See [0245]-[0247] and particularly [0255].


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL ROBERT CAMMARATA whose telephone number is (571)272-0113. The examiner can normally be reached M-Th 7am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala can be reached on 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ROBERT CAMMARATA/           Primary Examiner, Art Unit 2486