Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This Office Action is in response to the application 17/061,406 filed on 10/01/2020.
Claims 1 – 20 have been examined and are pending in this application.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4 – 6, 8, 11 – 13, 15 and 18 – 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhengyou ("Microsoft Kinect Sensor and Its Effect," in IEEE MultiMedia, vol. 19, no. 2, pp. 4-10, Feb. 2012, doi: 10.1109/MMUL.2012.24) in view of Elangovan et al. (US 2015/0220158 A1) and further in view of Shanbhogue et al. (WO 2014/158508 A1).

Regarding claim 1, Zhengyou discloses: “a method for processing holographic video, comprising:
capturing, using a plurality of cameras [see page: 5; Figure 1b shows the arrangement of the infrared (IR) projector, the color camera, and the IR camera. The depth sensor consists of the IR projector combined with the IR camera, which is a monochrome complementary metal-oxide semiconductor (CMOS) sensor], one or more video frames of an object in accordance with a synchronization check and an external parameter check [see page: 9; A capture device (one or multiple Kinect sensors) at each location captures users in 3D with high fidelity (in both geometry and appearance)]; and
reconstructing the one or more video frames [see page: 5; The relative geometry between the IR projector and the IR camera as well as the projected IR dot pattern are known. If we can match a dot observed in an image with a dot in the projector pattern, we can reconstruct it in 3D using triangulation. Because the dot pattern is relatively random, the matching between the IR image and the projector pattern can be done in a straightforward way by comparing small neighborhoods using, for example, normalized cross correlation] by performing at least following processes:
a segmentation process that produces a foreground mask for each video frame in the one or more video frames [see page: 7; Shotton’s team treats the segmentation of a depth image as a per-pixel classification task (no pairwise terms or conditional random field are necessary). Evaluating each pixel separately avoids a combinatorial search over the different body joints. For training data, we generate realistic synthetic depth images of humans of many shapes and sizes in highly varied poses sampled from a large motion-capture database];
a human body detection process that detects two or more body parts of a human pose captured in the one or more video frames in accordance with one or more foreground masks produced in the segmentation process [see page: 6; Figure 5. Skeletal tracking. (a) Using a skeletal representation of various body parts, (b) Kinect uses per-pixel, body-part recognition as an intermediate step to avoid a combinatorial search over the different body joints. And see page: 6; col: 2; In skeletal tracking, a human body is represented by a number of joints representing body parts such as head, neck, shoulders, and arms (see Figure 5a). Each joint is represented by its 3D coordinates. The goal is to determine all the 3D parameters of these joints in real time to allow fluent interactivity and with limited computation resources allocated on the Xbox 360 so as not to impact gaming performance. Rather than trying to determine directly the body pose in this high-dimensional space, Jamie Shotton and his team met the challenge by proposing per-pixel, body-part recognition as an intermediate step (see Figure 5b)] and
a texturing process that produces a texture map [see page: 8; Figure 7. An example of a human face captured by the Kinect sensor. (a) Video frame (texture), (b) depth image, and (c) close up of the facial surface]; and
a packaging process that compresses mesh data [see page: 8; Figure 8. Facial expression tracking. These sample images show the results of Kinect tracking 2D feature points in video frames using a projected face mesh overlay], the texture map [see page: 8; Figure 7. An example of a human face captured by the Kinect sensor. (a) Video frame (texture), (b) depth image, and (c) close up of the facial surface], and 
Zhengyou does not explicitly disclose: “a human pose recognition technique;
a tracking process executed in accordance with the human pose;”

However, Elangovan teaches: “a human pose recognition technique [see para: 0025; Processing of all the data generated to accurately detect the pose of a portion of the human body in real-time and in 3D includes engineering desiderata of event stream interpretation and device power management];
a tracking process executed in accordance with the human pose [see para: 0025; Processing of all the data generated to accurately detect the pose of a portion of the human body in real-time and in 3D includes engineering desiderata of event stream interpretation and device power management, as well as usage of algorithms such as Kalman filtering, complementary filters and other conventional algorithms used to fuse the sensor data into coherent pose estimates. The filtering algorithms used are based on the locality of the sensor and factor in the human anatomy and the joint angles of the bones the sensors are tracking].
It would have been obvious to the person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system disclosed by Zhengyou to add the teachings of Elangovan as above, in order to determine a human pose in the video frames and to track the human pose based on the joint angles of the bones that the sensors are tracking [Elangovan see para: 0025].
The combination is silent on “audio bit streams associated with the one or more video frames into a file having a predefined format for transmission to a user device;”
However, Shanbhogue teaches: “audio bit streams associated with the one or more video frames [see para: 0023; According to a third aspect of the disclosure, a second electronic device performing the various functional aspects related to audio capture and transmission includes: a microphone that captures surrounding audio] into a file having a predefined format for transmission to a user device [see para: 0105; As again illustrated by FIG. 19, this audio-context tag information can include (a) the identity of the person in the image who spoke (e.g. Krish), what that person said, presented in text format (e.g., "C'mon guys, Titanic isn't sinking. Smile please"), the overall timeline, e.g., t=-4 seconds to t=3 seconds, and specific time of the captured audio].
Therefore, it would have been obvious to the person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to include the missing limitation as taught by Shanbhogue in order to identify and combine audio bit streams related to the one or more video frames into a file having a predefined format for transmission to a user [Shanbhogue see para: 0023 and 0105].

Regarding claim 4, Zhengyou, Elangovan and Shanbhogue disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
Furthermore, Zhengyou discloses: “wherein the human body detection process comprises:
detecting a human face in the one or more video frames using the human pose recognition technique [see page: 7; col: 2; Head-pose and facial-expression tracking has been an active research area in computer vision for several decades. It has many applications including human-computer interaction, performance-driven facial animation, and face recognition]; and
applying a human face improvement process after detecting the human face [see page: 8; col: 1; We developed a regularized maximum likelihood deformable model fitting (DMF) algorithm for 3D face tracking with Kinect.4 We use a linear deformable head model with a linear combination of a neutral face, a set of shape basis units with coefficients that represent a particular person and are static over time, and a set of action basis units with coefficients that represent a person’s facial expression and are dynamic over time. Because a face cannot perform all facial expressions simultaneously, we believe in general the set of coefficients for the action basis units should be sparse, and thus we impose a L1 regularization].

Regarding claim 5, Zhengyou, Elangovan and Shanbhogue disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
Furthermore, Zhengyou discloses: “wherein the human body detection process comprises: 
detecting a predefined human body part in the one or more video frames using the human pose recognition technique [see page: 6; Figure 5. Skeletal tracking. (a) Using a skeletal representation of various body parts, (b) Kinect uses per-pixel, body-part recognition as an intermediate step to avoid a combinatorial search over the different body joints. And see page: 7; col: 1; For further speedup, the classifier can be run in parallel on each pixel on a graphics processing unit (GPU). Finally, spatial modes of the inferred per-pixel distributions are computed using mean shift resulting in the 3D joint proposals. An optimized implementation of our algorithm runs in under 5 ms per frame (200 frames per second) on the Xbox 360 GPU. It works frame by frame across dramatically differing body shapes and sizes, and the learned discriminative approach naturally handles self occlusions and poses cropped by the image frame].

Regarding claim 6, Zhengyou, Elangovan and Shanbhogue disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
Furthermore, Zhengyou discloses: “wherein the tracking process comprises a depth-depth tracking process, and/or a mesh-mesh tracking process [see page: 8; Figure 8. Facial expression tracking. These sample images show the results of Kinect tracking 2D feature points in video frames using a projected face mesh overlay].

Regarding claims 8 and 15, claims 8 and 15 are rejected under the same art and evidentiary limitations as determined for the method of claim 1.

Regarding claims 11 and 18, claims 11 and 18 are rejected under the same art and evidentiary limitations as determined for the method of claim 4.

Regarding claim 12, claim 12 is rejected under the same art and evidentiary limitations as determined for the method of claim 5.

Regarding claims 13 and 19, claims 13 and 19 are rejected under the same art and evidentiary limitations as determined for the method of claim 6.

Allowable Subject Matter
Claims 2, 3, 7, 9, 10, 14, 16, 17 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Wu et al. (US 2016/0202756 A1)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Masum Billah whose telephone number is (571)270-0701. The examiner can normally be reached Monday - Friday, 9 AM - 5 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie J. Atala, can be reached on (571) 272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MASUM BILLAH/Primary Patent Examiner, Art Unit 2486