DETAILED ACTION
This office action is in response to the amendments filed on 07/06/2022.
Claim 12 is cancelled.
Claims 1-11 and 13-16 are presented for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments (see Remarks, filed 07/06/2022) with respect to the objection to claim 11 and the 35 USC 102 rejection of claim 12 have been fully considered and are persuasive. The objection to claim 11 and the 35 USC 102 rejection of claim 12 have been withdrawn.

Applicant’s arguments with respect to the 35 USC 103 rejections, in the Remarks filed 07/06/2022 on pages 9-11, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant further argues in essence:
[a] “‘wherein the analyzing is performed in parallel to receiving and/or processing a subsequent part of the video stream.’ The cited references do not disclose such a configuration of an intermediary system forwarding the video stream to the receiver system, while in parallel receiving and/or processing a subsequent part of the video stream.”
In response to [a], while the examiner relies upon a different combination of references for the other newly added limitations, the examiner maintains the rejection of this limitation in view of Kennedy, para.0028 of which states:
[0028] Because of the resource intensive nature of generating the tracking metadata, video analysis module 114 may require larger, more expensive and sophisticated computing devices than a typical user's set-top box or personal computer. A set-top box or personal computer may provide adequate resources for the image synthesis module 122. When video analysis module 114 preprocesses videos to determine metadata that can be used at a later time, the intensive computations are performed in advance by a computer with adequate resources and perhaps not in the video's real time. For example, when tracking is done when a video is played, the analysis cannot take longer than the length of the video. With preprocessing is done in advance, video analysis module 114 could, for example, take a minute to track an object when the video lasts only 30 seconds. By taking advantage of preprocessing performed in advance by video analysis module 114, image synthesis module 122 running on user's computing device 140 can insert visual element 124 in real time as the video is played. In an alternative, the video analysis may be done in real time as the video is played. Further, running image synthesis module 122 on user's computing device 140 allows visual element 124 to be customized to a particular user.
While this paragraph discusses the advantages of preprocessing the video stream before it is requested by the user, it also provides an alternative in which the analysis is performed in real time as the video is played. This alternative shows that the analysis is performed in parallel with the receiving and processing of the live data stream. This is further supported by para.0018: “In a further embodiment, visual elements may be inserted into video of motion pictures and live events.”
Therefore, while the claims are rejected under a new combination of references, the examiner maintains the rejection of this limitation in view of Kennedy.
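For illustration only, the pipelined arrangement relied upon above, in which one part of a stream is analyzed while a subsequent part is still being received, can be sketched as follows. This is a minimal Python sketch under stated assumptions; it is not drawn from Kennedy or from the application, and the `receive_parts` and `analyze` functions and the integer "parts" are hypothetical placeholders for real network reception and video analysis.

```python
import queue
import threading
import time


def receive_parts(parts, q):
    """Hypothetical receiver: enqueue stream parts one at a time as they 'arrive'."""
    for part in parts:
        time.sleep(0.01)  # simulate network arrival delay between parts
        q.put(part)
    q.put(None)  # sentinel marking the end of the stream


def analyze(part):
    """Hypothetical analysis placeholder: returns a result tagged with the part."""
    return {"part": part, "result": f"analysis of part {part}"}


def run_pipeline(parts):
    q = queue.Queue()
    results = []
    # Reception runs in its own thread, so part i+1 can arrive
    # while part i is being analyzed in the main thread.
    receiver = threading.Thread(target=receive_parts, args=(parts, q))
    receiver.start()
    while True:
        part = q.get()
        if part is None:
            break
        results.append(analyze(part))
    receiver.join()
    return results


if __name__ == "__main__":
    for r in run_pipeline([0, 1, 2]):
        print(r)
```

The queue decouples reception from analysis; either side may run ahead of the other, which is what distinguishes this arrangement from analyzing the whole video before delivery.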

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Kennedy JR et al. (hereinafter Kennedy, US 2009/0259941 A1) in view of Mariadoss (US 2016/0080698 A1), further in view of Frueh et al. (hereinafter Frueh, “Headset Removal for Virtual and Mixed Reality”, NPL 2017, attached), and further in view of Holmes (US 9,729,820 B1).
Regarding Claim 1, Kennedy discloses a processor system (Kennedy: Fig.2 Video processing server 110) configured for assisting a receiver system in processing video data which is streamed as a video stream to the receiver system via a network (Kennedy: para.0035 “As described above with reference to FIG. 1, video processing server 110 is configured to preprocess videos. Video processing server 110 may, for example, be configured to query video database 102 periodically for new videos. When video processing server 110 receives a new video 104, video processing server 110 uses video analysis module 114 to generate the metadata.” As further seen in Fig. 2, new video data is preprocessed at server 110, and when viewer client 240 attempts to view the content, the preprocessed metadata is used, such as in para.0020), 
wherein the processing of the video data by the receiver system is dependent on an analysis of the video data (Kennedy: para.0022 “As is be described in more detail below with respect to FIG. 2, target data 112 may be defined by a user. Then, video analysis module 114 may be configured to track the object depicted in the portion of the frame defined by target data 112. In another embodiment, target data 112 may indicate that the target is a particular object or has a particular color. In that case, video analysis module 114 may be configured to track the object defined by target data 112. To track the object, video analysis module 114 may use known pattern recognition techniques, such as those employed by the image analyzer described in the '933 patent.” The content is processed by analyzing the video frames to track objects, as further explained in para.0023.) to detect a position and orientation of the target in the video data (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” para.0037 “For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target.”), 
the processor system (Kennedy: Fig.2 Video processing server 110) comprising: 
- a network interface (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” Fig.7 network interface 724 and associated para.0069 show how the video processing server 110 is connected to other elements; therefore, all communications between the server 110 and external devices are performed via this network interface) to the network (Kennedy: Fig.2 204); 
- a processor (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” Fig. 7 processor 704) configured to: 
- via the network interface (Kennedy: Fig.7 network interface 724), receive the video stream (Kennedy: Fig.2 para.0022 “After video processing server 110 receives video 104, video analysis module 114 is configured to track a target in the video. The target is described in a target data 112. In an embodiment, target data 112 may define a portion of a frame, perhaps the first frame, in the video.” Server 110 receives the video); 
- analyze the video data part to detect the position and orientation of the target in the video data part and to obtain an analysis result comprising the position and orientation of the target in the video data part (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” para.0037 “For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target.” the video is analyzed to generate position and appearance information of the target.), 
wherein the analyzing is performed in parallel to receiving and/or processing a subsequent part of the video stream (Kennedy: para.0028 “In an alternative, the video analysis may be done in real time as the video is played.” The analysis is performed in real time as the video is being played by the viewer client; therefore, the analysis is performed in parallel while the video stream is being processed by the receiver for playback.); 
- generate processing assist data comprising the analysis result (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” the video is analyzed to generate position and appearance information of the target, and that information is compiled as tracking metadata 106) or a processing instruction derived from the analysis result; 
- via the network interface, provide the processing assist data to the receiver system to enable the receiver system to process the video data using the analysis result (Kennedy: para.0023 “Once video analysis module 114 tracks the target, video processing server may store tracking metadata 106 in metadata database 108 for later use.” The metadata is stored in metadata database 108 and is accessible by the viewer client, as seen in para.0043 “In another embodiment, when video player 120 requests a video from video sharing server 210, video player 120 makes a separate request to metadata server 260 for the metadata corresponding to the video. For example, metadata server may send the metadata in XML format. In this embodiment, video player 120 may receive the video and the metadata from different servers and may have to assemble them to synchronize the metadata with the video.” The metadata server 260 accesses the metadata database 108 and sends the corresponding metadata to the client device.) or the processing instruction provided by the processing assist data.
However, Kennedy does not explicitly disclose wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the video data, - via the network interface, forward the video stream to the receiver system; decode at least part of the video stream to obtain a decoded video data part; and analyze the decoded video data part to obtain an analysis result, detect the position and orientation of the HMD in the decoded video data part; the position and orientation of the HMD in the decoded video data part; the receiver system to process the video data comprising the HMD removal. In other words, Kennedy does not explicitly show that the video is decoded before being analyzed, does not track an HMD or perform its removal, and does not show that the entity that analyzes the video is the same entity that forwards the video to the receiver.
Mariadoss discloses decode at least part of the video stream to obtain a decoded video data part (Mariadoss: para.0038 “In step 310, the raw video stream from the IP camera can be received. In step 315, the raw video stream can be decoded and processed.” The video stream is obtained and decoded.); and 
analyze the decoded video data part to obtain an analysis result (Mariadoss: para.0038 “In step 320, real-time analytics can be performed on the video stream based on one or more processing criteria and/or user profile settings. Criteria/settings can include, face recognition, path tracking, object tracking, motion detection, and the like.” the decoded video is analyzed for various metrics such as object tracking.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kennedy and Mariadoss in order to incorporate decode at least part of the video stream to obtain a decoded video data part; and analyze the decoded video data part to obtain an analysis result.
One of ordinary skill in the art would have been motivated to combine because sending compressed data, i.e., receiving encoded data and decoding it prior to processing, is expected to improve bandwidth usage and speed and to reduce congestion (Mariadoss: para.0027, para.0038).
However Kennedy-Mariadoss does not explicitly disclose wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the video data, - via the network interface, forward the video stream to the receiver system; detect the position and orientation of the HMD in the decoded video data part; the position and orientation of the HMD in the decoded video data part; the receiver system to process the video data comprising the HMD removal. 
Frueh discloses wherein the processing of the video data comprises a Head Mounted Display (HMD) removal (Frueh: Fig. 3 headset removal, pg. 1, Approach “We enhance Mixed Reality by augmenting it with our headset removal technique that creates an illusion of revealing the user’s face (Figure 1).”), 
to detect a position and orientation of the HMD in the video data, detect the position and orientation of the HMD in the video data part, the position and orientation of the HMD in the video data part (Frueh: pg. 1 section 1.2 “first estimating the camera intrinsics like field-of-view, and then computing the extrinsic transformation between the camera and VR controllers. We simplify the process by adding a marker to the front of the headset, which allows computing the calibration parameters automatically from game play data—the marker is removed virtually during the rendering phase by inpainting it from surrounding headset pixels. Face alignment: To render the virtual face, we need to align the 3D face model with the visible portion of the face in the camera stream, so that they blend seamlessly with each other. A reasonable proxy to this alignment is to position the face model just behind the headset, where the user’s face rests during the VR session. This positioning is estimated based on the geometry and coordinate system of the headset. The calibration computed above is theoretically sufficient to track the headset in the camera view, but in practice there may be errors due to drift or jitter in the Vive tracking. Hence, we further refine the tracking (continuously in every frame) by rendering a virtual model of the headset from the camera viewpoint, and using silhouette matching to align it with the camera frame.” The position and orientation of the HMD are tracked in the video data, assisted by a QR code marker as seen in Fig. 3. Orientation is further tracked, as the face is aligned based on orientation as described in section 1.3.); 
to process the video data comprising the HMD removal (Frueh: section 1.3 “Translucent rendering: Humans have high perceptual sensitivity to faces, and even small imperfections in synthesized faces can feel unnatural and distracting, a phenomenon known as the uncanny valley. To mitigate this problem, instead of removing the headset completely, we choose a user experience that conveys a ‘scuba mask effect’ by compositing the color-corrected face proxy with a translucent headset. Reminding the viewer of the presence of the headset helps avoid the uncanny valley and also makes our algorithms robust to small errors in misalignment and color correction.” The HMD is removed and replaced with the user's face.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kennedy-Mariadoss with Frueh in order to incorporate wherein the processing of the video data comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the video data, detect the position and orientation of the HMD in the video data part, the position and orientation of the HMD in the video data part, to process the video data comprising the HMD removal, and to apply these ideas to the processing performed by the receiver device and to the analysis of the decoded video stream by the server. Kennedy and Frueh are both in the field of video augmentation using tracking data: Kennedy para.0046-0049 shows content placement based on tracked objects, and Frueh tracks head mounted displays for inclusion of a face.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of improved viewer experience of content by providing a more personal video (Frueh: Abstract).
However, Kennedy-Mariadoss-Frueh does not explicitly disclose via the network interface, forward the video stream to the receiver system; that is, in the combination, the device that provides the analysis is not the same device that provides the video stream.
Holmes discloses via the network interface, forward the video stream to the receiver system (Holmes: col.8 lines 40-63 “FIG. 3 is a flowchart illustrating a computer-implemented method of superimposing video 210 carried out by a processor 31. As shown in FIG. 3, the method of superimposing video 210 carried out by a processor 31 begins with the processor 31, at a first step 240 receiving a first live video 212 from a first user's device 20. Reception 240 by a processor 31 is illustrated in FIG. 4, wherein the user device 20 of a first user transmits a first live video 212 (in this case a video 210 captured by the user's rear camera 119) to a processor 31 containing central server 30. The second step 242 of superimposing video 210 carried out by a processor 31 is receiving a second live video 214 from a second user's device 20. Again referring to FIG. 4, reception of the second live video 214 from a second user's device 20 by a processor 31 is illustrated (with the second live video 214 being captured by the second user's rear camera 119). The third step 244 of this method calls for the processor 31 to identify a first human element 216 in the first video 212 and a second human element 218 in a second video 214. Such human elements 216, 218 are illustrated in FIG. 4 with the first human element 216 being a hand (captured by the first user's rear camera 119) and the second human element 218 being a face (captured by the second users front camera 118).” col.9 lines 55-60 “The final step 248 of the computer-implemented method of superimposing video 210 carried out by a processor 31 is transmitting the superimposed video 310 to a user device 20.” Fig.3 and Fig. 8A-8B. It can be seen that video data from each device is obtained, augmented, and transmitted to the users; for example, in the first step performed by server 30 in Fig. 8B, extraneous elements are removed from the video, the video is processed, and the video stream is provided to the receivers.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kennedy-Mariadoss-Frueh with Holmes in order to incorporate via the network interface, forward the video stream to the receiver system, and to replace the separate video sharing server 210 and video processing server 110 of Kennedy with a single server, as in Holmes, that provides both the analysis metadata and the video content. Kennedy-Mariadoss as combined with Frueh discloses the removal of head mounted displays in virtual content viewed by others (see the abstract of Frueh), and Holmes is in an analogous field, as it deals with augmenting virtual conference video data and providing the videos to each user.
One of ordinary skill in the art would have been motivated to combine because of the expected benefits of providing augmented content to users, which improves the user experience, and of using fewer servers/entities, which provides a simpler system for sharing video content (Holmes: col.8 lines 40-63, col.1 Background of the Invention).

Regarding Claim 2, Kennedy-Mariadoss-Frueh-Holmes discloses claim 1 as set forth above.
Kennedy further discloses wherein the processor is configured to analyze the video data part by at least one of the group of: 
- a segmentation technique, whereby the analysis result comprises a segmentation of an object in the decoded video data part (Kennedy: para.0040 “In another embodiment, the metadata could include a mask that defines whether the target is in the foreground or background on a pixel-by-pixel basis, similar to an alpha channel”); 
- an object tracking technique, whereby the analysis result comprises a position of an object in the video data part (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” The position of the target is tracked in each frame.); and 
- a calibration technique, whereby the analysis result comprises a calibration parameter used in the processing of the video data.
However Kennedy does not explicitly disclose wherein the processor is configured to analyze the decoded video data part by at least one of the group of: - a segmentation technique, whereby the analysis result comprises a segmentation of an object in the decoded video data part; - an object tracking technique, whereby the analysis result comprises a position of an object in the decoded video data part; and - a calibration technique, whereby the analysis result comprises a calibration parameter used in the processing of the video data.
Mariadoss discloses wherein the processor is configured to analyze the decoded video data part by at least one of the group of: 
- a segmentation technique, whereby the analysis result comprises a segmentation of an object in the decoded video data part; 
- an object tracking technique, whereby the analysis result comprises a position of an object in the decoded video data part (Mariadoss: para.0038 “In step 310, the raw video stream from the IP camera can be received. In step 315, the raw video stream can be decoded and processed.” The video stream is obtained and decoded. Further, para.0038 “In step 320, real-time analytics can be performed on the video stream based on one or more processing criteria and/or user profile settings. Criteria/settings can include, face recognition, path tracking, object tracking, motion detection, and the like.” The decoded video is analyzed for various metrics such as object tracking.); and 
- a calibration technique, whereby the analysis result comprises a calibration parameter used in the processing of the video data.
Therefore it would have been obvious to one of ordinary skill in the art to combine Kennedy and Mariadoss in order to incorporate wherein the processor is configured to analyze the decoded video data part by at least one of the group of: - a segmentation technique, whereby the analysis result comprises a segmentation of an object in the decoded video data part; - an object tracking technique, whereby the analysis result comprises a position of an object in the decoded video data part; and - a calibration technique, whereby the analysis result comprises a calibration parameter used in the processing of the video data.
One of ordinary skill in the art would have been motivated to combine because sending compressed data, i.e., receiving encoded data and decoding it prior to processing, is expected to improve bandwidth usage and speed and to reduce congestion (Mariadoss: para.0027, para.0038).

Regarding Claim 3, Kennedy-Mariadoss-Frueh-Holmes discloses claim 1 as set forth above.
Kennedy further discloses wherein the processing of the video data by the receiver system comprises compositing an object (Kennedy: para.0020 visual element) into the video data (Kennedy: para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video. Each of the components of system 100 is described in more detail below.” The receiving device uses the metadata to insert a visual element into the video), and wherein the processor is configured to:
- via the network interface, provide object data to the receiver system, the object data defining at least part of the object (Kennedy: para.0047 “The metadata may specify the advertisement, e.g. URL defining a particular advertising server 250 and a particular visual element. Advertisement 252 may be customized for a user based on, for example, a profile, as would be known to those skilled in the art given this description. Image synthesis module 122 may insert advertisement 252 into the video as defined by the metadata.” The metadata can specify the object itself, i.e., the visual element, alongside a URL for the advertisement associated with the visual element.); 
- analyze the video data part to determine, as the analysis result to be included in the processing assist data, a characteristic of said composition of the object into the video data, such as a position and/or orientation of the object (Kennedy: para.0037 “In different embodiments, the metadata can define the appearance of the target in different ways. In an embodiment, the metadata describes a shape into which visual elements would be inserted. For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target. In another embodiment, the metadata could define a camera model according to the camera movements. The camera model may define how the camera moves relative to the target at each frame of the video. For example, if the target gets larger, the metadata may indicate that the camera is zooming in. In another example, if the target is moving to the left of a frame, the metadata may indicate that the camera is turning right.” The analysis provides information such as the size, position, and orientation of the target, which defines the shape into which visual elements can be inserted.).
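For illustration only, the two metadata forms named in Kennedy para.0037 (four quadrilateral corner points, or size, position and orientation) can be related by a simple computation such as the following. This is a minimal Python sketch; the corner ordering convention, the use of the corner average as "position", and the top-edge angle as "orientation" are assumptions for illustration and are not taken from Kennedy.

```python
import math


def quad_to_pose(corners):
    """Derive position, in-plane orientation and width from four corner points.

    corners: [(x0, y0), (x1, y1), (x2, y2), (x3, y3)], assumed ordered
    top-left, top-right, bottom-right, bottom-left (hypothetical convention).
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    # Position: average of the four corners (vertex centroid).
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    (x0, y0), (x1, y1) = corners[0], corners[1]
    # Orientation: angle of the top edge relative to the image x-axis, in degrees.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # Size: length of the top edge.
    width = math.hypot(x1 - x0, y1 - y0)
    return {"position": (cx, cy), "orientation_deg": angle, "width": width}
```

For an axis-aligned 2-by-1 rectangle with corners (0, 0), (2, 0), (2, 1), (0, 1), the sketch returns position (1.0, 0.5), orientation 0 degrees and width 2.0.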

Regarding Claim 4, Kennedy-Mariadoss-Frueh-Holmes discloses claim 1 as set forth above.
Kennedy further discloses wherein the processor is configured to include timing information in the processing assist data, the timing information being indicative of the part of the video stream or the decoded video data part from which the processing assist data was generated (Kennedy: para.0044 “Video player 120 may assemble the videos in several different ways. Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time. Similarly, the frames may be numbered and the corresponding portion of metadata may be recalled based on that number. For example, the 37th frame may have a corresponding portion of metadata keyed off the number 37.” metadata for a frame is associated with a timestamp and/or sequence number for recalling by the client device.).

Regarding Claim 5, Kennedy-Mariadoss-Frueh-Holmes discloses claim 4 as set forth above.
Kennedy further discloses wherein the timing information comprises at least one of the group of: - a sequence number; and - a content timestamp (Kennedy: para.0044 “Video player 120 may assemble the videos in several different ways. Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time. Similarly, the frames may be numbered and the corresponding portion of metadata may be recalled based on that number. For example, the 37th frame may have a corresponding portion of metadata keyed off the number 37.” Both a timestamp and a sequence number are disclosed by Kennedy.).
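For illustration only, the two lookup mechanisms described in Kennedy para.0044 (metadata keyed by frame number, and metadata found by the frame's playback timestamp) can be sketched as follows. This is a minimal Python sketch; the `MetadataIndex` class, its method names, and the rule of selecting the latest frame whose timestamp does not exceed the query time are hypothetical choices, not Kennedy's implementation.

```python
import bisect


class MetadataIndex:
    """Hypothetical per-frame metadata store keyed by frame number and content timestamp."""

    def __init__(self, entries):
        # entries: iterable of (frame_number, timestamp_seconds, metadata) tuples
        entries = list(entries)
        self._by_frame = {n: md for n, _, md in entries}
        ordered = sorted(entries, key=lambda e: e[1])
        self._times = [t for _, t, _ in ordered]
        self._metas = [md for _, _, md in ordered]

    def by_frame(self, frame_number):
        """Look up metadata by sequence (frame) number; None if absent."""
        return self._by_frame.get(frame_number)

    def by_timestamp(self, t):
        """Return metadata of the latest frame whose timestamp is <= t; None if t precedes all frames."""
        i = bisect.bisect_right(self._times, t) - 1
        return self._metas[i] if i >= 0 else None
```

Keeping the timestamps sorted lets the timestamp lookup use a binary search, so a player need not find an exact timestamp match for each displayed frame.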

Regarding Claim 6, Kennedy-Mariadoss-Frueh-Holmes discloses claim 1 as set forth above.
Kennedy further discloses wherein the processor is configured to:
 - sequentially analyze, and generate processing assist data for, individual ones of the video data parts to obtain a series of processing assist data (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” The video is analyzed to generate position and appearance information of the target in a series of frames to track the object's position.); and 
– provide the series of processing assist data to the receiver system as a processing assist data stream (Kennedy: para.0023 “Once video analysis module 114 tracks the target, video processing server may store tracking metadata 106 in metadata database 108 for later use.” The metadata is stored in metadata database 108 and is accessible by the viewer client, as seen in para.0043 “In another embodiment, when video player 120 requests a video from video sharing server 210, video player 120 makes a separate request to metadata server 260 for the metadata corresponding to the video. For example, metadata server may send the metadata in XML format. In this embodiment, video player 120 may receive the video and the metadata from different servers and may have to assemble them to synchronize the metadata with the video.” The metadata server 260 accesses the metadata database 108 and sends the corresponding metadata to the client device.).
However Kennedy does not explicitly disclose - sequentially decode the video stream to obtain a series of decoded video data parts;  - sequentially analyze, and generate processing assist data for, individual ones of the decoded video data parts to obtain a series of processing assist data.
Mariadoss discloses - sequentially decode the video stream to obtain a series of decoded video data parts (Mariadoss: para.0038 “In step 310, the raw video stream from the IP camera can be received. In step 315, the raw video stream can be decoded and processed.” The video stream is obtained and decoded. As the entire stream is decoded, this results in a series of decoded data parts.);  
- sequentially analyze, and generate processing assist data for, individual ones of the decoded video data parts to obtain a series of processing assist data (Mariadoss: para.0032 “Analytics engine 246 can perform one or more examinations of video stream 216. In one embodiment, one or more frames of video stream 216 can be analyzed based on one or more specialized algorithms.” para.0038 “In step 320, real-time analytics can be performed on the video stream based on one or more processing criteria and/or user profile settings. Criteria/settings can include, face recognition, path tracking, object tracking, motion detection, and the like.” The decoded video is analyzed for various metrics such as object tracking. These types of tracking are performed over a series of frames.).
Therefore it would have been obvious to one of ordinary skill in the art to combine Kennedy and Mariadoss in order to incorporate - sequentially decode the video stream to obtain a series of decoded video data parts;  - sequentially analyze, and generate processing assist data for, individual ones of the decoded video data parts to obtain a series of processing assist data.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of sending compressed data, i.e., receiving encoded data and decoding it prior to processing, which brings improved bandwidth, improved speed and less congestion (Mariadoss: para.0027, para.0038).
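For illustration only, and not as a teaching relied upon from Kennedy or Mariadoss, the sequential decode-and-analyze arrangement mapped above can be sketched in Python as follows. The sketch assumes hypothetical helpers decode_part() and analyze_part() that stand in for a real video decoder and a real tracker; a background thread decodes later parts while the main thread analyzes earlier ones, and each analysis result is yielded as one element of a processing assist data stream.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AssistData:
    part_index: int  # index of the decoded part the result was generated from
    result: dict     # hypothetical analysis result (e.g., target position)


def decode_part(encoded: bytes) -> bytes:
    # Hypothetical stand-in for a real video decoder.
    return encoded


def analyze_part(index: int, decoded: bytes) -> AssistData:
    # Hypothetical stand-in for tracking or detection on one decoded part.
    return AssistData(part_index=index, result={"num_bytes": len(decoded)})


def assist_data_stream(encoded_parts: Iterable[bytes]) -> Iterator[AssistData]:
    """Decode parts sequentially and yield one AssistData per decoded part."""
    decoded_q: queue.Queue = queue.Queue(maxsize=2)
    done = object()  # sentinel marking the end of the stream

    def receive_and_decode() -> None:
        for index, part in enumerate(encoded_parts):
            # Decoding of later parts proceeds while earlier parts are analyzed.
            decoded_q.put((index, decode_part(part)))
        decoded_q.put(done)

    worker = threading.Thread(target=receive_and_decode, daemon=True)
    worker.start()
    while True:
        item = decoded_q.get()
        if item is done:
            break
        index, decoded = item
        yield analyze_part(index, decoded)
    worker.join()
```

Consuming the generator, e.g. list(assist_data_stream([b"ab", b"abc"])), yields one AssistData per part, in order. The bounded queue is a design choice that keeps decoding at most two parts ahead of analysis.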

Regarding Claim 7, Kennedy-Mariadoss-Frueh-Holmes discloses claim 1 as set forth above.
Kennedy further discloses wherein the processor is configured to, via the network interface (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” Fig.7 network interface 724, and associated para.0069 shows how the video processing server 110 is connected to other elements; therefore, all communications between the server 110 and external devices are performed via this network interface),
receive the video stream from a stream source in the network (Kennedy: Fig.2 para.0022 “After video processing server 110 receives video 104, video analysis module 114 is configured to track a target in the video. The target is described in a target data 112. In an embodiment, target data 112 may define a portion of a frame, perhaps the first frame, in the video.” server 110 receives the video) 
and to forward the video stream to the receiver system (Kennedy: para.0023 “Once video analysis module 114 tracks the target, video processing server may store tracking metadata 106 in metadata database 108 for later use.” metadata is stored in metadata database 108 and is accessible to the viewer client, as can be seen in para.0043 “In another embodiment, when video player 120 requests a video from video sharing server 210, video player 120 makes a separate request to metadata server 260 for the metadata corresponding to the video. For example, metadata server may send the metadata in XML format. In this embodiment, video player 120 may receive the video and the metadata from different servers and may have to assemble them to synchronize the metadata with the video.” the metadata server 260 accesses the metadata database 108 and sends the corresponding metadata to the client device.).

Regarding Claim 13, it lists all of the same elements as claim 1, but in the form of a computer-implemented method for assisting a receiver system in processing video data which is streamed as a video stream to the receiver system via a network (Kennedy: para.0035 “As described above with reference to FIG. 1, video processing server 110 is configured to preprocess videos. Video processing server 110 may, for example, be configured to query video database 102 periodically for new videos. When video processing server 110 receives a new video 104, video processing server 110 uses video analysis module 114 to generate the metadata.” para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video” para.0042 “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.” Therefore it can be seen that the receiver receives the video and processes the video based on analysis data from a server that preprocessed the video to generate metadata.) rather than in system form. Therefore, the supporting rationale of the rejection to claim 1 applies equally as well to claim 13.

Regarding Claim 15, Kennedy-Mariadoss-Frueh-Holmes teaches claim 13 as set forth above.
Kennedy further discloses a non-transitory computer-readable medium comprising a computer program, the computer program comprising instructions for causing a processor system to perform the method (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” para.0067 “As will be appreciated, the removable storage unit 718 includes a computer usable storage medium having stored therein computer software and/or data.” ) according to claim 13.

Claims 8-10, 14, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kennedy JR et al. (hereinafter Kennedy, US 2009/0259941 A1) in view of Hord et al. (hereinafter Hord, US 2011/0016487 A1) further in view of Frueh et al. (hereinafter Frueh, “Headset Removal for Virtual and Mixed Reality”, NPL 2017 attached.) in view of Holmes (US 9,729,820 B1).

Regarding Claim 8, Kennedy discloses a processor system configured for processing video data which is received as a video stream via a network (Kennedy: Fig. 2 Viewer Client 240, para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video” it can be seen in fig.2 that viewer client 240 contains module 122, and in para.0042 it can be seen that the client receives the video stream from video sharing server 210 and processes it using metadata from metadata database 108; para.0042 “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.”),
the processor system comprising:
- a network interface to the network (Kennedy: para.0069 “Computing device 700 may also include a communications interface 724. Communications interface 724 allows software and data to be transferred between computing device 700 and external devices. ”); 
- a processor (Kennedy: para.0065 “FIG. 7 is a diagram illustrating an example computing device which may be used in embodiments of this invention. The computing device 700 includes one or more processors, such as processor 704.”) configured to: 
- via the network interface, receive the video stream (Kennedy: para.0062 “At step 404, video and corresponding metadata are received. ” para.0042  “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.” the client 240, as seen in Fig.2, receives the video stream via network interface 724, as seen in Fig. 7.); -4- 3380798.v1Attorney's Docket No.: 4965.1117-001 
wherein the processor is further configured to: 
- via the network interface, receive processing assist data (Kennedy: para.0020, para.0044 metadata) comprising an analysis result of an analysis (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.”) of at least the part of the video data (Kennedy: para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video. ”, para.0044 “Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time.” the module 122, that exists in Viewer client 240 as seen in Fig.2, receives the metadata for the particular portions of the video.), or a processing instruction derived from the analysis result; and 
wherein the analysis result comprises a position and orientation of the target in the at least the part of the video data (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” para.0037 “For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target.” the video is analyzed to generate position and appearance information of the target.), and 
wherein the processing assist data is received from the intermediary system in parallel to receiving and/or decoding a subsequent part of the part of the video data (Kennedy: para.0028 “In an alternative, the video analysis may be done in real time as the video is played.” analysis is performed in real time as the video is being played by the viewer client, therefore as the video stream is being processed by the receiver to be played, the analysis is performed in parallel.); and 
process the video data to obtain processed video data comprising the target from the video data using the analysis result (Kennedy: para.0027 “Using the metadata for the video, image synthesis module 122 is configured to place a visual element 124 in the video. Visual element 124 may be an advertisement. Visual element 124 may have a variety of formats. For example, visual element 124 may be text, a graphic, or even a video to create a video-in-video effect. Image synthesis module 122 may be configured to adjust visual element 124 according to geometry information in metadata for the frame being processed.” para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video. ” the video is modified using the received metadata.) or the processing instruction provided by the processing assist data.
However, Kennedy does not explicitly disclose wherein the processing comprises a Head Mounted Display (HMD) removal, via the network interface, receive the video stream from an intermediary system, decode the video stream to obtain the video data, a position and orientation of the HMD in the at least the part of the video data, process the video data to obtain processed video data comprising the HMD removal from the video data using the analysis result or the processing instruction provided by the processing assist data.
Hord discloses decode the video stream to obtain the video data (Hord: para.0034 “The signal processing system 314 outputs packetized compressed streams and presents them as input for storage in the storage device 373 via an interface 375, or in other implementations, as input to the media engine 322 for decompression by a video decompression engine 323 (or video decoder) and an audio decompression engine 325 (or audio decoder), in cooperation with media memory 324, for display on the TV 341 via the output system 348.” claim 11 “The method of claim 1, further including the step of inserting at a media client device the advertisement in a decoded video stream” it can be seen in Fig.3 that after receiving the compressed stream, it is decoded and processed prior to placing advertisements in the content in para.0035.)
Therefore it would have been obvious to one of ordinary skill in the art to combine Kennedy and Hord in order to incorporate decode the video stream to obtain the video data.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of sending compressed data, i.e., receiving encoded data and decoding it prior to processing, which brings improved bandwidth, improved speed and less congestion (Hord: para.0034-para.0035).
However, Kennedy-Hord does not explicitly disclose wherein the processing comprises a Head Mounted Display (HMD) removal, via the network interface, receive the video stream from an intermediary system, a position and orientation of the HMD in the at least the part of the video data, process the video data to obtain processed video data comprising the HMD removal from the video data using the analysis result or the processing instruction provided by the processing assist data.
Frueh discloses wherein the processing comprises a Head Mounted Display (HMD) removal (Frueh: Fig. 3 headset removal, pg. 1, Approach “We enhance Mixed Reality by augmenting it with our headset removal technique that creates an illusion of revealing the user’s face (Figure 1).”), 
a position and orientation of the HMD in the at least the part of the video data (Frueh: pg. 1 section 1.2 “: first estimating the camera intrinsics like field-of-view, and then computing the extrinsic transformation between the camera and VR controllers. We simplify the process by adding a marker to the front of the headset, which allows computing the calibration parameters automatically from game play data—the marker is removed virtually during the rendering phase by inpainting it from surrounding headset pixels. Face alignment: To render the virtual face, we need to align the 3D face model with the visible portion of the face in the camera stream, so that they blend seamlessly with each other. A reasonable proxy to this alignment is to position the face model just behind the headset, where the user’s face rests during the VR session. This positioning is estimated based on the geometry and coordinate system of the headset. The calibration computed above is theoretically sufficient to track the headset in the camera view, but in practice there may be errors due to drift or jitter in the Vive tracking. Hence, we further refine the tracking (continuously in every frame) by rendering a virtual model of the headset from the camera viewpoint, and using silhouette matching to align it with the camera frame.” position and orientation are tracked in the video data, and this is also assisted using a QR code as seen in Fig. 3. It can be seen that orientation is further tracked as the face is lined up based on orientation, described in section 1.3.),
process the video data to obtain processed video data comprising the HMD removal from the video data using the analysis result (Frueh: section 1.3 “Translucent rendering: Humans have high perceptual sensitivity to faces, and even small imperfections in synthesized faces can feel unnatural and distracting, a phenomenon known as the uncanny valley. To mitigate this problem, instead of removing the headset completely, we choose a user experience that conveys a ‘scuba mask effect’ by compositing the color-corrected face proxy with a translucent headset. Reminding the viewer of the presence of the headset helps avoid the uncanny valley and also makes our algorithms robust to small errors in misalignment and color correction.” the HMD is removed and replaced with user face using the positional information from section 1.2 and 1.3 cited above.) or the processing instruction provided by the processing assist data.
Therefore, it would have been obvious to one of ordinary skill in the art to combine Kennedy-Hord with Frueh in order to incorporate wherein the processing comprises a Head Mounted Display (HMD) removal, via the network interface, a position and orientation of the HMD in the at least the part of the video data, process the video data to obtain processed video data comprising the HMD removal from the video data using the analysis result or the processing instruction provided by the processing assist data, and to apply these ideas to the processing performed by the receiver device and to the analysis of the decoded video stream performed by the server. Both Kennedy and Frueh are in the field of video augmentation using tracked data: Kennedy para.0046-0049 shows content placement based on tracked objects, and Frueh tracks head mounted displays for inclusion of a face.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of improved viewer experience of content by providing a more personal video (Frueh: Abstract).
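However, Kennedy-Hord-Frueh does not explicitly disclose receive the video stream from an intermediary system, i.e., that the device which provides the analysis is also the device which provides the video stream; in Kennedy, the analysis is provided by video processing server 110 while the video stream is provided by a different device, video sharing server 210.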
Holmes discloses receive the video stream from an intermediary system (Holmes: col.8 lines 40-63 “FIG. 3 is a flowchart illustrating a computer-implemented method of superimposing video 210 carried out by a processor 31. As shown in FIG. 3, the method of superimposing video 210 carried out by a processor 31 begins with the processor 31, at a first step 240 receiving a first live video 212 from a first user's device 20. Reception 240 by a processor 31 is illustrated in FIG. 4, wherein the user device 20 of a first user transmits a first live video 212 (in this case a video 210 captured by the user's rear camera 119) to a processor 31 containing central server 30. The second step 242 of superimposing video 210 carried out by a processor 31 is receiving a second live video 214 from a second user's device 20. Again referring to FIG. 4, reception of the second live video 214 from a second user's device 20 by a processor 31 is illustrated (with the second live video 214 being captured by the second user's rear camera 119). The third step 244 of this method calls for the processor 31 to identify a first human element 216 in the first video 212 and a second human element 218 in a second video 214. Such human elements 216, 218 are illustrated in FIG. 4 with the first human element 216 being a hand (captured by the first user's rear camera 119) and the second human element 218 being a face (captured by the second users front camera 118).” col.9 lines 55-60 “The final step 248 of the computer-implemented method of superimposing video 210 carried out by a processor 31 is transmitting the superimposed video 310 to a user device 20.” Fig.3 and Fig. 8A-8B. It can be seen that video data from each device is obtained, augmented and transmitted to the users; for example, in the first step performed by server 30 in Fig. 8B, extraneous elements are removed from the video, the video is processed, and the video stream is provided to the receivers.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kennedy-Hord-Frueh with Holmes in order to incorporate receive the video stream from an intermediary system, and to replace the separate video sharing server 210 and video processing server 110 of Kennedy with a single server, as in Holmes, that provides both the analysis metadata and the video content. Kennedy-Hord as combined with Frueh discloses the removal of head mounted displays in virtual content viewed by others (Frueh: Abstract), and Holmes is in an analogous field as it deals with augmenting virtual conference video data and providing the videos to each user.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of providing augmented content to users, which improves user experience, as well as the expected benefit that fewer servers/entities provide a simpler system for sharing video content (Holmes: col.8 lines 40-63, col.1 Background of the Invention).

Regarding Claim 9, Kennedy-Hord-Frueh-Holmes discloses claim 8 as set forth above.
Kennedy further discloses wherein the processing assist data comprises a segmentation of an object in the part of the video data (Examiner notes that, per the bottom of page 4 of the specification, a segmentation of an object merely means data that differentiates the object from the rest of the frame, an example being a mask that shows where the object is in the frame. Kennedy: para.0023 “When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106. As an example, to record the position of the target, tracking metadata 106 may include positional data for the center of the target in each frame. To record the appearance of the target, tracking metadata 106 may include information describing the geometry (e.g., size and orientation) and occlusion of the target.” details regarding the position, size and orientation of the object are included in the geometry, as is occlusion of the object, such as when the object is blocked by something else. para.0038 “For example, if the target is partially occluded by an individual's head, then the metadata may define the outline of the head. ” para.0040 “In another embodiment, the metadata could include a mask that defines whether the target is in the foreground or background on a pixel-by-pixel basis, similar to an alpha channel” All of this information included in the metadata provides segmentation-type information that would differentiate the object from other parts of the frame.), and
wherein the processor is configured to use the segmentation of the object for processing video data of the object or video data outside of the object (Kennedy: para.0027 “Image synthesis module 122 may be configured to adjust visual element 124 according to geometry information in metadata for the frame being processed. Further, image synthesis module 122 may be configured to block out a portion of visual element 124 according to occlusion information in the metadata for the frame being processed. In each frame of the video where the target is present, image synthesis module 122 inserts the modified visual element 124.” para.0048 “ In examples, if the metadata defines a camera model, image synthesis module 122 may distort the advertisement according to how the camera moves.” this information is used to modify the video frame being processed using the tracking information in the metadata.).
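As a hedged illustration, not relied upon as a teaching of Kennedy, of how a per-pixel mask of the kind described in Kennedy para.0040 can confine processing either to the object or to the region outside it, the following Python sketch applies one function to masked pixels and, optionally, another to the remaining pixels. Frames are modeled as nested lists of pixel values and the function name apply_with_mask() is hypothetical; both are assumptions made only to keep the example self-contained.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # hypothetical frame model: rows of pixel values
Mask = List[List[int]]   # 1 = pixel belongs to the object, 0 = outside it


def apply_with_mask(
    frame: Frame,
    mask: Mask,
    inside: Callable[[int], int],
    outside: Optional[Callable[[int], int]] = None,
) -> Frame:
    """Apply `inside` to object pixels and `outside` (if given) to all others."""
    result: Frame = []
    for frame_row, mask_row in zip(frame, mask):
        new_row = []
        for pixel, in_object in zip(frame_row, mask_row):
            if in_object:
                new_row.append(inside(pixel))
            elif outside is not None:
                new_row.append(outside(pixel))
            else:
                new_row.append(pixel)  # pixels outside the object are left unchanged
        result.append(new_row)
    return result
```

For example, apply_with_mask([[10, 20], [30, 40]], [[1, 0], [0, 1]], lambda p: 0) returns [[0, 20], [30, 0]], i.e., only the pixels marked as belonging to the object are modified.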

Regarding Claim 10, Kennedy-Hord-Frueh-Holmes discloses claim 8 as set forth above.
Kennedy further discloses wherein the processing assist data comprises timing information, the timing information being indicative of the part of the video stream or the decoded video data part from which the processing assist data was generated (Kennedy: para.0044 “Video player 120 may assemble the videos in several different ways. Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time. Similarly, the frames may be numbered and the corresponding portion of metadata may be recalled based on that number. For example, the 37th frame may have a corresponding portion of metadata keyed off the number 37.” Kennedy discloses both a timestamp and a sequence number associated with the metadata, such that the metadata can be recalled using the timestamp or the frame number.), and
wherein the processor is configured to identify the part of the video stream or the decoded video data part on the basis of the timing information (Kennedy: para.0044 “Video player 120 may assemble the videos in several different ways. Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time. Similarly, the frames may be numbered and the corresponding portion of metadata may be recalled based on that number. For example, the 37th frame may have a corresponding portion of metadata keyed off the number 37.” when using the metadata to modify the video stream, metadata for particular portions of the video are obtained using timestamps or sequence numbers) and 
to use the analysis result or the processing instruction provided by the processing assist data specifically for the processing of said part (Kennedy: para.0044 “Video player 120 may assemble the videos in several different ways. Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time. Similarly, the frames may be numbered and the corresponding portion of metadata may be recalled based on that number. For example, the 37th frame may have a corresponding portion of metadata keyed off the number 37.” para.0043 “In another embodiment, when video player 120 requests a video from video sharing server 210, video player 120 makes a separate request to metadata server 260 for the metadata corresponding to the video. For example, metadata server may send the metadata in XML format. In this embodiment, video player 120 may receive the video and the metadata from different servers and may have to assemble them to synchronize the metadata with the video.”  when using the metadata to modify the video stream, metadata for particular portions of the video are obtained using timestamps or sequence numbers and used to modify the particular frames associated with the metadata.).
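For illustration only, the timestamp and frame-number correlation quoted above from Kennedy para.0044 can be sketched in Python as follows. The sketch assumes a constant frame rate and metadata keyed by frame number; both assumptions, and the function names, are hypothetical and are not taken from Kennedy.

```python
from typing import Dict, Optional


def frame_number_from_timestamp(timestamp_s: float, fps: float) -> int:
    # Assumes a constant frame rate; rounds to the nearest frame number.
    return round(timestamp_s * fps)


def metadata_for_frame(
    metadata_by_frame: Dict[int, dict], timestamp_s: float, fps: float
) -> Optional[dict]:
    """Return the metadata portion keyed to the frame shown at timestamp_s, if any."""
    return metadata_by_frame.get(frame_number_from_timestamp(timestamp_s, fps))
```

Under the assumed rate of 30 frames per second, a frame played 12.43 seconds into the video maps to frame number 373, and the portion of metadata keyed to 373 is returned; a frame with no associated metadata yields None.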

Regarding Claim 14, it lists all of the same elements as claim 8, but in the form of a computer-implemented method for processing video data which is received as a video stream via a network (Kennedy: Fig. 2 Viewer Client 240, para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video” it can be seen in fig.2 that viewer client 240 contains module 122, and in para.0042 it can be seen that the client receives the video stream from video sharing server 210 and processes it using metadata from metadata database 108; para.0042 “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.”) rather than in system form. Therefore, the supporting rationale of the rejection to claim 8 applies equally as well to claim 14.

Regarding Claim 16, Kennedy-Hord-Frueh-Holmes teaches claim 14 as set forth above.
Kennedy further discloses a non-transitory computer-readable medium comprising a computer program, the computer program comprising instructions for causing a processor system to perform the method (Kennedy: Fig. 2 Viewer Client 240, para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video” it can be seen in fig.2 that viewer client 240 contains module 122, and in para.0042 it can be seen that the client receives the video stream from video sharing server 210 and processes it using metadata from metadata database 108; para.0042 “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.” para.0067 “Computing device 700 also includes a main memory 708, … Removable storage unit 718 represents a floppy disk, magnetic tape, optical disk, etc. which is read by and written to by removable storage drive 714. As will be appreciated, the removable storage unit 718 includes a computer usable storage medium having stored therein computer software and/or data.”) according to claim 14.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kennedy JR et al. (hereinafter Kennedy, US 2009/0259941 A1) in view of Mariadoss (US 2016/0080698 A1) further in view of Hord et al. (hereinafter Hord, US 2011/0016487 A1) further in view of Frueh et al. (hereinafter Frueh, “Headset Removal for Virtual and Mixed Reality”, NPL 2017 attached.) in view of Holmes (US 9,729,820 B1).
Regarding Claim 11, Kennedy discloses A system comprising an intermediary system (Kennedy: Fig.2 Video Processing Server 110) and a receiver system (Kennedy: Fig.2 Viewer Client 240), wherein:
the intermediary system (Kennedy: Fig.2 Video Processing Server 110) is configured for assisting the receiver system in processing video data which is streamed as a video stream to the receiver system via a network (Kennedy: para.0035 “As described above with reference to FIG. 1, video processing server 110 is configured to preprocess videos. Video processing server 110 may, for example, be configured to query video database 102 periodically for new videos. When video processing server 110 receives a new video 104, video processing server 110 uses video analysis module 114 to generate the metadata.” and further seen in Fig. 2, new video data is preprocessed at a server 110, and when viewer client 240 attempts to view the content, the preprocessed metadata is used, such as in para.0020), 
and the processing is dependent on an analysis of the video data (Kennedy: para.0022 “As is be described in more detail below with respect to FIG. 2, target data 112 may be defined by a user. Then, video analysis module 114 may be configured to track the object depicted in the portion of the frame defined by target data 112. In another embodiment, target data 112 may indicate that the target is a particular object or has a particular color. In that case, video analysis module 114 may be configured to track the object defined by target data 112. To track the object, video analysis module 114 may use known pattern recognition techniques, such as those employed by the image analyzer described in the '933 patent.” the content is processed by analyzing the video frames to track objects, this is further explained in para.0023.) 
to detect a position and orientation of the target in the video data (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” para.0037 “For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target.”),
the intermediary system (Kennedy: Fig.2 Video Processing Server 110) comprising: 
- a network interface (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” Fig.7 network interface 724, and associated para.0069 shows how the video processing server 110 is connected to other elements; therefore, all communications between the server 110 and external devices are performed via this network interface) to the network (Kennedy: Fig.2 204);
- a processor (Kennedy: para.0052 “video processing server 110, … may be implemented on any computing device. Such computing device can include, but is not limited to, … An exemplary computing device is illustrated in FIG. 7, described below.” Fig. 7 processor 704)  
configured to: - via the network interface (Kennedy: Fig.7 communications interface 724), receive the video stream from a stream source in the network (Kennedy: Fig.2 para.0022 “After video processing server 110 receives video 104, video analysis module 114 is configured to track a target in the video. The target is described in a target data 112. In an embodiment, target data 112 may define a portion of a frame, perhaps the first frame, in the video.” server 110 receives the video);
- analyze the video data part to detect a position and orientation of the target in the video data part and to obtain an analysis result comprising the position and orientation of the target in the video part (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” para.0037 “For example, the metadata may define the four points of a quadrilateral corresponding to the target. In another example, the metadata could merely define the size, position and orientation of the target.” the video is analyzed to generate position and appearance information of the target.), 
wherein the analyzing is performed in parallel to receiving and/or processing a subsequent part of the video stream (Kennedy: para.0028 “In an alternative, the video analysis may be done in real time as the video is played.” The analysis is performed in real time as the video is being played by the viewer client; therefore, the analysis is performed in parallel while the receiver processes the video stream for playback.); 
- generate processing assist data comprising the analysis result (Kennedy: para.0023 “Video analysis module 114 is configured to generate tracking metadata 106 for the video. When the video analysis module 114 tracks the target, video analysis module 114 may record the position and appearance of the target in multiple frames (perhaps each and every frame) as tracking metadata 106.” The video is analyzed to generate position and appearance information of the target, and that information is compiled as tracking metadata 106) or a processing instruction derived from the analysis result; 
- via the network interface, provide the processing assist data to the receiver system (Kennedy: para.0023 “Once video analysis module 114 tracks the target, video processing server may store tracking metadata 106 in metadata database 108 for later use.” The metadata is stored in metadata database 108, and the data is accessible by the viewer client, as can be seen in para.0043 “In another embodiment, when video player 120 requests a video from video sharing server 210, video player 120 makes a separate request to metadata server 260 for the metadata corresponding to the video. For example, metadata server may send the metadata in XML format. In this embodiment, video player 120 may receive the video and the metadata from different servers and may have to assemble them to synchronize the metadata with the video.” The metadata server 260 accesses the metadata database 108 and sends the corresponding metadata to the client device.); 
the receiver system is configured for processing video data which is received as a video stream via a network (Kennedy: Fig. 2 Viewer Client 240, para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video” It can be seen in Fig. 2 that viewer client 240 contains module 122, and in para.0042 it can be seen that the client receives the video stream from video sharing server 210 and processes it using metadata from metadata database 108; para.0042 “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.”),
the receiver system (Kennedy: Fig. 2 Viewer Client 240) comprising: 
- a network interface to the network (Kennedy: para.0069 “Computing device 700 may also include a communications interface 724. Communications interface 724 allows software and data to be transferred between computing device 700 and external devices. ”); 
- a processor (Kennedy: para.0065 “FIG. 7 is a diagram illustrating an example computing device which may be used in embodiments of this invention. The computing device 700 includes one or more processors, such as processor 704.”) configured to: 
- via the network interface, receive the video stream from (Kennedy: para.0062 “At step 404, video and corresponding metadata are received. ” para.0042  “When the user selects a video, viewer client 240 sends a request for the video to video sharing server 210. Video provider module 214 streams the video to video player 120.” the client 240, as seen in Fig.2, receives the video stream via network interface 724, as seen in Fig. 7.) 
- via the network interface, receive the processing assist data (Kennedy: para.0020, para.0044 metadata) of a video part from the intermediary system (Kennedy: Fig.2 Video Processing Server 110) (Kennedy: para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video. ”, para.0044 “Video player 120 may use the timestamp of each frame to correlate the frame with its associated metadata. For example, if the frame is played 12.43 seconds into the video, video player 120 may find the portion of metadata for the frame at that time.” the module 122, that exists in Viewer client 240 as seen in Fig.2, receives the metadata for the particular portions of the video.), 
and wherein the processing assist data is received from the intermediary system in parallel to receiving and/or decoding a subsequent part of the part of the video data (Kennedy: para.0028 “In an alternative, the video analysis may be done in real time as the video is played.” The analysis is performed in real time as the video is being played by the viewer client; therefore, the analysis is performed in parallel while the receiver processes the video stream for playback.), 
- process the video data to obtain processed video data comprising the target from the video data using the analysis result (Kennedy: para.0027 “Using the metadata for the video, image synthesis module 122 is configured to place a visual element 124 in the video. Visual element 124 may be an advertisement. Visual element 124 may have a variety of formats. For example, visual element 124 may be text, a graphic, or even a video to create a video-in-video effect. Image synthesis module 122 may be configured to adjust visual element 124 according to geometry information in metadata for the frame being processed.” para.0020 “At a later point in time, when a user views an on-demand video, image synthesis module 122 receives metadata from metadata database 108 and uses the metadata to insert a visual element 124 into the video. ” the video is modified using the received metadata.)  or the processing instruction provided by the processing assist data.
However Kennedy does not explicitly disclose wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the decoded video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part, - via the network interface, forward the video stream to the receiver system; - decode at least part of the video stream to obtain a decoded video data part; - analyze the decoded video data part to detect a position and orientation of the HMD in the video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part,  receive the video stream from the intermediary system; decode the video stream to obtain the video data, via the network interface, receive the processing assist data of a decoded video part from the intermediary system,  process the video data to obtain processed video data comprising the HMD removal from the video data.
Mariadoss discloses the intermediary system comprising a processor configured to decode at least part of the video stream to obtain a decoded video data part (Mariadoss: para.0038 “In step 310, the raw video stream from the IP camera can be received. In step 315, the raw video stream can be decoded and processed.” The video stream is obtained and decoded.);
analyze the decoded video data part to obtain an analysis result (Mariadoss: para.0038 “In step 320, real-time analytics can be performed on the video stream based on one or more processing criteria and/or user profile settings. Criteria/settings can include, face recognition, path tracking, object tracking, motion detection, and the like.” the decoded video is analyzed for various metrics such as object tracking.).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kennedy and Mariadoss in order to incorporate the intermediary system comprising a processor configured to decode at least part of the video stream to obtain a decoded video data part; and analyze the decoded video data part to obtain an analysis result.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit that sending compressed data, i.e., receiving encoded data and decoding it prior to processing, improves bandwidth usage and speed and reduces congestion (Mariadoss: para.0027, para.0038).
However, Kennedy-Mariadoss does not explicitly disclose wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the decoded video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part, - via the network interface, forward the video stream to the receiver system; - analyze the decoded video data part to detect a position and orientation of the HMD in the video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part,  receive the video stream from the intermediary system; decode the video stream to obtain the video data, via the network interface, receive the processing assist data of a decoded video part from the intermediary system,  process the video data to obtain processed video data comprising the HMD removal from the video data.
Hord discloses decode the video stream to obtain the video data, via the network interface (Hord: para.0034 “The signal processing system 314 outputs packetized compressed streams and presents them as input for storage in the storage device 373 via an interface 375, or in other implementations, as input to the media engine 322 for decompression by a video decompression engine 323 (or video decoder) and an audio decompression engine 325 (or audio decoder), in cooperation with media memory 324, for display on the TV 341 via the output system 348.” claim 11 “The method of claim 1, further including the step of inserting at a media client device the advertisement in a decoded video stream” it can be seen in Fig.3 that after receiving the compressed stream, it is decoded and processed prior to placing advertisements in the content in para.0035.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kennedy-Mariadoss and Hord in order to incorporate decode the video stream to obtain the video data.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit that sending compressed data, i.e., receiving encoded data and decoding it prior to processing, improves bandwidth usage and speed and reduces congestion (Hord: para.0034-para.0035).
However Kennedy-Mariadoss-Hord does not explicitly disclose wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the decoded video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part, - via the network interface, forward the video stream to the receiver system; - analyze the decoded video data part to detect a position and orientation of the HMD in the video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part,  receive the video stream from the intermediary system; process the video data to obtain processed video data comprising the HMD removal from the video data. 
Frueh discloses wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal (Frueh: Fig. 3 headset removal, pg. 1, Approach “We enhance Mixed Reality by augmenting it with our headset removal technique that creates an illusion of revealing the user’s face (Figure 1).”), 
to detect a position and orientation of the HMD in the decoded video data part and to obtain an analysis result comprising the position and orientation of the HMD in the video part (Frueh: pg. 1, section 1.2 “first estimating the camera intrinsics like field-of-view, and then computing the extrinsic transformation between the camera and VR controllers. We simplify the process by adding a marker to the front of the headset, which allows computing the calibration parameters automatically from game play data—the marker is removed virtually during the rendering phase by inpainting it from surrounding headset pixels. Face alignment: To render the virtual face, we need to align the 3D face model with the visible portion of the face in the camera stream, so that they blend seamlessly with each other. A reasonable proxy to this alignment is to position the face model just behind the headset, where the user’s face rests during the VR session. This positioning is estimated based on the geometry and coordinate system of the headset. The calibration computed above is theoretically sufficient to track the headset in the camera view, but in practice there may be errors due to drift or jitter in the Vive tracking. Hence, we further refine the tracking (continuously in every frame) by rendering a virtual model of the headset from the camera viewpoint, and using silhouette matching to align it with the camera frame.” The position and orientation of the HMD are tracked in the video data, and this tracking is also assisted by a QR code marker, as seen in Fig. 3. Orientation is further tracked, as the face is aligned based on the headset orientation, as described in section 1.3.), 
- analyze the decoded video data part to detect a position and orientation of the HMD in the video data part and to obtain an analysis result comprising the position and orientation of the HMD in the video part (Frueh: pg. 1, section 1.2 “first estimating the camera intrinsics like field-of-view, and then computing the extrinsic transformation between the camera and VR controllers. We simplify the process by adding a marker to the front of the headset, which allows computing the calibration parameters automatically from game play data—the marker is removed virtually during the rendering phase by inpainting it from surrounding headset pixels. Face alignment: To render the virtual face, we need to align the 3D face model with the visible portion of the face in the camera stream, so that they blend seamlessly with each other. A reasonable proxy to this alignment is to position the face model just behind the headset, where the user’s face rests during the VR session. This positioning is estimated based on the geometry and coordinate system of the headset. The calibration computed above is theoretically sufficient to track the headset in the camera view, but in practice there may be errors due to drift or jitter in the Vive tracking. Hence, we further refine the tracking (continuously in every frame) by rendering a virtual model of the headset from the camera viewpoint, and using silhouette matching to align it with the camera frame.” The position and orientation of the HMD are tracked in the video data, and this tracking is also assisted by a QR code marker, as seen in Fig. 3. Orientation is further tracked, as the face is aligned based on the headset orientation, as described in section 1.3.),  
process the video data to obtain processed video data comprising the HMD removal from the video data (Frueh: section 1.3 “Translucent rendering: Humans have high perceptual sensitivity to faces, and even small imperfections in synthesized faces can feel unnatural and distracting, a phenomenon known as the uncanny valley. To mitigate this problem, instead of removing the headset completely, we choose a user experience that conveys a ‘scuba mask effect’ by compositing the color-corrected face proxy with a translucent headset. Reminding the viewer of the presence of the headset helps avoid the uncanny valley and also makes our algorithms robust to small errors in misalignment and color correction.” the HMD is removed and replaced with user face.).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kennedy-Mariadoss-Hord with Frueh in order to incorporate wherein the processing of the video data by the receiver system comprises a Head Mounted Display (HMD) removal, to detect a position and orientation of the HMD in the decoded video data part and to obtain an analysis result comprising the position and orientation of the HMD in the decoded video part; - analyze the video data part to detect a position and orientation of the HMD in the video data part and to obtain an analysis result comprising the position and orientation of the HMD in the video part; process the video data to obtain processed video data comprising the HMD removal from the video data; and to apply these ideas to the processing performed by the receiver device and to the analysis of the video stream performed by the server. Both Kennedy and Frueh are in the field of video augmentation using tracked data: Kennedy para.0046-0049 show content placement based on tracked objects, and Frueh tracks head mounted displays for inclusion of a face.
One of ordinary skill in the art would have been motivated to combine because of the expected benefit of improved viewer experience of content by providing a more personal video (Frueh: Abstract).
However Kennedy-Mariadoss-Hord-Frueh does not explicitly disclose via the network interface, forward the video stream to the receiver system; receive the video stream from the intermediary system.
Holmes discloses via the network interface, forward the video stream to the receiver system and receive the video stream from the intermediary system (Holmes: col.8 lines 40-63 “FIG. 3 is a flowchart illustrating a computer-implemented method of superimposing video 210 carried out by a processor 31. As shown in FIG. 3, the method of superimposing video 210 carried out by a processor 31 begins with the processor 31, at a first step 240 receiving a first live video 212 from a first user's device 20. Reception 240 by a processor 31 is illustrated in FIG. 4, wherein the user device 20 of a first user transmits a first live video 212 (in this case a video 210 captured by the user's rear camera 119) to a processor 31 containing central server 30. The second step 242 of superimposing video 210 carried out by a processor 31 is receiving a second live video 214 from a second user's device 20. Again referring to FIG. 4, reception of the second live video 214 from a second user's device 20 by a processor 31 is illustrated (with the second live video 214 being captured by the second user's rear camera 119). The third step 244 of this method calls for the processor 31 to identify a first human element 216 in the first video 212 and a second human element 218 in a second video 214. Such human elements 216, 218 are illustrated in FIG. 4 with the first human element 216 being a hand (captured by the first user's rear camera 119) and the second human element 218 being a face (captured by the second users front camera 118).” col.9 lines 55-60 “The final step 248 of the computer-implemented method of superimposing video 210 carried out by a processor 31 is transmitting the superimposed video 310 to a user device 20.” Fig. 3 and Figs. 8A-8B. It can be seen that video data from each device is obtained, augmented, and transmitted to the users; for example, in the first step performed by server 30 in Fig. 8B, extraneous elements are removed from the video, the video is processed, and the video stream is provided to the receivers.).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kennedy-Mariadoss-Hord-Frueh with Holmes in order to incorporate via the network interface, forward the video stream to the receiver system; receive the video stream from the intermediary system, and to replace the separate video sharing server 210 and video processing server 110 of Kennedy with a single server, as in Holmes, that provides both the analysis metadata and the video content. Kennedy-Mariadoss as combined with Frueh discloses the removal of head mounted displays in virtual content viewed by others (Frueh: Abstract), and Holmes is in an analogous field, as it deals with augmenting virtual conference video data and providing the videos to each user.
One of ordinary skill in the art would have been motivated to combine because of the expected benefits of providing augmented content to users, which improves user experience, and of using fewer servers/entities, which provides a simpler system for sharing video content (Holmes: col.8 lines 40-63; col.1, Background of the Invention).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chalozin et al. (US 2011/0016487 A1). Please see the abstract, para.0055, and para.0056, which show object tracking being performed by the server in order to insert other objects at placeholder positions.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EUI H KIM whose telephone number is (571)272-8133. The examiner can normally be reached 7:30-5 M-R, M-F alternating.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamal B Divecha, can be reached at 571-272-5863. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/EUI H KIM/Examiner, Art Unit 2453                                                                                                                                                                                                        

/KAMAL B DIVECHA/Supervisory Patent Examiner, Art Unit 2453