DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 12-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:
In regards to dependent claim 12, none of the cited prior art, alone or in combination, teaches or provides motivation for “wherein the determining the world coordinates corresponding to the marked point according to the second three-dimensional estimation rules, the preset reconstruction algorithm and the two-dimensional pixel coordinates, comprises: determining first mapping straight line corresponding to the two-dimensional pixel coordinates in the camera coordinate system according to the fifth transformation relationship between the pixel coordinate system and the camera coordinate system; determining a target camera point in the camera coordinate system according to a sixth transformation relationship between the world coordinate system and the camera coordinate system, preset filtering rules, and a three-dimensional world point cloud in the world coordinate system, wherein the three-dimensional world point cloud and the sixth transformation relationship are determined according to the two-dimensional video and the preset reconstruction algorithm; determining camera coordinates corresponding to the marked point in the camera coordinate system according to third estimation rules corresponding to the preset filtering rules, the first mapping straight line and the target camera point; and determining world coordinates corresponding to the marked point in the world coordinate system according to the sixth transformation relationship and the camera coordinates”. The references teach only the implementation of 3D reconstruction from 2D video, and reconstruction algorithms for detecting and matching feature points, with respect to stereo display systems for increasing 3D modelling accuracy and incorporation in AR applications. The references fail to explicitly detail the process for determining the world coordinates corresponding to a specified point or feature according to secondary 3D estimation rules, a preset reconstruction algorithm, and 2D pixel coordinates through a process that incorporates preset filtering requirements, multiple transformation relationships, and straight line mapping in relation to a target camera point, in conjunction with the features of claim 2 from which it depends.
In addition, there is no teaching, suggestion, or motivation found in the current references and none that can be inferred from the examiner’s own knowledge with respect to the current limitation.
In regards to dependent claims 13-17, these claims depend from base claim 12, and thus are indicated as possessing allowable subject matter as well.
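For illustration only, the sequence of steps recited in claim 12 can be pictured with the following minimal Python sketch. It is not taken from the application or from any cited reference. It assumes a pinhole camera model for the transformation between the pixel and camera coordinate systems, a rigid transform (rotation `rot`, translation `trans`) for the transformation between the world and camera coordinate systems, and, as a stand-in for the unspecified "preset filtering rules", a rule that picks the point cloud point closest to the mapping straight line. All function and parameter names are hypothetical.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to the direction of a straight line through the camera origin."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def mat_vec(m, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * p[j] for j in range(3)) for i in range(3))

def world_to_camera(rot, trans, p_world):
    """Assumed world-to-camera transform: p_cam = rot * p_world + trans."""
    r = mat_vec(rot, p_world)
    return tuple(r[i] + trans[i] for i in range(3))

def camera_to_world(rot, trans, p_cam):
    """Inverse of world_to_camera for a rotation matrix: p_world = rot^T * (p_cam - trans)."""
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    return mat_vec(rot_t, tuple(p_cam[i] - trans[i] for i in range(3)))

def distance_to_ray(p, ray):
    """Perpendicular distance from point p to the line through the origin along ray."""
    cross = (p[1] * ray[2] - p[2] * ray[1],
             p[2] * ray[0] - p[0] * ray[2],
             p[0] * ray[1] - p[1] * ray[0])
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(r * r for r in ray))

def marked_point_world(u, v, intrinsics, rot, trans, cloud_world):
    """Estimate world coordinates of a marked pixel from a sparse world point cloud."""
    ray = pixel_to_ray(u, v, *intrinsics)
    cloud_cam = [world_to_camera(rot, trans, p) for p in cloud_world]
    # Assumed filtering rule: keep points in front of the camera, take the one nearest the line.
    in_front = [p for p in cloud_cam if p[2] > 0]
    target = min(in_front, key=lambda p: distance_to_ray(p, ray))  # ValueError if none in front
    # Assumed estimation rule: place the marked point on the line at the target point's depth.
    depth = target[2]
    p_cam = (ray[0] * depth, ray[1] * depth, depth)
    return camera_to_world(rot, trans, p_cam)
```

Other filtering and estimation rules (for example, averaging several nearby points) would fit the same outline; the choice here is made only to keep the sketch short.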

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hu (CN 102129708 A, hereinafter referenced “Hu”) in view of Zhuang (CN 1920886 A, hereinafter referenced “Zhuang”).

In regards to claim 1. Hu discloses an augmented reality-based remote guidance method (Hu, paragraph [0031]), comprising: 
-acquiring a two-dimensional video of a target scene (Hu, paragraph [0032]; Reference discloses use of dual-channel cameras to collect dynamic stereo video images of real scenes in real time), 
-and rendering a three-dimensional virtual model corresponding to the marked point according to a presentation mode and the current camera coordinates to display the three-dimensional virtual model in the target scene (Hu, paragraphs [0033] and [0036]; Reference at [0033] discloses take out key frames from the dual-channel video stream at regular intervals, calculate the dense depth map on them, establish a three-dimensional model of the real object to be occluded, and extract sparse feature points at the same time. Paragraph [0036] discloses the sparse feature point tracking strategy is adopted for all intermediate frames in the video stream, combined with the position of the sparse feature points in the image, to estimate the current camera position and posture (interpreted as the current camera coordinates for displaying the 3D virtual model). The dynamic stereo video is interpreted as the presentation mode).
Hu does not explicitly disclose but Zhuang teaches
(Zhuang, paragraph [0125]; Reference discloses a Pentium computer (i.e. remote terminal) for processing the video sequence); 
-if a guidance mode of remote guidance is marking mode, acquiring two-dimensional pixel coordinates corresponding to a marked point in a marked image frame of the two-dimensional video at the remote terminal (Zhuang, paragraph [0093]; Reference discloses manually labeling the positions of facial feature points in the first frame of an input video; mapping, frame by frame, the position coordinates of two-dimensional feature points in each frame obtained by tracking to a reconstructed face model corresponding to each video frame); 
-determining current camera coordinates corresponding to the marked point according to first three-dimensional coordinate estimation rules and the two-dimensional pixel coordinates, wherein the current camera coordinates are current three-dimensional space coordinates corresponding to the marked point in a camera coordinate system (Zhuang, paragraph [0093]; Reference discloses mapping, frame by frame, the position coordinates of two-dimensional feature points (i.e. 2D pixel coordinates) in each frame obtained by tracking to a reconstructed face model corresponding to each video frame…reconstructing the input video by means of eigenface, and automatically performing dynamic texture mapping (i.e. coordinate estimation rules) on dynamic three-dimensional faces in combination with two-dimensional tracking data); 
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output based on the dynamic mapping and modelling. That benefit is applicable to improving realism in reconstructed 3D models such as that taught in Hu.
In regards to claim 2. Hu in view of Zhuang teach the method according to claim 1.
Hu further discloses
-wherein the determining the current camera coordinates corresponding to the marked point according to the first three-dimensional coordinate estimation rules and the two-dimensional pixel coordinates, comprises: determining world coordinates corresponding to the marked point according to second three-dimensional estimation rules, a preset reconstruction algorithm and the two-dimensional pixel coordinates (Hu, paragraph [0067]; Reference at [0067] discloses according to the new position of the tracked sparse feature points in each frame, and their three-dimensional information, that is, the coordinates in the target object coordinate system, the POSIT iterative algorithm is used to estimate the relative pose of the target object and the camera coordinate system. The method of estimating the pose of an object is to calculate the relative pose of the video images obtained by the left and right cameras. According to the two sets of calculation results obtained, the object is calculated in the world coordinate system according to the external parameters and basic matrix of the dual camera. This is thus used for the 3D reconstruction process as disclosed in paragraph [0068]), 
-wherein the world coordinates refer to world three-dimensional space coordinates corresponding to the marked point in a world coordinate system (Hu, paragraph [0067]; Reference at [0067] discloses according to the new position of the tracked sparse feature points in each frame, and their three-dimensional information, that is, the coordinates in the target object coordinate system, the POSIT iterative algorithm is used to estimate the relative pose of the target object and the camera coordinate system… According to the two sets of calculation results obtained, the object is calculated in the world coordinate system according to the external parameters and basic matrix of the dual camera. The tracked sparse feature points refer to the 3D space coordinates upon reconstruction as they correlate to the object calculated in the world coordinate system); 
-determining a current camera pose according to the preset reconstruction algorithm and the two-dimensional video (Hu, paragraph [0069]; Reference discloses each time a key frame image is extracted from the video stream for 3D reconstruction (i.e. according to preset reconstruction algorithm and 2D video), and sparse feature points are extracted at the same time. The subsequent intermediate frames are used to track the sparse feature points. According to the new position of the 3D model obtained from the last pair of key frames and the sparse feature points in the intermediate frame, estimate the relative pose of the object and the camera, and adjust the pose of the object model accordingly (i.e. determining a current camera pose)); 
-and determining current camera coordinates corresponding to the marked point according to the world coordinates and the current camera pose (Hu, paragraph [0067]; Reference at [0067] discloses according to the new position of the tracked sparse feature points in each frame, and their three-dimensional information, that is, the coordinates in the target object coordinate system, the POSIT iterative algorithm is used to estimate the relative pose of the target object and the camera coordinate system. The method of estimating the pose of an object is to calculate the relative pose of the video images obtained by the left and right cameras. According to the two sets of calculation results obtained, the object is calculated in the world coordinate system according to the external parameters and basic matrix of the dual camera. Paragraph [0068] discloses pose of the current frame should be re-estimated against the original key frame used for modeling, and the pose of the newly created model should be adjusted accordingly before rendering, and then used for the next round of 3D tracking. The method of moving and rotating the real object model according to the camera pose is: after this modeling is completed, before the new model is put into use, compare the original key frame used for modeling, and re-estimate the object in the current frame. Movement of the real object based on camera pose and world coordinates with respect to the tracked sparse feature points is interpreted as the determined current camera coordinates corresponding to the marked point according to the world coordinates and the current camera pose.)

Hu further discloses
-a non-transitory computer-readable storage medium including computer programs, which, when being executed by a processor, performs the augmented reality-based remote guidance method according to claim 1 (Hu, paragraph [0077]; Reference discloses the entire set of augmented reality occlusion processing program runs on a PC with a 2.5GHz dual-core CPU).

Claims 3-9, 11, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hu (CN 102129708 A) in view of Zhuang (CN 1920886 A) as applied to claim 1 above, and further in view of Kezele (US 2016/0012643 A1, hereinafter referenced “Kezele”).

In regards to claim 3. Hu in view of Zhuang teach the method according to claim 1.
Hu and Zhuang do not disclose but Kezele teaches
-wherein the rendering the three-dimensional virtual model corresponding to the marked point according to the presentation mode and the current camera coordinates to display the three-dimensional virtual model in the target scene, comprises: if the presentation mode is a mode of binocular optical see-through (OST) lens (Kezele, paragraph [0013]; Reference discloses system/method for stereo calibration of a (virtual) optical see-through (OST) head-mounted display (HMD) having left and right eye views to provide stereo images (i.e. a stereoscopic display)), 
-determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system (Kezele, paragraph [0013]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system), 
-a second transformation relationship between the left eye virtual three-dimensional coordinate system and a left eye pixel coordinate system, and the current camera coordinates (Kezele, paragraphs [0013] and [0099]-[0100]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system. Paragraphs [0099]-[0100] disclose the example of performing affine scale-invariant feature transform (i.e. first, second etc.) for matching feature points (i.e. pixels) between different images from different view angles (i.e. current camera coordinates)); 
-rendering the three-dimensional virtual model corresponding to the marked point at the left eye pixel coordinates to display a left eye image corresponding to the three-dimensional virtual model in a left eye OST lens (Kezele, paragraph [0106]; Reference discloses preferably, HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images of a stereo image pair that displays a virtual 3D object (i.e. rendered three-dimensional virtual model corresponding to the marked point at the left eye pixel coordinates to display a left eye image corresponding to the three-dimensional virtual model in a left eye OST lens)); 
-determining right eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a third transformation relationship between the camera coordinate system and a right eye virtual three-dimensional coordinate system (Kezele, paragraph [0013]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system), 
-a fourth transformation relationship between the right eye virtual three-dimensional coordinate system and a right eye pixel coordinate system, and the current camera coordinates (Kezele, paragraphs [0013] and [0099]-[0100]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system. Paragraphs [0099]-[0100] disclose the example of performing affine scale-invariant feature transform (i.e. first, second etc.) for matching feature points (i.e. pixels) between different images from different view angles (i.e. current camera coordinates)); 
-and rendering the three-dimensional virtual model corresponding to the marked point at the right eye pixel coordinates to display a right eye image corresponding to the three-dimensional virtual model in a right eye OST lens (Kezele, paragraph [0106]; Reference discloses preferably, HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images of a stereo image pair that displays a virtual 3D object (i.e. rendered three-dimensional virtual model corresponding to the marked point at the right eye pixel coordinates to display a right eye image corresponding to the three-dimensional virtual model in a right eye OST lens)).
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output based on the dynamic mapping and modelling. That benefit is applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see-through and video see-through HMD systems that are calibrated to align virtual objects based on predefined incremental adjustments, providing the best aligned virtual output in the scene, which is applicable to improving the display of 3D virtual representations such as those taught in Hu and Zhuang.
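For illustration only, the two-stage mapping recited for each eye in claim 3 (camera coordinate system to an eye's virtual three-dimensional coordinate system, then to that eye's pixel coordinate system) can be pictured with the following minimal Python sketch. It is not Kezele's calibration procedure. It assumes a rigid transform for the first stage and a pinhole projection for the second, and the calibration values (a 0.03 offset of each eye from the camera along x, focal lengths, principal point) are hypothetical placeholders.

```python
def apply_rigid(rot, trans, p):
    """Map a 3D point with a 3x3 rotation matrix and a translation vector."""
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))

def project(p_eye, fx, fy, cx, cy):
    """Pinhole projection from an eye's virtual 3D frame to that eye's display pixels."""
    x, y, z = p_eye
    if z <= 0:
        raise ValueError("point is behind the virtual eye camera")
    return (fx * x / z + cx, fy * y / z + cy)

def eye_pixel(p_cam, eye_extrinsics, eye_intrinsics):
    """Chain the camera-to-eye transform and the eye-to-pixel projection."""
    rot, trans = eye_extrinsics
    return project(apply_rigid(rot, trans, p_cam), *eye_intrinsics)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical calibration: the left eye sits at x = -0.03 in the camera frame,
# so a camera-frame point shifts by +0.03 in the left eye frame (and -0.03 for the right).
LEFT_EXTRINSICS = (IDENTITY, (0.03, 0.0, 0.0))
RIGHT_EXTRINSICS = (IDENTITY, (-0.03, 0.0, 0.0))
INTRINSICS = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (assumed equal for both eyes)

def stereo_pixels(p_cam):
    """Return (left_pixel, right_pixel) for a marked point given in camera coordinates."""
    return (eye_pixel(p_cam, LEFT_EXTRINSICS, INTRINSICS),
            eye_pixel(p_cam, RIGHT_EXTRINSICS, INTRINSICS))

if __name__ == "__main__":
    print(stereo_pixels((0.0, 0.0, 1.0)))
```

Keeping the extrinsic and intrinsic stages separate mirrors the separately modeled calibration matrices described in the cited paragraph [0013], although the actual matrices there are obtained by calibration rather than fixed constants.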

In regards to claim 4. Hu in view of Zhuang in further view of Kezele teach the method according to claim 3.
Hu and Zhuang do not disclose but Kezele teaches
-wherein the determining the left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to the first transformation relationship (Kezele, paragraph [0013]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system); 
-and determining left eye pixel coordinates corresponding to the three-dimensional virtual model according to the left eye virtual three-dimensional coordinates and the second transformation relationship between the left eye virtual three-dimensional coordinate system and the left eye pixel coordinate system (Kezele, paragraphs [0013] and [0099]-[0100]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point to point correspondence determination via calibration matrices for left and right images is interpreted as the determining left eye pixel coordinates corresponding to the three-dimensional virtual model in relation to the marked point according to a first transformation relationship between the camera coordinate system and a left eye virtual three-dimensional coordinate system. Paragraphs [0099]-[0100] disclose the example of performing affine scale-invariant feature transform (i.e. first, second etc.) for matching feature points (i.e. pixels) between different images from different view angles (i.e. current camera coordinates)).
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see-through and video see-through HMD systems that are calibrated to align virtual objects based on predefined incremental adjustments, providing the best aligned virtual output in the scene, which is applicable to improving the display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 5. Hu in view of Zhuang teach the method according to claim 1.
Hu and Zhuang do not disclose but Kezele teaches
-wherein the rendering the three-dimensional virtual model corresponding to the marked point according to the presentation mode and the current camera coordinates to display the three-dimensional virtual model in the target scene, comprises: if the presentation mode is a mode of video see-through (VST) lens (Kezele, paragraph [0060]; Reference discloses a head-mounted display (HMD) in an AR system may include a human-eye/optical see-through (OST) display, which is a see-through display on which virtual objects are incorporated, or a video see-through (VST) display, which is a video display (e.g. screen) that integrates virtual objects into displayed images of a real scene), 
-projecting the three-dimensional virtual model corresponding to the marked point to a pixel coordinate system according to the current camera coordinates and a fifth transformation relationship between the pixel coordinate system and the camera coordinate system, and determining pixel coordinates corresponding to the three-dimensional virtual model (Kezele, paragraph [0106]; Reference discloses preferably, HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images of a stereo image pair that displays a virtual 3D object…Left and right optical see-through display units 51 and 52 may provide an image by means of image projection); 
-and rendering the three-dimensional virtual model into a current image frame of the two-dimensional video according to the pixel coordinates corresponding to the three-dimensional virtual model to display the rendered current image frame in the VST lens (Kezele, paragraph [0106]; Reference discloses preferably, HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images (i.e. 2D image frames) of a stereo image pair that displays a virtual 3D object (i.e. rendered three-dimensional virtual model with corresponding pixel coordinates to display the rendered current image frame, as the reference previously discloses using OST or VST lens in [0060])).
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output based on the dynamic mapping and modelling. That benefit is applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele. Hu provides a method that combines three-dimensional reconstruction based on binocular stereo matching with three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras. Zhuang adds a method for 3D dynamic facial expression modelling for videos that marks facial features within input frames and performs affine corrections on those features for subsequent reconstruction, generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see-through and video see-through HMD systems that are calibrated to align virtual objects based on predefined incremental adjustments, providing the best aligned virtual output in the scene, which is applicable to improving the display of 3D virtual representations such as those taught in Hu and Zhuang.



Hu does not disclose but Zhuang teaches
-wherein the rendering the three-dimensional virtual model corresponding to the marked point according to the presentation mode and the current camera coordinates, comprises: sending pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal (Zhuang, paragraph [0092]; Reference discloses before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The feature points mapped are interpreted as the sent pixel coordinates corresponding to the 3D virtual model as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence); 
-and rendering, by the remote terminal, the three-dimensional virtual model according to the pixel coordinates corresponding to the three-dimensional virtual model (Zhuang, paragraph [0092]; Reference discloses before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The feature points mapped are interpreted as the sent pixel coordinates corresponding to the 3D virtual model as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence).
Zhuang does not disclose but Kezele teaches
(Kezele, paragraph [0060]; Reference discloses a head-mounted display (HMD) in an AR system may include a human-eye/optical see-through (OST) display, which is a see-through display on which virtual objects are incorporated, or a video see-through (VST) display, which is a video display (e.g. screen) that integrates virtual objects into displayed images of a real scene)
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output based on the dynamic mapping and modelling applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 7. Hu in view of Zhuang in further view of Kezele teach the method according to claim 6.
Hu and Zhuang do not disclose but Kezele teaches
-wherein the sending the pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal according to the presentation mode, comprises: if the presentation mode is a mode of binocular optical see-through (OST) lens (Kezele, paragraph [0013]; Reference discloses system/method for stereo calibration of a (virtual) optical see-through (OST) head-mounted display (HMD) having left and right eye views to provide stereo images (i.e. a stereoscopic display)), determining pixel coordinates (Kezele, paragraphs [0013] and [0099]-[0100]; Reference discloses defining the left and right views include modeling left and right calibration matrices (with intrinsic and extrinsic parameters modeled separately in separate calibration matrices) that define 3D-to-2D point correspondences between 3D coordinates of a real, reference object in a defined world coordinate system, and the 2D position of a corresponding virtual object in the left and right projected images of the OST HMD. The point-to-point correspondence determination via calibration matrices for left and right images is interpreted as the determining pixel coordinates corresponding to the three-dimensional virtual model according to a transformation relationship between the camera coordinate system and the virtual three-dimensional coordinate system. Paragraphs [0099]-[0100] disclose the example of performing affine scale-invariant feature transform (i.e. first, second, third etc.) for matching feature points (i.e. pixels) between different images from different view angles (i.e. current camera coordinates)),
-and sending the pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal (Zhuang, paragraph [0092]; Reference discloses Before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The mapped feature points are interpreted as the sent pixel coordinates corresponding to the 3D virtual model, as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence).
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 8. Hu in view of Zhuang in further view of Kezele teach the method according to claim 6.
Hu does not disclose but Zhuang teaches

-if the presentation mode is a mode of video see-through (VST) lens, sending the pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal (Zhuang, paragraph [0092]; Reference discloses Before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The mapped feature points are interpreted as the sent pixel coordinates corresponding to the 3D virtual model, as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence).
Zhuang does not disclose but Kezele teaches
-wherein the sending the pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal according to the presentation mode, comprises:
-if the presentation mode is a mode of video see-through (VST) lens (Kezele, paragraph [0060]; Reference discloses a head-mounted display (HMD) in an AR system may include a human-eye/optical see-through (OST) display, which is a see-through display on which virtual objects are incorporated, or a video see-through (VST) display, which is a video display (e.g. screen) that integrates virtual objects into displayed images of a real scene), sending the pixel coordinates corresponding to the three-dimensional virtual model to the remote terminal (Zhuang, paragraph [0092]; Reference discloses before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The mapped feature points are interpreted as the sent pixel coordinates corresponding to the 3D virtual model, as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence).
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.


Hu further discloses
-wherein the rendering, by the remote terminal, the three-dimensional virtual model according to the pixel coordinates, comprises: rendering (Hu, paragraphs [0033] and [0036]; Reference at [0033] discloses take out key frames from the dual-channel video stream at regular intervals, calculate the dense depth map on them, establish a three-dimensional model of the real object to be occluded, and extract sparse feature points at the same time. Paragraph [0040] discloses the three-dimensional model of the occluded real object is extracted at the same time, and the sparse feature points are extracted; the sparse feature point tracking strategy is adopted for all the intermediate frames in the video stream, and the position and posture of the current camera are estimated by combining the position of the sparse feature points in the image; Obtain 3D information of the real object itself according to the most recently established model; move and rotate the real object model according to the camera pose, which can be used for virtual and real occlusion processing; use the latest adjustment while waiting for the completion of the next 3D reconstruction. The 3D information of the real object is compared with the registered 3D virtual object in depth relationship to realize the correct multi-level virtual and real occlusion processing (interpreted as displaying the 3D virtual model in current frames based on position and posture of camera and sparse features or pixel coordinates). The dynamic stereo video is interpreted as the presentation mode).
Hu does not explicitly disclose but Zhuang teaches
-by the remote terminal / in the remote terminal (Zhuang, paragraph [0092]; Reference discloses Before performing dynamic texture mapping, pre-designate 40 initial 3D feature vertices on the 3D face model based on 40 feature points. The coordinates of the aforementioned 40 feature points have been obtained during video tracking and can be regarded as the set of 3D feature vertices. The mapped feature points are interpreted as the sent pixel coordinates corresponding to the 3D virtual model, as paragraph [0125] discloses the Pentium computer (i.e. remote terminal) for processing the dynamic 3D expression video sequence).
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 11. Hu in view of Zhuang teach the method according to claim 1.
Hu and Zhuang do not disclose but Kezele teaches
-further comprising: if the guidance mode of the remote guidance is audio mode, acquiring scene audio information in the target scene, and sending the scene audio information to the remote terminal (Kezele, paragraphs [0106] and [0107]; Reference at [0106] discloses HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images of a stereo image pair that displays a virtual 3D object…paragraph [0107] discloses In the present example, HMD 50 includes right earphone 51 b and left earphone 52 b to provide audio information to a user).
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output based on the dynamic mapping and modelling applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 19. Hu in view of Zhuang discloses the method of claim 1.
Hu and Zhuang do not explicitly disclose but Kezele teaches
-a terminal (Kezele, Fig. 12; Reference discloses HMD 50), comprises: 
-one or more processors (Kezele, paragraph [0105]; Reference discloses auxiliary control unit 53);
-a storage apparatus for storing one or more programs (Kezele, paragraph [0105]; Reference discloses All computing resources may be incorporated into HMD 50, or alternatively may be divided between HMD 50 and an auxiliary control unit 53, or some other remote computing resource, e.g. a personal computer, server, etc. (i.e. personal computer or server implies possessing memory for software or programs));
-an input apparatus for acquiring a two-dimensional video (Kezele, paragraph [0064]; Reference discloses AR system: video see-through and optical see-through. In the case of video see-through, computer generated virtual objects are superimposed onto a video stream obtained by a real camera attached to the HMD);
-an output apparatus for displaying a three-dimensional virtual model corresponding to a marked point (Kezele, paragraphs [0100] and [0106]; Reference at [0100] discloses the concept of a marked point based on the feature points selected within a given window with respect to an array of pixels. Paragraph [0106] discloses HMD 50 includes a right optical see-through display unit 51 and a left optical see-through display unit 52 that work together to provide left and right images of a stereo image pair that displays a virtual 3D object);
-while the one or more programs are executed by the one or more processors, the one or more processors are configured to implement the augmented reality-based remote guidance method according to claim 1 (Kezele, paragraph [0114]; Reference at [0114] discloses In essence, the present invention deals with stereo calibration of an optical system comprised of a human user's eyes and optical see-through (OST) head mounted (HMD) displays. The calibration results are used for the purposes of anchoring virtual 3D objects to real environments (i.e. real scenes) and perceptually fusing their views, within a framework of an augmented reality (AR) system).
Hu and Kezele are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the HMD calibration features of Kezele in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the HMD calibration features of Kezele allows for use of optical see through and video see through HMD systems which are calibrated for aligning virtual objects based on predefined incremental adjustments providing the best aligned virtual output in the scene, applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

Claims 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hu (CN 102129708 A) in view of Zhuang (CN 1920886 A) as applied to claim 1 above, and further in view of Chen (2018 “SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality”, hereinafter referenced “Chen”)

In regards to claim 10. Hu in view of Zhuang teach the method according to claim 1.
Hu and Zhuang do not disclose but Chen teaches
-further comprising: if the guidance mode of the remote guidance is text annotation mode, acquiring text information sent by the remote terminal and/or by a data server; and rendering the text information according to the presentation mode to display the text information in the target scene (Chen, Fig. 7(c) and “4.3. Real endoscopic video evaluation” page 142; Reference at page 142 discloses with our new 3D surface reconstruction approach, we have developed a geometry-aware AR framework for depth correct AR augmentation within the intra-operative endoscope scene in real-time. Our AR framework is an important step towards high quality AR in MIS, since incorrect depth placement will cause virtual objects to appear to drift away when the viewing angle changes. Furthermore, accurate global geometric information plays a crucial role in augmenting the real surgical scenes with annotations, labels, and tumor measurements, inguinal measurements to estimate optimal mesh size for inguinal herniorrhaphy [20] or even a 3D reconstruction of anatomy structures at the target surgical location. Computer technology used for processing of images, such as the numerical label of measurements shown in Fig. 7(b), is interpreted as the rendered text annotation presented in the AR interface or presentation mode for the target scene).
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output based on the dynamic mapping and modelling applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Chen are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the SLAM-based AR features of Chen in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the SLAM-based AR features of Chen allows for use annotations and labels of augmented reality as 3D reconstruction is performed from videos for application in geometry aware AR applications providing increased accuracy in the reconstruction process applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

In regards to claim 18. Hu in view of Zhuang teach the method according to claim 2.
Hu and Zhuang do not disclose but Chen teaches
-wherein, the preset reconstruction algorithm comprises SLAM algorithm based on ORB feature points (Chen, “3.2. Monocular Endoscopic camera tracking and mapping” page 137; Reference at page 137 discloses implementation of ORB–SLAM which combines many state-of-the-art techniques into one SLAM system, such as using an ORB descriptor for tracking, local keyframe for mapping, graph-based optimization, the Bag of Words algorithm for re-localization, and an essential graph for loop closure. These features can enable real-time endoscopic camera tracking and sparse point mapping in an abdominal cavity as shown in Fig. 1); 
-wherein the determining the current camera pose according to the preset reconstruction algorithm and the two-dimensional video, comprises: determining the current camera pose according to the ORB feature points in the current image frame of the two-dimensional video and the ORB feature points in the previous image frame of the current image frame (Chen, “4.2.2 3D Surface Reconstruction Evaluation” and “4.3 Real endoscopic video evaluation” page 142; Reference at section 4.2.2. discloses When the ORB–SLAM system gained enough feature points, we build a 3D surface based on the sparse point cloud. The whole reconstruction pipeline takes only 600 ms to generate the surface, which was then exported into the 3D model space to be compared with the ground truth surface data set. Section 4.3 discloses Fig. 7 (b) shows the depth augmentation by fusing the camera pose from the SLAM system and the 3D surface reconstructed from our proposed framework. The real-time alignment of the 3D transparent mesh and the video are a good match, suggesting that our method can provide the correct depth information intra-operatively and so help improve surgical performance by displaying 3D mesh structures when performing monocular endoscope procedures. The alignment between the videos and the reconstructed 3D surfaces for depth augmenting based on the ORB-SLAM obtained feature points is interpreted as determining the current camera pose according to the ORB feature points in the current image frame of the two-dimensional video and the ORB feature points in the previous image frame of the current image frame).
Hu and Zhuang are combinable because they are in the same field of endeavor regarding 3D video adjustment. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include the 3D video modelling features of Zhuang in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output based on the dynamic mapping and modelling applicable to improving realism in reconstructed 3D models such as that taught in Hu.
Hu and Chen are also combinable because they are in the same field of endeavor regarding 3D modelling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the virtual and real occlusion AR method of Hu to include, in view of the 3D video modelling features of Zhuang, to include the SLAM-based AR features of Chen in order to provide the user with a method that allows for combining three-dimensional reconstruction based on binocular stereo matching and three-dimensional pose estimation based on sparse feature point tracking, applied to augmented reality with dual cameras as taught by Hu while incorporating the 3D video modelling features of Zhuang in order to incorporate a method for 3D dynamic facial expression modelling for videos by marking facial features within input frames performing affine corrections on the features for the frames for performing subsequent reconstruction generating more realistic 3D output. Further incorporating the SLAM-based AR features of Chen allows for use annotations and labels of augmented reality as 3D reconstruction is performed from videos for application in geometry aware AR applications providing increased accuracy in the reconstruction process applicable to improving display of 3D virtual representations such as those taught in Hu and Zhuang.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: See the Notice of References Cited (PTO-892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TERRELL M ROBINSON whose telephone number is (571)270-3526. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman can be reached on 571-272-7653. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/TERRELL M ROBINSON/Examiner, Art Unit 2619