Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on 10/17/22 has been entered and made of record. Claims 1, 13 and 20 are amended. Claims 3-5 and 15-16 are cancelled. Claims 1-2, 6-14 and 17-20 are pending.

Response to Arguments
Applicant’s arguments with respect to claims 1, 13 and 20 have been fully considered but they are moot because the arguments do not apply to the references being used in the current rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 2017/0109930) in view of Chen et al. (US 10,657,647), HIRANO et al. (US 2019/0066304) and Floch (GB 2573170).
As to Claim 1, Holzer teaches a method comprising: 
projecting one or more triangulated points into a designated frame based on camera pose information determined based on data collected from an inertial measurement unit at the computing device, wherein projecting the one or more triangulated points includes triangulating a three-dimensional representation of the object based on a reference view and the designated frame, the reference view being a multi-view interactive digital media representation (MVIDMR), the MVIDMR including a plurality of images of the object captured from different perspective views (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations” in [0026]; “a surround view is a multi-view interactive digital media representation” in [0027]; “a scene which is captured as a multi-view image data set by a device that has an inertial measurement unit (IMU)… a multi-view image data set shows a scene from different angles… An IMU provides information about the orientation of a device while capturing the images” in [0028]; a triangulated 3D multi-view representation in [0043]; “a reference image 201 refers to a view (i.e. reference view) in the multi-view image where an anchor location 209 is selected for a synthetic object to be placed in the multi-view image…. a target image 203 refers to a view (i.e. target view) in the multi-view image for which a synthetic image is generated” in [0033]);
determining one or more overlay data locations on the object model (Holzer discloses “receiving a selection of an anchor location in a reference image for a synthetic object to be placed within a multi-view image” in [0005]; “to implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image… the three-dimensional tag "moves" with the multi-view image, such that as objects or scenery within the multi-view image rotate or otherwise move, the three-dimensional tag also moves as if it were physically present along with the objects or scenery” in [0021]; “A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]);
determining, for each of a plurality of frames in the live camera feed, a respective frame location for one or more of the tags, each of the respective frame locations determined based on a correspondence between the object model and the respective frame (Holzer discloses “augmented reality can take the form of a live-action video or photo series with added elements that are computer-generated” in [0002], see also [0020, 0027]; “a reference image 201 refers to a view (i.e. reference view) in the multi-view image where an anchor location 209 is selected for a synthetic object to be placed in the multi-view image….a target image 203 refers to a view (i.e. target view) in the multi-view image for which a synthetic image is generated. This synthetic image is then overlaid on the target image to yield an augmented reality version of the target image. By generating synthetic images for various target views and overlaying these synthetic images on the corresponding target images, an augmented reality version of the multi-view image can be generated” in [0033]).
Holzer doesn’t explicitly teach object identity and live skeleton detection. The combination of Chen and HIRANO further teaches the following limitations:
triangulating a three-dimensional representation of the object based on a reference view and the designated frame (Chen discloses 3D triangle mesh models in C16L60-65);
creating an object model of the object by performing skeleton detection on the live camera feed based on the MVIDMR (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations. A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]. Chen further discloses creating a 3D model of the target object from the set of target images in C3L33-39; obtaining image data from different angles, positions, view-points, distances, etc. to determine a 3D triangle mesh models in C16L56-65; object recognition in C1L18-25 & C42L33-35; live-view of image data in C42L33-35; see also a 3D skeleton of a vehicle as shown in Fig 6 below:

    [Figure: Chen, Fig 6, 3D skeleton of a vehicle (media_image1.png)]

Here, Chen’s live image data can be analyzed by any image recognition algorithm. For example, HIRANO discloses “the methods and systems disclosed herein may recognize objects during a live camera mode… While in live camera mode, the methods and systems disclosed herein may use machine-learning algorithms and/or trained models to automatically segment and accurately identify objects observed during the live camera mode. The objects may be isolated through the creation of a freeform boundary identifier, which may also be referred to herein as "smart dots." Smart dots may automatically form around an object within the live camera mode” in [0003]; “an object is identified and segmented within a live camera mode” in [0005]; see also Fig 4B. Here, HIRANO’s segmented object in the live camera mode by a freeform boundary identifier is interpreted as skeleton detection on the live camera feed.);
determining via a processor at a computing device an object identity for an object represented in a live camera feed (Holzer discloses “a live-action video or photo series…in which a simulated environment is depicted through video and/or image data” in [0002]. Chen further discloses “to detect certain features or characteristics of an image, such as to determine information about objects in the image, to recognize persons or things in the image, etc. For example, there are many image processing system that perform character or facial recognition in images to identify text, people, particular buildings, or other features of objects within images in order to automatically identify people, objects, or other features depicted within the image” in C1L18-25; “may use any image recognition or analysis technology to ascertain the make, model, and/or year of the target vehicle” in C39L16-18; live-view of image data in C42L33-35); 
determining via the processor augmented reality overlay data based on the object identity, the augmented reality overlay data including one or more tags, each of the tags characterizing a feature of the object, each of the tags being associated with a respective overlay data location on the object model (Holzer discloses “receiving a selection of an anchor location in a reference image for a synthetic object to be placed within a multi-view image” in [0005]; “implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image, where the multi-view image includes multiple views of a real-world environment” in [0021]; “a synthetic scene from the target view may be rendered using tracking information between the reference image and a target image from the multi-view image” in [0045]. Chen further discloses object analysis and recognition in C1L18-25 & C39L16-18. Here, Holzer’s 3D tag can be identified as an image, text, object, graphic or the like by Chen’s object identification or face/object detection algorithm);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer with the teaching of Chen so as to perform character or facial recognition in images to identify text, people, particular buildings, or other features of objects within images in order to automatically identify people, objects, or other features depicted within the image (Chen, col 1). The motivation for combining the teaching of HIRANO is to automatically segment and accurately identify objects observed during the live camera mode (HIRANO, [0003]).

Holzer, Chen and HIRANO don’t teach triangulating 2D skeleton joints into 3D skeleton joints. Floch further discloses the following limitations:
wherein projecting the one or more triangulated points includes triangulating 2D skeleton joints into 3D skeleton joints (Floch discloses “In some embodiments, the 2D-to-3D conversions of the pairs of matched 2D skeletons may involve triangulation, meaning generating a weak 3D skeleton from a pair of matched 2D skeletons includes: projecting a part of a first 2D skeleton of the pair as a first line in a 3D space; projecting the same part of the second 2D skeleton of the pair as a second line in the 3D space; and determining a 3D position locating the part for the weak 3D skeleton, based on the first and second lines” in C3L18-25; “Figure 8 schematically illustrates a triangulation way to build a weak 3D skeleton from a matching pair of two matched 2D skeletons according to embodiment of the present invention” in C4L25-27);
presenting the live camera feed on the display screen during a presentation phase after the initialization phase, the live camera feed including the plurality of frames, each of the plurality of frames including a respective one of the tags, each of the tags being positioned at the respective frame location (Holzer discloses “a synthetic object that is rendered into a scene can be represented by a video” in [0039]; “The synthetic image produced from this virtual view is then overlaid on the target image at 410, and blended to produce a new, augmented image from the target view… this process can be repeated for multiple views in the multi-view image to generate an augmented reality version of the multi-view image that appears to include the synthetic object” in [0052]; see also Fig 1-3; camera calibration in [0051]. Holzer doesn’t explicitly teach a well-known camera initialization (or calibration) process. Floch further discloses “the source cameras 12 are calibrated so that they output their source images of the scene at the same cadence and simultaneously. The intrinsic and extrinsic parameters of the cameras are supposed to be known or calculated by using well-known calibration procedures. In particular, these calibration procedures allow the 3D object to be reconstructed into a 3D skeleton at the real scale” in C5L19-25).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Chen and HIRANO with the teaching of Floch so as to generate weak 3D skeletons by projecting 2D skeleton parts in 3D space to determine 3D positions of those parts (Floch, Abstract), and to account for the well-known camera calibration process.
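
For illustration only, the two-line triangulation quoted from Floch (C3L18-25) may be sketched as follows. The sketch is not taken from Floch or from any other cited reference. It assumes that the same part of each matched 2D skeleton has already been back-projected into a 3D line given by an origin and a direction; the function name triangulate_joint and the choice of the midpoint of the shortest connecting segment as the 3D position are assumptions made for this example.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_joint(
    o1: Sequence[float], d1: Sequence[float],
    o2: Sequence[float], d2: Sequence[float],
    eps: float = 1e-12,
) -> Vec3:
    """Estimate a 3D joint from two lines (hypothetical helper).

    Each line is origin + s * direction. The result is the midpoint of
    the shortest segment joining the two lines, which equals their
    intersection when the lines actually meet.
    """
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel lines give no unique closest point.
        raise ValueError("lines are parallel; joint cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter along the first line
    t = (a * e - b * d) / denom  # parameter along the second line
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two lines that meet at (1, 0, 1).
joint = triangulate_joint((0, 0, 0), (1, 0, 1), (2, 0, 0), (-1, 0, 1))
```

With noisy 2D detections the two lines generally do not intersect, which is why the sketch returns a midpoint rather than an exact intersection.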

As to Claim 2, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 1, the method further comprising: for each of the frames, determining the correspondence between the reference view of the object and the respective frame (Holzer discloses “a reference image 201 refers to a view (i.e. reference view) in the multi-view image where an anchor location 209 is selected for a synthetic object to be placed in the multi-view image…. a target image 203 refers to a view (i.e. target view) in the multi-view image for which a synthetic image is generated” in [0033]; see also Fig 2-3.)

As to Claim 6, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 3, wherein the presentation phase involves triangulating a three-dimensional representation of the object for each of the frames (Holzer discloses synthetic objects referred to as 3D tags in [0023]; triangulating 3D data in [0043]. Chen, Fig 5-6. Floch, C3L18-25.)

As to Claim 7, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 6, wherein the three-dimensional representation is triangulated based on the correspondence between the reference view and the respective frame (Holzer discloses “Accordingly, using the reference image, anchor location, and an estimate of the imaging device/camera's intrinsic parameters enables calculation of the synthetic image from other views” in [0044]; “a synthetic scene from the target view may be rendered using tracking information between the reference image and a target image from the multi-view image” in [0045]. Chen, Fig 5-6. Floch, C3L18-25.)
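
For illustration only, the following is a minimal sketch of how a 3D anchor location might be mapped into another frame once a camera pose and intrinsic parameters are estimated, which is the general idea cited from Holzer [0044]-[0045]. The pinhole camera model, the function name project_point, and the parameters rotation, translation, focal, cx and cy are assumptions made for this example and are not taken from Holzer.

```python
from typing import Sequence, Tuple


def project_point(
    point: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    focal: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a 3D point into pixel coordinates (hypothetical helper).

    rotation (3x3) and translation (3) map world coordinates into the
    camera frame of the target view; focal, cx and cy are assumed
    pinhole intrinsics expressed in pixels.
    """
    x, y, z = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    if z <= 0:
        # A point at or behind the camera plane has no valid projection.
        raise ValueError("point is not in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pixel = project_point((1.0, 0.0, 2.0), identity, (0.0, 0.0, 0.0), 100.0, 50.0, 50.0)
```

Repeating the projection with the pose of each frame would give a per-frame location for the same anchor under these assumptions.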

As to Claim 8, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 1, wherein the reference view of the object is a multi-view interactive digital media representation, the multi-view interactive digital media representation including a plurality of images of the object, each of the images of the object being captured from a different perspective view (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations” in [0026]; “a surround view is a multi-view interactive digital media representation” in [0027].)

As to Claim 9, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 8, wherein the multi-view interactive digital media representation is navigable in one or more dimensions (Holzer discloses that the user can navigate through the multi-view image in [0039].)

As to Claim 10, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 9, the method further comprising: generating the multi-view interactive digital media representation via the processor (Holzer discloses “computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026].)

As to Claim 11, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 10, determining a three-dimensional model of the object based on the multi-view interactive digital media representation (Holzer discloses “computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026].)

As to Claim 12, Holzer in view of Chen, HIRANO and Floch teaches the method recited in claim 1, wherein the object is a vehicle, and wherein the reference view of the object includes each of a left vehicle door, a right vehicle door, and a windshield (Holzer discloses “a multi-view image data set shows a scene from different angles. For instance, a multi-view image data set can be captured while rotating a camera around its own center (panoramic case), while rotating the camera around one or multiple objects of interest (object case), while translating the camera, or while combining these movements” in [0028]; 3D model of the scene in [0053]. Official notice has been taken of the fact that “a scene of an augmented reality application may include an object like a vehicle to show any portion of the vehicle”, which is well-known in the art (see MPEP 2144.03).)

As to Claim 13, Holzer teaches a computing device comprising:
a camera configured to capture a live camera feed of an object (Holzer discloses “the camera is rotated around an object (such as depicted in FIGS. 1A-1B and 2A-2B)” in [0047]);
a processor configured to:
determine, for each of a plurality of frames in the live camera feed, a respective frame location for one or more of the tags, each of the respective frame locations determined based on a correspondence between the reference view of the object and the respective frame (Holzer discloses “augmented reality can take the form of a live-action video or photo series with added elements that are computer-generated” in [0002], see also [0020, 0027]; “a reference image 201 refers to a view (i.e. reference view) in the multi-view image where an anchor location 209 is selected for a synthetic object to be placed in the multi-view image….a target image 203 refers to a view (i.e. target view) in the multi-view image for which a synthetic image is generated. This synthetic image is then overlaid on the target image to yield an augmented reality version of the target image. By generating synthetic images for various target views and overlaying these synthetic images on the corresponding target images, an augmented reality version of the multi-view image can be generated” in [0033]); and
a display screen configured to present the live camera feed including the plurality of frames during a presentation phase after the initialization phase, each of the plurality of frames including a respective one of the tags, each of the tags being positioned at the respective frame location (Holzer discloses “a synthetic object that is rendered into a scene can be represented by a video” in [0039]; “The synthetic image produced from this virtual view is then overlaid on the target image at 410, and blended to produce a new, augmented image from the target view… this process can be repeated for multiple views in the multi-view image to generate an augmented reality version of the multi-view image that appears to include the synthetic object” in [0052]; see also Fig 1-3; camera calibration in [0051]. Official notice has been taken of the fact that “a camera initialization is performed before capturing and presenting the live image/video”, which is well-known in the art (see MPEP 2144.03).)
Holzer doesn’t explicitly teach object identity and live skeleton detection. The combination of Chen and HIRANO further teaches the following limitations:
creating an object model of the object by performing skeleton detection on the live camera feed based on the MVIDMR (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations. A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]. Chen further discloses creating a 3D model of the target object from the set of target images in C3L33-39; obtaining image data from different angles, positions, view-points, distances, etc. to determine a 3D triangle mesh models in C16L56-65; object recognition in C1L18-25 & C42L33-35; live-view of image data in C42L33-35; see also a 3D skeleton of a vehicle as shown in Fig 6 below:

    [Figure: Chen, Fig 6, 3D skeleton of a vehicle (media_image1.png)]

Here, Chen’s live image data can be analyzed by any image recognition algorithm. For example, HIRANO discloses “the methods and systems disclosed herein may recognize objects during a live camera mode… While in live camera mode, the methods and systems disclosed herein may use machine-learning algorithms and/or trained models to automatically segment and accurately identify objects observed during the live camera mode. The objects may be isolated through the creation of a freeform boundary identifier, which may also be referred to herein as "smart dots." Smart dots may automatically form around an object within the live camera mode” in [0003]; “an object is identified and segmented within a live camera mode” in [0005]; see also Fig 4B. Here, HIRANO’s segmented object in the live camera mode by a freeform boundary identifier is interpreted as skeleton detection on the live camera feed.);
determine augmented reality overlay data based on the object identity, the augmented reality overlay data include one or more tags, each of the tags characterizing a feature of the object, each of the tags being associated with a respective location on the object, each of the respective locations being represented in a reference view of the object (Holzer discloses “receiving a selection of an anchor location in a reference image for a synthetic object to be placed within a multi-view image” in [0005]; “implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image, where the multi-view image includes multiple views of a real-world environment” in [0021]; “a synthetic scene from the target view may be rendered using tracking information between the reference image and a target image from the multi-view image” in [0045]. Chen further discloses object analysis and recognition in C1L18-25 & C39L16-18. Here, Holzer’s 3D tag can be identified as an image, text, object, graphic or the like by Chen’s object identification or face/object detection algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer with the teaching of Chen so as to perform character or facial recognition in images to identify text, people, particular buildings, or other features of objects within images in order to automatically identify people, objects, or other features depicted within the image (Chen, col 1). The motivation for combining the teaching of HIRANO is to automatically segment and accurately identify objects observed during the live camera mode (HIRANO, [0003]).
Holzer, Chen and HIRANO don’t teach triangulating 2D skeleton joints into 3D skeleton joints. Floch further discloses the following limitations:
determine an object identity for an object represented in the live camera feed during an initialization phase, the initialization phase including projecting one or more triangulated points into a designated frame based on camera pose information determined based on data collected from an inertial measurement unit at the computing device, wherein projecting the one or more triangulated points includes triangulating 2D skeleton joints into 3D skeleton joints (Chen discloses object analysis and recognition in C1L18-25 & C39L16-18. Floch further discloses “In some embodiments, the 2D-to-3D conversions of the pairs of matched 2D skeletons may involve triangulation, meaning generating a weak 3D skeleton from a pair of matched 2D skeletons includes: projecting a part of a first 2D skeleton of the pair as a first line in a 3D space; projecting the same part of the second 2D skeleton of the pair as a second line in the 3D space; and determining a 3D position locating the part for the weak 3D skeleton, based on the first and second lines” in C3L18-25; “Figure 8 schematically illustrates a triangulation way to build a weak 3D skeleton from a matching pair of two matched 2D skeletons according to embodiment of the present invention” in C4L25-27; “the source cameras 12 are calibrated so that they output their source images of the scene at the same cadence and simultaneously. The intrinsic and extrinsic parameters of the cameras are supposed to be known or calculated by using well-known calibration procedures. In particular, these calibration procedures allow the 3D object to be reconstructed into a 3D skeleton at the real scale” in C5L19-25; “Each pair of matching 2D skeletons from different views (source images) of the same scene volume can then be processed using triangulation in order to build an intermediate 3D skeleton, the robustness of which is quite low or weak. An intermediate or “weak” 3D skeleton can thus be generated, in 3D space, from each pair of matched 2D skeletons” in C7L10-13.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Chen and HIRANO with the teaching of Floch so as to generate weak 3D skeletons by projecting 2D skeleton parts in 3D space to determine 3D positions of those parts (Floch, Abstract), and to account for the well-known camera calibration process.

Claim 14 is rejected based upon similar rationale as Claim 2.

Claim 17 is rejected based upon similar rationale as Claims 6 & 7.
Claim 18 is rejected based upon similar rationale as Claim 8.
Claim 19 is rejected based upon similar rationale as Claims 9 & 10.
Claim 20 recites limitations similar to those of claim 13 but in computer readable medium form. Therefore, the same rationale used for claim 13 is applied.

Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE whose telephone number is (571)270-1221.  The examiner can normally be reached on Monday-Friday, 8:30am-5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Weiming He/
Primary Examiner, Art Unit 2612