Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/18/21 is being considered by the examiner.

Response to Amendment
The amendment filed on 1/4/22 has been entered and made of record. Claims 1, 14 and 20 are amended. Claims 1-20 are pending.

Response to Arguments
Applicant’s arguments with respect to claims 1, 14 and 20 have been considered but they are moot because the arguments do not apply to the references being used in the current rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 14-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 2017/0109930) in view of Weider et al. (US 7,949,529) and Doshi et al. (US 2019/0318759).
As to Claim 1, Holzer teaches a method comprising:
determining, via a processor, a tag characterizing a designated portion of a multi-view interactive digital media representation (MVIDMR), the MVIDMR including a plurality of images of an object, the plurality of images being navigable in one or more dimensions, the tag being determined by applying a grammar to natural language data, wherein the grammar is first identified among a plurality of different grammars based on a type of the object and a context for the tag (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations” in [0026]; “a surround view is a multi-view interactive digital media representation” in [0027]; “a scene which is captured as a multi-view image data set by a device that has an inertial measurement unit (IMU)… a multi-view image data set shows a scene from different angles… An IMU provides information about the orientation of a device while capturing the images” in [0028]; “to implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image… the three-dimensional tag "moves" with the multi-view image, such that as objects or scenery within the multi-view image rotate or otherwise move, the three-dimensional tag also moves as if it were physically present along with the objects or scenery” in [0021]. Weider further discloses “… dictation grammar or a large vocabulary grammar, among other resources, to transcribe the verbal utterance into a text message” in C19L41-44; “The decoded speech may be processed by the speech recognition engine 120 using the context description grammar module 112” in C26L7-9);
determining, via the processor, an object model location for the tag based on applying the grammar to the natural language data, the object model location identifying a location within a three-dimensional model of the object (Holzer discloses “receiving a selection of an anchor location in a reference image for a synthetic object to be placed within a multi-view image” in [0005]; “to implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image… the three-dimensional tag "moves" with the multi-view image, such that as objects or scenery within the multi-view image rotate or otherwise move, the three-dimensional tag also moves as if it were physically present along with the objects or scenery” in [0021]; “A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]; see also [0033]. Weider further discloses “wherein the speech recognition engine uses a dynamic grammar to transcribe the natural language utterance into the textual annotation” in claim 16, see also C19L41-44 & C26L7-9); and
storing an updated MVIDMR that includes the tag, the tag being located at a respective position in two or more of the plurality of images, the respective positions being determined based on the object model location (Holzer discloses a memory 503 to store data and other metadata in [0056]; “Current methods of adding three-dimensional (3D) information to video and image data generally involves creating a 3D reconstruction of the scene” in [0020]; “to implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image… the three-dimensional tag "moves" with the multi-view image, such that as objects or scenery within the multi-view image rotate or otherwise move, the three-dimensional tag also moves as if it were physically present along with the objects or scenery” in [0021]; “fix the 3D tag to a reference location in the 3D space” in [0023]; dynamic overlay in [0024]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer with the teaching of Weider so as to allow the system to receive multimodal input such as text-based commands and/or voice-based commands via a speech interaction interface (Weider, C3L53-62).
Holzer and Weider do not explicitly teach identifying a grammar based on a type of the object. The combination of Doshi further teaches the following limitation:
wherein the grammar is first identified among a plurality of different grammars based on a type of the object and a context for the tag (Weider discloses a context description grammar 112 in Fig 5. Doshi further discloses “the semantic networks may also be referred to as a grammar file… the NLU module 140 may select a based on context information of audio signal… Additionally, context information may further indicate which types of specific shopping the user wants to engage in (e.g., ordering a pizza, purchasing flight tickets, or purchasing concert tickets)” in [0042]; “the semantic information may indicate a particular semantic network selected based on the product purchasing context such as pizza order, flight reservation, or movie ticket reservation. The semantic information may also include tagging status associated with at least one net or slot in the selected network” in [0050].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer and Weider with the teaching of Doshi so that context information may indicate that a user is about to place an order for online shopping. Additionally, context information may further indicate which types of specific shopping the user wants to engage in (Doshi, [0042]).
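For illustration only, the limitation discussed above (identifying one grammar among a plurality of grammars based on an object type and a tag context, then applying it to natural language data) can be sketched as follows. This is a minimal sketch and is not taken from Holzer, Weider, Doshi, or the application: the registry `GRAMMARS`, the class `Grammar`, the functions `select_grammar` and `derive_tag`, and the regular-expression rules are all hypothetical names and rules chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical grammar: one regex rule capturing a condition and a component."""
    name: str
    pattern: re.Pattern


# Hypothetical registry of grammars keyed by (object type, tag context).
GRAMMARS = {
    ("vehicle", "damage"): Grammar(
        "vehicle_damage",
        re.compile(r"(?P<condition>dent|scratch|crack) on the (?P<component>[a-z ]+)"),
    ),
    ("vehicle", "feature"): Grammar(
        "vehicle_feature",
        re.compile(r"(?P<condition>new|custom) (?P<component>[a-z ]+)"),
    ),
}


def select_grammar(object_type: str, context: str) -> Grammar:
    """Identify one grammar among several based on object type and tag context."""
    return GRAMMARS[(object_type, context)]


def derive_tag(text: str, object_type: str, context: str):
    """Apply the selected grammar to the text; return a tag and a named location."""
    grammar = select_grammar(object_type, context)
    match = grammar.pattern.search(text.lower())
    if match is None:
        return None  # the selected grammar does not cover this utterance
    return {
        "grammar": grammar.name,
        "tag": match.group("condition"),
        # The component name stands in for a location within an object model.
        "location": match.group("component").strip(),
    }


if __name__ == "__main__":
    print(derive_tag("There is a dent on the front left door", "vehicle", "damage"))
```

In this sketch the same utterance would be parsed differently, or not at all, depending on which grammar the (object type, context) pair selects, which is the behavior the limitation describes.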

As to Claim 2, Holzer in view of Weider and Doshi teaches the method recited in claim 1, wherein the natural language data includes audio data, and wherein the method further comprises applying speech recognition to the audio data (Weider discloses receiving multimodal input such as voice-based commands in C3L60-62; “capturing the user's question or query through speech recognition operating in a variety of real-world environments” in C6L5-7; “The user may address spoken commands to a mobile device or desktop unit” in C8L60-61).


As to Claim 3, Holzer in view of Weider and Doshi teaches the method recited in claim 1, wherein applying the grammar to the natural language data comprises parsing the natural language data to identify a plurality of words (Weider discloses “In order for devices to properly respond to requests and/or commands that are submitted in a natural language form, machine processable requests and/or algorithms may be formulated after the natural form questions or commands have been parsed and interpreted” in C22L30-34; “parsing routines that are specialized to recognize particular parts of speech, such as times, locations, movie titles, and other parts of speech” in C27L51-53; see also parser 118 in Fig 5).

As to Claim 4, Holzer in view of Weider and Doshi teaches the method recited in claim 3, wherein applying the grammar to the natural language data further comprises identifying a respective semantic category for two or more of the plurality of words (Weider discloses “Conversational speech analyzer 804 also may include a semantic knowledge-based model that analyzes the textual message and detects command components” in C23L51-54; “The criteria handlers 152 may identify matching phrases and extract semantic attributes from the phrases” in C27L53-55).

As to Claim 5, Holzer in view of Weider and Doshi teaches the method recited in claim 4, wherein applying the grammar to the natural language data further comprises determining one or more phrases based on the semantic categories (Weider discloses “Conversational speech analyzer 804 also may include a semantic knowledge-based model that analyzes the textual message and detects command components” in C23L51-54).

Claim 14 recites similar limitations as claim 1 but in a computing device form. Therefore, the same rationale used for claim 1 is applied.
Claim 15 is rejected based upon similar rationale as Claim 2.
Claim 16 is rejected based upon similar rationale as Claims 3-5.

Claim 20 recites similar limitations as claim 1 but in a computer readable media form. Therefore, the same rationale used for claim 1 is applied.

Claims 6-13 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Holzer in view of Weider and Doshi, further in view of Chen et al. (US 10,657,647).
As to Claim 6, Holzer in view of Weider and Doshi teaches the method recited in claim 1. The combination of Chen further teaches:
determining the object model by applying a neural network to estimate one or more two-dimensional skeleton joints for a respective one of the plurality of images (Chen discloses a base object model represented by skeleton joints in Fig 6; “FIG. 13 depicts an example application of a convolutional neural network to a body panel of an automobile using a scanning technique, to detect damage at each pixel or area segment of the body panel” in C3L50-53; “the block 312 may apply one or more convolutional neural networks 134 to the pixels of each body panel of the segmented target vehicle image to determine potential damage, or likelihood of damage to these body panels, as defined by the trained convolutional neural networks (CNNs) 134” in C26L6-11;

[Embedded image: media_image1.png]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Weider and Doshi with the teaching of Chen so as to apply CNN to the pixels of each body panel of the segmented target image to determine potential damage, or likelihood of damage to the body panels (Chen, C26L6-11).

As to Claim 7, Holzer in view of Weider, Doshi and Chen teaches the method recited in claim 6, wherein determining the object model includes estimating pose information for a designated one of the plurality of images, the pose information including a location and angle of the camera with respect to the designated object for the designated image (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations. A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]; “a scene which is captured as a multi-view image data set by a device that has an inertial measurement unit (IMU)… a multi-view image data set shows a scene from different angles… An IMU provides information about the orientation of a device while capturing the images” in [0028]; a triangulated 3D multi-view representation in [0043].) 

As to Claim 8, Holzer in view of Weider, Doshi and Chen teaches the method recited in claim 7, wherein determining the object model includes determining the three-dimensional skeleton of the designated object based on the two-dimensional skeleton joints and the pose information (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations. A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]. Chen further discloses 2D/3D object model of an automobile having surface contours or surface segments in Fig 5-6; “different views of the target object from different angles” in C6L52-53.)

As to Claim 9, Holzer in view of Weider, Doshi and Chen teaches the method recited in claim 8, the method further comprising:
constructing the MVIDMR of the object from the object model by positioning each image with respect to the object model, the object model providing a correspondence between locations in the plurality of images (Holzer discloses “a surround view is constructed from multiple images that are captured from different locations. A computer processor is used to create a three-dimensional model that includes the content and context of the surround view” in [0026]; “an anchor location in the image data is tracked in order to fix the 3D tag to a reference location in the 3D space” in [0023].)

As to Claim 10, Holzer in view of Weider and Doshi teaches the method recited in claim 1. The combination of Chen further teaches wherein the object is a vehicle, and wherein each of the images depicts the vehicle from a respective viewpoint (Holzer, [0026]-[0028]. Chen further discloses creating a 3D model of the target object from the set of target images in C3L33-39; obtaining image data from different angles, positions, view-points, distances, etc. to determine 3D triangle mesh models in C16L56-65, see also Fig 5-6.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Weider and Doshi with the teaching of Chen so as to create a 3D model of a vehicle from a set of images captured from different locations and directions.

As to Claim 11, Holzer in view of Weider and Doshi teaches the method recited in claim 1. The combination of Chen further teaches wherein the tag identifies damage to the object (Holzer discloses “implementing augmented reality by adding a three-dimensional (3D) tag (also referred to herein as a synthetic object) such as an image, text, object, graphic, or the like to a multi-view image…the three-dimensional tag "moves" with the multi-view image, such that as objects or scenery within the multi-view image rotate or otherwise move, the three-dimensional tag also moves as if it were physically present along with the objects or scenery” in [0021]. Chen further discloses “An image processing system that can be used to detect changes in objects, such as to detect damage to automobiles, buildings, and the like” in C2L7-9; see also Fig 7 & 9.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Weider and Doshi with the teaching of Chen so as to detect changes in objects, such as damage to automobiles, by comparing a base object model to one or more target images of the object in the changed condition (Chen, Abstract).

As to Claim 12, Holzer in view of Weider and Doshi teaches the method recited in claim 1. The combination of Chen further teaches identifying damage to the object via the processor by applying a neural network to the plurality of images (Chen discloses “FIG. 13 depicts an example application of a convolutional neural network to a body panel of an automobile using a scanning technique, to detect damage at each pixel or area segment of the body panel” in C3L50-53; “the block 312 may apply one or more convolutional neural networks 134 to the pixels of each body panel of the segmented target vehicle image to determine potential damage, or likelihood of damage to these body panels, as defined by the trained convolutional neural networks (CNNs) 134” in C26L6-11).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Holzer, Weider and Doshi with the teaching of Chen so as to apply CNN to the pixels of each body panel of the segmented target image to determine potential damage, or likelihood of damage to the body panels (Chen, C26L6-11).

As to Claim 13, Holzer in view of Weider, Doshi and Chen teaches the method recited in claim 12. The combination of Chen further teaches updating the MVIDMR to include a representation of the damage, the representation comprising a heatmap layer overlain on the plurality of images (Chen discloses “the change determination may be displayed to a user in the form of a heat map that illustrates areas of the target object that have undergone change, the amount of such change, the type of such change, etc.” in C2L66-C3L3; see also Fig 14-15.)

Claim 17 is rejected based upon similar rationale as Claims 6-9.
Claim 18 is rejected based upon similar rationale as Claims 10-11.
Claim 19 is rejected based upon similar rationale as Claims 12-13.

Conclusion
THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE, whose telephone number is (571)270-1221. The examiner can normally be reached Monday-Friday, 8:30am-5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 

/Weiming He/
Primary Examiner, Art Unit 2612