DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 19-20, and 27-31 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor (U.S. Pat. App. Pub. No. 2019/0332400, hereinafter Spoor) in view of Acharya (U.S. Pat. App. Pub. No. 2014/0310595, hereinafter Acharya).

Regarding claim 1, Spoor discloses A method implemented by a processor of a computing device, comprising (AR system and method; Spoor, ¶¶ [0040]): displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (The method includes “displaying, within the scene, at least one computer-implemented virtual assistant” and “The method also includes associating a navigation object (e.g., a target area) {the at least one real-world object} to the virtual assistant {virtual agent}”; also see FIGS. 2 and 13, which show a virtual agent displayed alongside at least one real world object; Spoor, ¶¶ [0043]; FIGS. 2 and 13); receiving user input (“in order to receive a user’s query or command, the assistant first ‘listens’ meaning that the client-side script receives and records an audible input from the user {receiving user input}”; Spoor, ¶¶ [0040]); deriving a simplified user intent from the user input (“the server-side script relays the user’s spoken text to a conversational engine or external service 222... which interprets the user’s spoken text” where the conversational engine/external service “handles the natural language understanding used to determine a user’s intent {deriving a simplified user intent} from spoken word {from the user input}”; Spoor, ¶¶ [0040], [0026]); and in response to the user input, causing a reaction of the virtual agent within the AR scene (the system “interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string” which “is then relayed back to the client-side script, [and]... the client-side script receives the speech and is able to output it to the user in form of an audio response... [as well as] parse and process the recognized user text to determine the extent to which such speech included commands such as response, selections and the like,” where the speech and the commands are the reaction of the virtual agent and “displaying, within a scene {within the AR scene}, at least one computer-implemented virtual assistant responsive to voice (audio) commands from a user viewing the scene”; Spoor, ¶¶ [0040], [0043]), the reaction being dependent on the simplified user intent (The “audio response” and the “response… to voice (audio) commands” {the reaction} are dependent on the “user’s [interpreted] spoken text {user intent}”; Spoor, ¶¶ [0040], [0043]). However, Spoor fails to expressly recite receiving an image from a camera, [and] recognizing at least one real-world object in the image.
Acharya teaches systems and methods for “an augmented reality virtual assistant application.” (Acharya, ¶ [0024]). Regarding claim 1, Acharya teaches receiving an image from a camera (“real-time video of a user performing a task is captured {receiving an image} through a visual sensor such as a camera {from a camera}” where “At block 1510, the system 1212 analyzes video depicting a real world scene (illustratively, a scene of a multi-step activity).”; Acharya, ¶¶ [0024], [0124]); recognizing at least one real-world object in the image (“At block 1512, the system 1212 detects {recognizes} one or more physical objects {at least one real-world object} that are present in the real world scene 1200 {in the image} as captured on the video (e.g., in accordance with the field of view of the video camera).”; Acharya, ¶¶ [0124]); displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (“At block 1530, the system 1212 displays the virtual element {a virtual agent} selected at block 1528 on the real world view {at least one real world object} of the detected object(s)” where the system includes “an interactive display screen 1614 on which an augmented view {augmented reality scene}... is displayed {displaying on a screen}”; Acharya, ¶¶ [0126], [0129]); receiving user input (“At block 1522, the system 1212 interprets user input {thus, receiving user input} relating to the multi-step activity. The user input may include, for example, NL dialog, gestures, or other human-computer interactions, or a combination of different human interactions.”; Acharya, ¶¶ [0125]); deriving a simplified user intent from the user input (“The user input is interpreted {…from the user input} by, at block 1524, determining an intent of the person 1204 {deriving a simplified user intent} with respect to the real world scene 1200 and/or a current state of the real world scene 1200.”; Acharya, ¶¶ [0125]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include receiving an image from a camera, [and] recognizing at least one real-world object in the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 2, the rejection of claim 1 is incorporated. Spoor further discloses further comprising consulting an object database to obtain properties of the at least one real-world object (The system described here is “able to determine which 3D object in a scene is or provides the context for a given message exchange [such as] if a user looks at an object like a helicopter and says ‘what is this’… the assistant will know the user’s question is in reference to the helicopter as the helicopter is the context for the message.” where “memory 1914 may include cache memory, including a database that stores, among other information, data representing scene objects and elements and relationships there between, as well as, information or links relating to the virtual assistant(s).”; Spoor, ¶¶ [0128], [0126]), wherein the reaction is further dependent on the properties of the at least one real-world object (The system described here is “able to determine which 3D object in a scene is or provides the context for a given message exchange [such as] if a user looks at an object like a helicopter and says ‘what is this’”; the assistant then provides information about the real-world object based on properties of the real-world object (the system indicates the type of object [e.g., helicopter] and properties of the object [e.g., this is the tail rotor of the helicopter]); Spoor, ¶¶ [0128]).

Regarding claim 3, the rejection of claim 2 is incorporated. Spoor further discloses wherein deriving a simplified user intent from the user input comprises converting the user input into a user phrase (The system includes “natural language understanding used to determine a user’s intent {user phrase} from spoken word {user input}, and can send back messages in a desired format for the Hootsy system to respond to in the scene,”; Spoor, ¶¶ [0026]) and converting the user phrase into the simplified user intent (The system “determines a user intent {converting the user phrase}” and “send[s] back messages in a desired format {user intent converted into the simplified user intent} for the Hootsy system to respond to in the scene,”; Spoor, ¶¶ [0026]).

Regarding claim 4, the rejection of claim 3 is incorporated. Spoor further discloses wherein the user input includes a received utterance and (“The recorded audio {user input} is processed by a speech-to-text function 216,” thus the recorded audio includes a received utterance from the user.; Spoor, ¶¶ [0040]) wherein converting the user input into a user phrase includes applying speech-to-text processing to the received utterance to produce the user phrase comprising a set of words (“the server-side script relays the user’s spoken text {user phrase} to a conversational engine or external service 222 (e.g., Dialogflow.com) which interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string. {simplified user intent}”; Spoor, ¶¶ [0040]).

Regarding claim 19, the rejection of claim 1 is incorporated. Spoor discloses all of the elements of the current invention as stated above. However, Spoor fails to expressly disclose wherein recognizing the at least one real-world object in the image comprises applying machine vision processing to the image.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 19, Acharya teaches wherein recognizing the at least one real-world object in the image comprises applying machine vision processing to the image (“The mapping 1414 establishes semantic relationships between semantically equivalent terminologies so that elements of the real world can be associated with the corresponding elements of the external representations in real time... [as well as] define relationships between the real world objects, the corresponding external representations, the NL speech terminology that may be used to refer to either the real world object or the corresponding external representation, real world activities or steps of activities in which the objects are involved, and virtual elements... using, e.g., one or more machine learning techniques.”; Acharya, ¶¶ [0123]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include wherein recognizing the at least one real-world object in the image comprises applying machine vision processing to the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 20, the rejection of claim 1 is incorporated. Spoor further discloses further comprising maintaining a 3D virtual world comprising the virtual agent and a virtual model of the real-world object (As depicted in FIGS. 23A-24B, the system maintains “a virtual assistant {virtual agent} … related to the content {thus, comprising the virtual agent} of the scene {3D virtual world}” alongside a helicopter {a virtual model of the real-world object}; Spoor, ¶¶ [0128], FIGS. 23A-24B).

Regarding claim 27, the rejection of claim 1 is incorporated. Spoor discloses all of the elements of the current invention as stated above. However, Spoor fails to expressly disclose wherein the reaction includes a component that depends on at least one previous instance of the simplified user intent.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 27, Acharya teaches wherein the reaction includes a component that depends on at least one previous instance of the simplified user intent (“The TMUM 205 is responsible for recognizing/interpreting user goals in a given state or context. The scene module 202 and language module 204 described above provide partial information about what the user is trying to do at a given time but in some cases the individual components may not have access to all the information needed to determine user goals. The TMUM 205 merges pieces of information coming from different components, such as scene understanding and language understanding in this Acharya, ¶¶ [0049]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include wherein the reaction includes a component that depends on at least one previous instance of the simplified user intent. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 28, Spoor discloses A non-transitory computer-readable storage medium comprising computer-readable instructions (the system includes a “storage device (e.g., a disk drive, hard disk, solid state memory, optical disk drive, etc.)… including a database that stores, among other information, data representing scene objects and elements and relationships there between, as well as, information or links relating to the virtual assistant(s)”; Spoor, ¶¶ [0126]) which, when read and executed by at least one processor of a gaming device, cause the gaming device to carry out a method in a video game that comprises (“the application programming interface (API) facilitates the creation of virtual assistants for VR and AR that can be added to an app (application), site or game {video game}” which may be “computer-driven scenarios {…cause the gaming device to carry out a method in a video game}… for a user to interact with the computer-based system {gaming device}”; Spoor, ¶¶ [0007]): displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (The method includes “displaying, within the scene, at least one computer-implemented virtual assistant” and “The method also includes associating a navigation object (e.g., a target area) {the at least one real-world object} to the virtual assistant {virtual agent}”; also see FIGS. 2 and 13, which show a virtual agent displayed alongside at least one real world object; Spoor, ¶¶ [0043]; FIGS. 2 and 13); receiving user input (“in order to receive a user’s query or command, the assistant first ‘listens’ meaning that the client-side script receives and records an audible input from the user {receiving user input}”; Spoor, ¶¶ [0040]); deriving a simplified user intent from the user input (“the server-side script relays the user’s spoken text to a conversational engine or external service 222... which interprets the user’s spoken text” where the conversational engine/external service “handles the natural language understanding used to determine a user’s intent {deriving a simplified user intent} from spoken word {from the user input}”; Spoor, ¶¶ [0040], [0026]); and in response to the user input, animating the virtual agent within the AR scene (the system “interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string” which “is then relayed back to the client-side script, [and]... the client-side script receives the speech and is able to output it to the user in form of an audio response... [as well as] parse and process the recognized user text to determine the extent to which such speech included commands such as response, selections and the like,” where the speech and the commands are the reaction of the virtual agent and “displaying, within a scene {within the AR scene}, at least one computer-implemented virtual assistant responsive to voice (audio) commands from a user viewing the scene” including “the selected animation type to accompany verbal/speech output (e.g., mouth move, hands moving, head nodding, etc.) {animation}”; Spoor, ¶¶ [0040], [0043], [0031]), the reaction being dependent on the simplified user intent (The “audio response” and the “response… to voice (audio) commands” {the reaction} are dependent on the “user’s [interpreted] spoken text {user intent}”; Spoor, ¶¶ [0040], [0043]). However, Spoor fails to expressly recite receiving an image from a camera, [and] recognizing at least one real-world object in the image.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 28, Acharya teaches receiving an image from a camera (“real-time video of a user performing a task is captured {receiving an image} through a visual sensor such as a camera {from a camera}” where “At block 1510, the system 1212 analyzes video depicting a real world scene (illustratively, a scene of a multi-step activity).”; Acharya, ¶¶ [0024], [0124]); recognizing at least one real-world object in the image (“At block 1512, the system 1212 detects {recognizes} one or more physical objects {at least one real-world object} that are present in the real world scene 1200 {in the image} as captured on the video (e.g., in accordance with the field of view of the video camera).”; Acharya, ¶¶ [0124]); displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (“At block 1530, the system 1212 displays the virtual element {a virtual agent} selected at block 1528 on the real world view {at least one real world object} of the detected object(s)” where the system includes “an interactive display screen 1614 on which an augmented view {augmented reality scene}... is displayed {displaying on a screen}”; Acharya, ¶¶ [0126], [0129]); receiving user input (“At block 1522, the system 1212 interprets user input {thus, receiving user input} relating to the multi-step activity. The user input may include, for example, NL dialog, gestures, or other human-computer interactions, or a combination of different human interactions.”; Acharya, ¶¶ [0125]); deriving a simplified user intent from the user input (“The user input is interpreted {…from the user input} by, at block 1524, determining an intent of the person 1204 {deriving a simplified user intent} with respect to the real world scene 1200 and/or a current state of the real world scene 1200.”; Acharya, ¶¶ [0125]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include receiving an image from a camera, [and] recognizing at least one real-world object in the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 29, Spoor discloses A non-transitory computer-readable storage medium comprising computer-readable instructions (the system includes a “storage device (e.g., a disk drive, hard disk, solid state memory, optical disk drive, etc.)… including a database that stores, among other information, data representing scene objects and elements and relationships there between, as well as, information or links relating to the virtual assistant(s)”; Spoor, ¶¶ [0126]) which, when read and executed by at least one processor of a gaming device, cause the gaming device to carry out a method in a video game that comprises (“the application programming interface (API) facilitates the creation of virtual assistants for VR and AR that can be added to an app (application), site or game {video game}” which may be “computer-driven scenarios {…cause the gaming device to carry out a method in a video game}… for a user to interact with the computer-based system {gaming device}”; Spoor, ¶¶ [0007]): displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (The method includes “displaying, within the scene, at least one computer-implemented virtual assistant” and “The method also includes associating a navigation object (e.g., a target area) {the at least one real-world object} to the virtual assistant {virtual agent}”; also see FIGS. 2 and 13, which show a virtual agent displayed alongside at least one real world object; Spoor, ¶¶ [0043]; FIGS. 2 and 13); receiving user input (“in order to receive a user’s query or command, the assistant first ‘listens’ meaning that the client-side script receives and records an audible input from the user {receiving user input}”; Spoor, ¶¶ [0040]); deriving a simplified user intent from the user input (“the server-side script relays the user’s spoken text to a conversational engine or external service 222... which interprets the user’s spoken text” where the conversational engine/external service “handles the natural language understanding used to determine a user’s intent {deriving a simplified user intent} from spoken word {from the user input}”; Spoor, ¶¶ [0040], [0026]); and in response to the user input, causing a reaction of the virtual agent within the AR scene (the system “interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string” which “is then relayed back to the client-side script, [and]... the client-side script receives the speech and is able to output it to the user in form of an audio response... [as well as] parse and process the recognized user text to determine the extent to which such speech included commands such as response, selections and the like,” where the speech and the commands are the reaction of the virtual agent and “displaying, within a scene {within the AR scene}, at least one computer-implemented virtual assistant responsive to voice (audio) commands from a user viewing the scene” including “the selected animation type to accompany verbal/speech output (e.g., mouth move, hands moving, head nodding, etc.) {animation}”; Spoor, ¶¶ [0040], [0043], [0031]), the reaction being dependent on the simplified user intent (The “audio response” and the “response… to voice (audio) commands” {the reaction} are dependent on the “user’s [interpreted] spoken text {user intent}”; Spoor, ¶¶ [0040], [0043]). However, Spoor fails to expressly recite receiving an image from a camera, [and] recognizing at least one real-world object in the image.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 29, Acharya teaches receiving an image from a camera (“real-time video of a user performing a task is captured {receiving an image} through a visual sensor such as a camera {from a camera}” where “At block 1510, the system 1212 analyzes video depicting a real world scene (illustratively, a scene of a multi-step activity).”; Acharya, ¶¶ [0024], [0124]); recognizing at least one real-world object in the image (“At block 1512, the system 1212 detects {recognizes} one or more physical objects {at least one real-world object} that are present in the real world scene 1200 {in the image} as captured on the video (e.g., in accordance with the field of view of the video camera).”; Acharya, ¶¶ [0124]); displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (“At block 1530, the system 1212 displays the virtual element {a virtual agent} selected at block 1528 on the real world view {at least one real world object} of the detected object(s)” where the system includes “an interactive display screen 1614 on which an augmented view {augmented reality scene}... is displayed {displaying on a screen}”; Acharya, ¶¶ [0126], [0129]); receiving user input (“At block 1522, the system 1212 interprets user input {thus, receiving user input} relating to the multi-step activity. The user input may include, for example, NL dialog, gestures, or other human-computer interactions, or a combination of different human interactions.”; Acharya, ¶¶ [0125]); deriving a simplified user intent from the user input (“The user input is interpreted {…from the user input} by, at block 1524, determining an intent of the person 1204 {deriving a simplified user intent} with respect to the real world scene 1200 and/or a current state of the real world scene 1200.”; Acharya, ¶¶ [0125]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include receiving an image from a camera, [and] recognizing at least one real-world object in the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 30, Spoor discloses A gaming device (“computer-driven scenarios… for a user to interact with the computer-based system {gaming device}”; Spoor, ¶¶ [0007]) comprising at least one processor (“The virtual assistant … including a processor and a memory”; Spoor, ¶¶ [0043]) and a memory storing instructions for execution by the processor (the system includes a “storage device (e.g., a disk drive, hard disk, solid state memory, optical disk drive, etc.)… including a database that stores, among other information, data representing scene objects and elements and relationships there between, as well as, information or links relating to the virtual assistant(s)”; Spoor, ¶¶ [0126]), at least one input device configured to receive input from a user, at least one output device configured for providing output to the user (“various input/output (I/O) devices 1916, such as a display {at least one output device}, a keyboard, a mouse, a sensor, a stylus, a microphone or transducer” {at least one input device}; Spoor, ¶¶ [0043]), the at least one processor configured to execute the instructions in the memory for implementing an interactive computer program that generates the output in response to the received input (“The virtual assistant is implemented by a VR or AR display system including a processor and a memory, with computer code instructions (e.g., VR or AR app) and is configured to implement the virtual assistant and respond to user requests”; Spoor, ¶¶ [0126]) and, the interactive computer program including at least one process that comprises (“the application programming interface (API) facilitates the creation of virtual assistants for VR and AR that can be added to an app (application), site or game {video game}”; Spoor, ¶¶ [0007]): displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (The method includes “displaying, within the scene, at least one computer-implemented virtual assistant” and “The method also includes associating a navigation object (e.g., a target area) {the at least one real-world object} to the virtual assistant {virtual agent}”; also see FIGS. 2 and 13, which show a virtual agent displayed alongside at least one real world object; Spoor, ¶¶ [0043]; FIGS. 2 and 13); receiving user input (“in order to receive a user’s query or command, the assistant first ‘listens’ meaning that the client-side script receives and records an audible input from the user {receiving user input}”; Spoor, ¶¶ [0040]); deriving a simplified user intent from the user input (“the server-side script relays the user’s spoken text to a conversational engine or external service 222... which interprets the user’s spoken text” where the conversational engine/external service “handles the natural language understanding used to determine a user’s intent {deriving a simplified user intent} from spoken word {from the user input}”; Spoor, ¶¶ [0040], [0026]); and in response to the user input, animating the virtual agent within the AR scene (the system “interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string” which “is then relayed back to the client-side script, [and]... the client-side script receives the speech and is able to output it to the user in form of an audio response... [as well as] parse and process the recognized user text to determine the extent to which such speech included commands such as response, selections and the like,” where the speech and the commands are the reaction of the virtual agent and “displaying, within a scene {within the AR scene}, at least one computer-implemented virtual assistant responsive to voice (audio) commands from a user viewing the scene” including “the selected animation type to accompany verbal/speech output (e.g., mouth move, hands moving, head nodding, etc.) {animation}”; Spoor, ¶¶ [0040], [0043], [0031]), the reaction being dependent on the simplified user intent (The “audio response” and the “response… to voice (audio) commands” {the reaction} are dependent on the “user’s [interpreted] spoken text {user intent}”; Spoor, ¶¶ [0040], [0043]). However, Spoor fails to expressly recite receiving an image from a camera, [and] recognizing at least one real-world object in the image.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 30, Acharya teaches receiving an image from a camera (“real-time video of a user performing a task is captured {receiving an image} through a visual sensor such as a camera {from a camera}” where “At block 1510, the system 1212 analyzes video depicting a real world scene (illustratively, a scene of a multi-step activity).”; Acharya, ¶¶ [0024], [0124]); recognizing at least one real-world object in the image (“At block 1512, the system 1212 detects {recognizes} one or more physical objects {at least one real-world object} that are present in the real world scene 1200 {in the image} as captured on the video (e.g., in accordance with the field of view of the video camera).”; Acharya, ¶¶ [0124]); displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (“At block 1530, the system 1212 displays the virtual element {a virtual agent} selected at block 1528 on the real world view {at least one real world object} of the detected object(s)” where the system includes “an interactive display screen 1614 on which an augmented view {augmented reality scene}... is displayed {displaying on a screen}”; Acharya, ¶¶ [0126], [0129]); receiving user input (“At block 1522, the system 1212 interprets user input {thus, receiving user input} relating to the multi-step activity. The user input may include, for example, NL dialog, gestures, or other human-computer interactions, or a combination of different human interactions.”; Acharya, ¶¶ [0125]); deriving a simplified user intent from the user input (“The user input is interpreted {…from the user input} by, at block 1524, determining an intent of the person 1204 {deriving a simplified user intent} with respect to the real world scene 1200 and/or a current state of the real world scene 1200.”; Acharya, ¶¶ [0125]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include receiving an image from a camera, [and] recognizing at least one real-world object in the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).

Regarding claim 31, Spoor discloses A gaming device (“computer-driven scenarios… for a user to interact with the computer-based system {gaming device}”; Spoor, ¶¶ [0007]) comprising at least one processor (“The virtual assistant … including a processor and a memory”; Spoor, ¶¶ [0043]) and a memory storing instructions for execution by the processor (the system includes a “storage device (e.g., a disk drive, hard disk, solid state memory, optical disk drive, etc.)… including a database that stores, among other information, data representing scene objects and elements and relationships there between, as well as, information or links relating to the virtual assistant(s)”; Spoor, ¶¶ [0126]), at least one input device configured to receive input from a user, at least one output device configured for providing output to the user (“various input/output (I/O) devices 1916, such as a display {at least one output device}, a keyboard, a mouse, a sensor, a stylus, a microphone or transducer” {at least one input device}; Spoor, ¶¶ [0043]), the at least one processor configured to execute the instructions in the memory for implementing an interactive computer program that generates the output in response to the received input (“The virtual assistant is implemented by a VR or AR display system including a processor and a memory, with computer code instructions (e.g., VR or AR app) and is configured to implement the virtual assistant and respond to user requests”; Spoor, ¶¶ [0126]) and, the interactive computer program including at least one process that comprises (“the application programming interface (API) facilitates the creation of virtual assistants for VR and AR that can be added to an app (application), site or game {video game}”; Spoor, ¶¶ [0007]): displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (The method includes “displaying, within the scene, at least one computer-implemented virtual assistant” and “The method also includes associating a navigation object (e.g., a target area) {the at least one real-world object} to the virtual assistant {virtual agent}”; also see FIGS. 2 and 13, which show a virtual agent displayed alongside at least one real world object; Spoor, ¶¶ [0043]; FIGS. 2 and 13); receiving user input (“in order to receive a user’s query or command, the assistant first ‘listens’ meaning that the client-side script receives and records an audible input from the user {receiving user input}”; Spoor, ¶¶ [0040]); deriving a simplified user intent from the user input (“the server-side script relays the user’s spoken text to a conversational engine or external service 222... which interprets the user’s spoken text” where the conversational engine/external service “handles the natural language understanding used to determine a user’s intent {deriving a simplified user intent} from spoken word {from the user input}”; Spoor, ¶¶ [0040], [0026]); and in response to the user input, causing a reaction of the virtual agent within the AR scene (the system “interprets the user’s spoken text and returns or responds with the assistant’s programmed response as a text string” which “is then relayed back to the client-side script, [and]... the client-side script receives the speech and is able to output it to the user in form of an audio response... [as well as] parse and process the recognized user text to determine the extent to which such speech included commands such as response, selections and the like,” where the speech and the commands are the reaction of the virtual agent and “displaying, within a scene {within the AR scene}, at least one computer-implemented virtual assistant responsive to voice (audio) commands from a user viewing the scene” including “the selected animation type to accompany verbal/speech output (e.g., mouth move, hands moving, head nodding, etc.) {animation}”; Spoor, ¶¶ [0040], [0043], [0031]), the reaction being dependent on the simplified user intent (The “audio response” and the “response… to voice (audio) commands” {the reaction} are dependent on the “user’s [interpreted] spoken text {user intent}”; Spoor, ¶¶ [0040], [0043]). However, Spoor fails to expressly recite receiving an image from a camera, [and] recognizing at least one real-world object in the image.
The relevance of Acharya is described above with relation to claim 1. Regarding claim 31, Acharya teaches receiving an image from a camera (“real-time video of a user performing a task is captured {receiving an image} through a visual sensor such as a camera {from a camera}” where “At block 1510, the system 1212 analyzes video depicting a real world scene (illustratively, a scene of a multi-step activity).”; Acharya, ¶¶ [0024], [0124]); recognizing at least one real-world object in the image (“At block 1512, the system 1212 detects {recognizes} one or more physical objects {at least one real-world object} that are present in the real world scene 1200 {in the image} as captured on the video (e.g., in accordance with the field of view of the video camera). “; Acharya, ¶¶ [0124]); displaying on a screen an augmented reality (AR) scene including the at least one real-world object and a virtual agent (“At block 1530, the system 1212 displays the virtual element {a virtual agent} selected at block 1528 on the real world view {at least one real world object} of the detected object(s)” where the system includes “an interactive display screen 1614 on which an augmented view {augmented reality scene}... is displayed {displaying on a screen}”; Acharya, ¶¶ [0126], [0129]); receiving user input (“At block 1522, the system 1212 interprets user input {thus, receiving user input} relating to the multi-step activity. The user input may include, for example, NL dialog, gestures, or other human-computer interactions, or a combination of different human interactions.”; Acharya, ¶¶ [0125]); deriving a simplified user intent from the user input (“The user input is interpreted {…from the user input} by, at block 1524, determining an intent of the person 1204 {deriving a simplified user intent} with respect to the real world scene 1200 and/or a current state of the real world scene 1200.”; Acharya, ¶¶ [0125]).
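It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor to incorporate the teachings of Acharya to include receiving an image from a camera, [and] recognizing at least one real-world object in the image. The determination of objects in the environment allows the system to provide an AR scene with “realistic overlays that match precisely with a real-world scene,” as recognized by Acharya. (Acharya, ¶ [0070]).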

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Spoor and Acharya as applied to claim 3 above, and further in view of Miyahira (U.S. Pat. App. Pub. No. 2006/0167675, hereinafter Miyahira).

Regarding claim 5, the rejection of claim 3 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. Acharya further discloses wherein the user input comprises text (“illustrative augmented reality-capable virtual personal assistant computing system 1212 includes a number of devices 1214, 1216, 1218, 1220 that receive or generate multi-modal inputs, such as video 1222, audio 1224, location/orientation data 1226, and human computer interaction data (e.g., gestures, “taps,” mouse clicks, keypad input, etc.)” thus, the user input can comprise text; Acharya, ¶¶ [0094]). However, Spoor and Acharya fail to expressly recite wherein converting the user input into a user phrase includes carrying out at least one of spell checking, grammar checking and translation to produce the user phrase comprising a set of words.
Miyahira teaches systems and methods “for recognizing an emphasized word in a sentence to automatically translate the sentence.” (Miyahira, ¶ [0003]). Regarding claim 5, Miyahira teaches wherein converting the user input into a user phrase includes carrying out at least one of spell checking, grammar checking and translation to produce the user phrase comprising a set of words (The system can include translation, where “during the Miyahira, ¶¶ [0045], [0017]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Miyahira to include wherein converting the user input into a user phrase includes carrying out at least one of spell checking, grammar checking and translation to produce the user phrase comprising a set of words. The systems of Miyahira are “capable of properly translating sentences, even if they contain unregistered words such as emphasized words,” thus allowing for improved understanding of “informal expressions as in colloquial text.” (Miyahira, ¶¶ [0011]-[0012]).

Claims 6-9 and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor and Acharya as applied to claim 3 above, and further in view of Bui (U.S. Pat. App. Pub. No. 2020/0160042, hereinafter Bui).

Regarding claim 6, the rejection of claim 3 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. Spoor further discloses wherein converting the user phrase into the simplified user intent comprises determining at least one semantic element in the user phrase (Determining the user intent includes determining “key terms like ‘pizza hut’,” where key terms are semantic elements in the user phrase; Spoor, ¶¶ Spoor and Acharya fail to expressly recite wherein converting the user phrase into the simplified user intent comprises determining at least one semantic element in the user phrase and converting the at least one semantic element into the simplified user intent.
Bui teaches systems and methods “to select and edit objects within a digital image based on verbal and gesture input.” (Bui, ¶ [0023]). Regarding claim 6, Bui teaches wherein converting the user phrase into the simplified user intent comprises determining at least one semantic element in the user phrase (“the multimodal selection system can implement a natural language processing deep learning model {converting the user phrase into...} to determine user intent {the simplified user intent}” including determining “intent and semantic slots” from the “verbal input” {at least one semantic element}; Bui, ¶¶ [0023]), and converting the at least one semantic element into the simplified user intent (by “map[ping] user commands to intent and semantic slots {at least one semantic element in the user phrase}”; Bui, ¶¶ [0023]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Bui to include wherein converting the user phrase into the simplified user intent comprises determining at least one semantic element in the user phrase and converting the at least one semantic element into the simplified user intent. The systems and methods described in Bui “improve accuracy in interpreting poorly-defined, general verbal input.” (Bui, ¶ [0031]).

Regarding claim 7, the rejection of claim 6 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. However, Spoor and Acharya fail to expressly recite wherein the at least one semantic element includes at least one of an intent identifier and a semantic object tag.
The relevance of Bui is described above with relation to claim 6. Regarding claim 7, Bui teaches wherein the at least one semantic element includes at least one of an intent identifier and a semantic object tag (“The multimodal selection system can utilize a natural language processing neural network to determine a verbal command based on the verbal input, where the verbal command includes a verbal object class {semantic object tag} and a verbal intention {intent identifier}.”; Bui, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Bui to include wherein the at least one semantic element includes at least one of an intent identifier and a semantic object tag. The systems and methods described in Bui “improve accuracy in interpreting poorly-defined, general verbal input.” (Bui, ¶ [0031]).

Regarding claim 8, the rejection of claim 6 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. However, Spoor and Acharya fail to expressly recite wherein the at least one semantic element includes an intent identifier and a semantic object tag.
The relevance of Bui is described above with relation to claim 6. Regarding claim 8, Bui teaches wherein the at least one semantic element includes an intent identifier and a semantic object tag (“The multimodal selection system can utilize a natural language processing neural network to determine a verbal command based on the verbal input, where the verbal command includes a verbal object class {semantic object tag} and a verbal intention {intent identifier}.”; Bui, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Bui to include wherein the at least one semantic element includes an intent identifier and a semantic object tag. The systems and methods described in Bui “improve accuracy in interpreting poorly-defined, general verbal input.” (Bui, ¶ [0031]).

Regarding claim 9, the rejection of claim 8 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. However, Spoor and Acharya fail to expressly recite wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier; (ii) obtaining from an object database an object identifier associated with the semantic object tag; and (iii) concatenating the intent with the object identifier to create the simplified user intent.
The relevance of Bui is described above with relation to claim 6. Regarding claim 9, Bui teaches wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier (“The multimodal selection system can utilize a natural language processing neural network to determine a verbal command based on the verbal input, where the verbal command includes... a verbal intention {intent identifier}.” where “the multimodal selection system can utilize an LSTM and CRF to map {obtaining...} user commands [including the verbal intention] {associated with the intent identifier} to intent... {intent associated with...} slots of a table {from an intent database}.”; Bui, ¶¶ [0024], [0026]); (ii) obtaining from an object database an object identifier associated with the semantic object tag (“The multimodal selection system can utilize a natural language processing neural network to determine a verbal command based on the verbal input, where the verbal command includes a verbal object class {semantic object tag}” where “the multimodal selection system can utilize an LSTM and CRF to map {obtaining...} user commands [including the verbal object class] {associated with the semantic object tag} to... semantic {an object identifier associated with...} slots of a table {from an object database}.”; Bui,  and (iii) concatenating the intent with the object identifier to create the simplified user intent (“The multimodal selection system can then analyze slots in the table {concatenating the intent with the object identifier...} to generate a modified digital image and/or determine if clarification input is needed. {...to create a simplified user intent}”; Bui, ¶¶ [0024], [0026]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Bui to include wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier; (ii) obtaining from an object database an object identifier associated with the semantic object tag; and (iii) concatenating the intent with the object identifier to create the simplified user intent. The systems and methods described in Bui “improve accuracy in interpreting poorly-defined, general verbal input.” (Bui, ¶ [0031]).

Regarding claim 25, the rejection of claim 9 is incorporated. Spoor discloses wherein the reaction includes a component that depends on a distance, in a 3D virtual space, between the virtual agent and the object associated with the object identifier (“Another factor that may be employed with regard to operations 2020 and 2022 above, where the system monitors and detects an intention of the user to interact with the virtual assistant, is an interaction boundary” where the user is an object in the virtual space {e.g., user 2030, indicated in the virtual space}, and where an interaction boundary is a measure of the virtual space between the user and the virtual assistant {virtual agent}.; Spoor, ¶¶ [0044], FIG. 21).

Regarding claim 26, the rejection of claim 9 is incorporated. Spoor discloses wherein the reaction includes a component that depends on a distance, in a 3D virtual space, between the virtual agent and a property of the object associated with the object identifier (“Another factor that may be employed with regard to operations 2020 and 2022 above, where the system monitors and detects an intention of the user to interact with the virtual assistant, is an interaction boundary” where the user is an object in the virtual space {e.g., user 2030, indicated in the virtual space}, and where an interaction boundary is a measure of the virtual space between the user and the virtual assistant {virtual agent}. Further, the system includes that “while a circular interaction boundary 2114 is shown, it will be appreciated that alternative shapes 2116 and/or adjustable settings (e.g., radius)” may be used. In the case of alternate shapes, the coordinate position of the user with reference to the virtual assistant is used to determine whether the user is within the interaction boundary, where the coordinate position of the user is a property of the user as an object and associated with the object {“It will be appreciated that various coordinate systems, and actual (e.g., global positioning system (GPS)) or similar coordinates may be used, or a relative system may be employed (e.g., relative to the area 2110).”}.; Spoor, ¶¶ [0044], FIG. 21).

Claims 10-13 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor, Acharya, and Bui as applied to claims 6 and 8 above, and further in view of Anorga (U.S. Pat. App. Pub. No. 2020/0301959, hereinafter Anorga).

Regarding claim 10, the rejection of claim 6 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein determining at least one semantic element in the user phrase comprises consulting an intent database that stores intent identifiers in an attempt to recognize at least one of the intent identifiers as being present in the user phrase.
Anorga teaches “interface elements for directed display of content data items.” (Anorga, ¶ [0004]). Regarding claim 10, Anorga teaches wherein determining at least one semantic element in the user phrase comprises consulting an intent database that stores intent identifiers in an attempt to recognize at least one of the intent identifiers as being present in the user phrase (“the methods can search eligible content data items {consulting an intent database...in an attempt to recognize...} for [content] characteristics {that stores intent identifiers} that are related to {as being present} the commanded characteristic {...in the user phrase}, e.g., search for characteristics pre-associated with the commanded characteristic or which are determined to be within a type or category of the commanded characteristic {in an attempt to recognize at least one of the intent identifiers as being present}.”; Anorga, ¶¶ [0155]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include wherein determining at least one semantic element in the user phrase comprises consulting an intent database that stores intent identifiers in an attempt to recognize at least one of the intent identifiers as being present in the user phrase. “A technical effect of described techniques and features is a reduction in the consumption of system processing resources, such as display and search processing and power consumption, utilized by a system that does not provide one or more of the described techniques or features,” as recognized by Anorga. (Anorga, ¶ [0029]).

Regarding claim 11, the rejection of claim 10 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein determining at least one semantic element in the user phrase comprises consulting an object database that stores semantic object tags in an attempt to recognize at least one of the semantic object tags as being present in the user phrase.
Anorga is described above with relation to claim 10. Regarding claim 11, Anorga teaches wherein determining at least one semantic element in the user phrase comprises consulting an object database that stores semantic object tags in an attempt to recognize at least one of the semantic object tags as being present in the user phrase (as described with reference to an example, “the collection of content data items {object database} [can be] searched {consulting an...} for content data items {object identifier} having a matching text tag [to the search topic] {that stores semantic object tags} that describes the content [characteristics] {in an attempt to recognize at least one of the semantic object tags} of the content data item... {as being present in the user phrase}”; Anorga, ¶¶ [0065]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include wherein determining at least one semantic element in the user phrase comprises consulting an object database that stores semantic object tags in an attempt to recognize at least one of the semantic object tags as being present in the user phrase. “A technical effect of described techniques and features is a reduction in the consumption of system processing resources, such as display and search processing and power consumption, utilized by a system that does not provide one or more of the described techniques or features,” as recognized by Anorga. (Anorga, ¶ [0029]).

Regarding claim 12, the rejection of claim 11 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. Bui further discloses an intent identifier recognized as being present in the user phrase (“the verbal command {present in the user phrase} includes... a verbal intention {intent identifier}.”; Bui, ¶¶ [0024]); [and] a semantic object tag recognized as being present in the user phrase (“the verbal command Bui, ¶¶ [0024]). However, Spoor, Acharya, and Bui fail to expressly recite wherein the intent database stores an association between the intent identifiers and corresponding intents, wherein the object database stores an association between the semantic object tags and corresponding object identifiers, and wherein converting the at least one semantic element into the simplified user intent comprises (i) obtaining from the intent database the intent associated with an intent identifier recognized as being present in the user [input]; (ii) obtaining from the object database an object identifier associated with a semantic object tag recognized as being present in the user [input]; and (iii) concatenating the intent with the object identifier to create the simplified user intent.
The relevance of Anorga is described above with relation to claim 10. Regarding claim 12, Anorga teaches wherein the intent database stores an association between the intent identifiers and corresponding intents, (the system “identif[ies] search topics in the collection of content data items” and where the content characteristics “and the corresponding collection of content data items can be stored in accessible storage {intent database}” and “the characteristics... are converted into search topics” thus, the association of [content] characteristics {intent identifiers} and search topics {corresponding intents} are stored in the collection of content data items {intent database}; Anorga, ¶¶ [0070], [0050], [0092]) wherein the object database stores an association between the semantic object tags and corresponding object identifiers (as described with reference to an example, “the collection of content data items {object database} [can be] searched for {thus, obtaining} content data items {object identifier} having a matching text tag [to the search topic] {semantic object tag} that describes the content [characteristics] of the content data item...” where a text tag {semantic object tag} “matching” the content data items {object identifiers} is an association, and where searching a collection for said matching is storing said association.; Anorga, ¶¶ [0065]), and wherein converting the at least one semantic element into the simplified user intent comprises (i) obtaining from the intent database the intent associated with an intent identifier recognized as being present in the user [input] (The system can include content characteristics {intent identifier}, where the content characteristics “can be any of a variety of different characteristics... 
[such as] identifications of content features depicted in the searched content data items or otherwise represented in the searched content data items” and where the content characteristics “and the corresponding collection of content data items can be stored in accessible storage {intent database}... [and] can be received from a different device and/or user over a network connection. {obtaining from...}” and where “search topics...can be determined based on...detecting a voice in an audio segment”; Anorga, ¶¶ [0078], [0050]); (ii) obtaining from the object database an object identifier associated with a semantic object tag recognized as being present in the user [input] (as described with reference to an example, “the collection of content data items {object database} [can be] searched for {thus, obtaining} content data items {object identifier} having a matching text tag [to the search topic] {semantic object tag} that describes the content [characteristics] of the content data item...”; Anorga, ¶¶ [0065]); and (iii) concatenating the intent with the object identifier to create the simplified user intent (“the collection of content data items [can be] searched for content data items having a matching text tag [to the search topic] that describes the content [characteristics] of the content data item.” As such, the matching text tag {intent identifier and the intent} is linked {concatenating...} to the content characteristic {the object identifier} to create the search topic {simplified user intent}.; Anorga, ¶¶ [0065]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include wherein the intent database stores an association between the intent identifiers and corresponding intents, wherein the object database stores an association between the semantic object tags and corresponding object identifiers , and wherein converting the at least one semantic element into the simplified user intent comprises (i) obtaining from the intent Anorga. (Anorga, ¶ [0029]).

Regarding claim 13, the rejection of claim 12 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite further comprising requiring that the object identifier obtained from the object database correspond to an object that is in a field of view of the camera.
The relevance of Anorga is described above with relation to claim 10. Regarding claim 13, Anorga teaches further comprising requiring that the object identifier obtained from the object database correspond to an object that is in a field of view of the camera (“In some implementations, the search topics in the search topic elements are determined based on one or more characteristics associated with eligible content elements {objects and associated object identifiers} that are within a threshold distance of one or more content elements displayed in the display view 602 {in the field of view of the camera}”; Anorga, ¶¶ [0164]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include further comprising requiring that the object identifier obtained from the object database correspond to an object that is in a field of view of the camera. “A technical effect of described techniques and features is a reduction in the consumption of system processing Anorga. (Anorga, ¶ [0029]).

Regarding claim 17, the rejection of claim 8 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier; (ii) obtaining from an object database a plurality of object identifiers associated with the semantic object tag, the object identifiers corresponding to different objects in the AR scene; and (iii) concatenating the intent with the plurality of object identifiers to create the simplified user intent.
The relevance of Anorga is described above with relation to claim 10. Regarding claim 17, Anorga teaches wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier (the system “identif[ies] search topics in the collection of content data items” and where the content characteristics “and the corresponding collection of content data items can be stored in accessible storage {intent database}” and “the characteristics... are converted into search topics”; thus, the association of [content] characteristics {intent identifiers} and search topics {corresponding intents} is stored in the collection of content data items {intent database}; Anorga, ¶¶ [0070], [0050], [0092]); (ii) obtaining from an object database a plurality of object identifiers associated with the semantic object tag (as described with reference to an example, “the collection of content data items {object database} [can be] searched for {thus, obtaining} content data items {object identifier} having a matching text tag [to the search topic] {semantic object tag} that describes the content [characteristics] of the content data item...” where a text tag {semantic object tag} “matching” the content data items {object identifiers} is an Anorga, ¶¶ [0065]), the object identifiers corresponding to different objects in the AR scene (“the content characteristics can include content features depicted in the one or more content data items (e.g., persons, objects, activities, or other features depicted in images)” and where the scene may be an AR scene; Anorga, ¶¶ [0024], [0033]); and (iii) concatenating the intent with the plurality of object identifiers to create the simplified user intent (“the collection of content data items [can be] searched for content data items having a matching text tag [to the search topic] {intent identifier} that describes the content [characteristics] of the content data item.” As such, the matching text tag {intent identifier and the intent} is linked {concatenating...} to the content characteristic {the object identifier} to create the search topic {simplified user intent}; Anorga, ¶¶ [0065]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include wherein converting the at least one semantic element into the simplified user intent comprises: (i) obtaining from an intent database an intent associated with the intent identifier; (ii) obtaining from an object database a plurality of object identifiers associated with the semantic object tag, the object identifiers corresponding to different objects in the AR scene; and (iii) concatenating the intent with the plurality of object identifiers to create the simplified user intent. “A technical effect of described techniques and features is a reduction in the consumption of system processing resources, such as display and search processing and power consumption, utilized by a system that does not provide one or more of the described techniques or features,” as recognized by Anorga. (Anorga, ¶ [0029]).

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor, Acharya, and Bui as applied to claim 6 above, and further in view of Wabgaonkar (U.S. Pat. App. Pub. No. 2019/0385595, hereinafter Wabgaonkar).

Regarding claim 14, the rejection of claim 6 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein the at least one semantic element includes a plurality of intent identifiers.
Wabgaonkar teaches “system and method for spoken language understanding.” (Wabgaonkar, ¶ [0002]). Regarding claim 14, Wabgaonkar teaches wherein the at least one semantic element includes a plurality of intent identifiers (“Examples of dialogue act categories {semantic elements} include question, greeting, command, and information {a plurality of intent identifiers}”; Wabgaonkar, ¶¶ [0061]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Wabgaonkar to include wherein the at least one semantic element includes a plurality of intent identifiers. “Providing more information to the spoken language understanding system” using the implementations described in Wabgaonkar “can improve the efficiency and accuracy of the spoken language understanding system,” as recognized by Wabgaonkar. (Wabgaonkar, ¶ [0090]).

Regarding claim 15, the rejection of claim 14 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein the plurality of intent identifiers includes a command and a greeting.
The relevance of Wabgaonkar is described above with relation to claim 14. Regarding claim 15, Wabgaonkar teaches wherein the plurality of intent identifiers includes a command and a greeting (“Examples of dialogue act categories {intent identifiers} include question, greeting, command, and information”; Wabgaonkar, ¶¶ [0061]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Wabgaonkar to include wherein the plurality of intent identifiers includes a command and a greeting. “Providing more information to the spoken language understanding system” using the implementations described in Wabgaonkar “can improve the efficiency and accuracy of the spoken language understanding system,” as recognized by Wabgaonkar. (Wabgaonkar, ¶ [0090]).

Regarding claim 16, the rejection of claim 15 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite further comprising prioritizing the command over the greeting when converting the user phrase into the simplified user intent.
The relevance of Wabgaonkar is described above with relation to claim 14. Regarding claim 16, Wabgaonkar teaches further comprising prioritizing the command over the greeting when converting the user phrase into the simplified user intent (“when the first softmax classifier is an intent decoder and the second softmax classifier is a dialogue act classification decoder, the dialogue act category of the word sequence determined by the second softmax classifier... [to] have more context for determining the intent of the word sequence...”; Wabgaonkar, ¶¶ [0086]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Wabgaonkar to include further comprising prioritizing the command over the greeting when converting the user phrase into the simplified user intent. “Providing more information to the spoken language understanding system” using the implementations described in Wabgaonkar “can improve the efficiency and accuracy of the spoken language understanding system,” as recognized by Wabgaonkar. (Wabgaonkar, ¶ [0090]).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Spoor and Acharya as applied to claim 2 above, and further in view of Gagnier (U.S. Pat. App. Pub. No. 2015/0286698, hereinafter Gagnier).

Regarding claim 18, the rejection of claim 2 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. Spoor further discloses wherein the properties of the at least one real-world object comprise a shape of the real-world object (“if a user looks at an object like a helicopter and says ‘what is this’, as depicted by an outline surrounding the object, the assistant will know the user’s question is in reference to the helicopter as the helicopter,” where helicopter is the shape of the outline around the real-world helicopter; Spoor, ¶¶ [0128]). However, Spoor and Acharya fail to expressly recite wherein the reaction comprises moving the virtual agent and conforming a shape of the virtual agent to a shape of the real-world object during movement.
Gagnier teaches systems and methods for “providing a reactive digital personal assistant.” (Gagnier, ¶ [0002]). Regarding claim 18, Gagnier teaches wherein the reaction comprises moving the virtual agent and conforming a shape of the virtual agent to a shape of the real-world object during movement (“step 506 includes using the designated media representation(s) to change a form of the digital personal assistant from a first form to a second form. In accordance with this embodiment, the second form is based on the content. For instance, personal assistant logic 402 may use the designated media representation(s) 422 to change a form of the digital personal assistant 416 from a first form to a second form, which is based on the content 410” where content can include “visual content (e.g., visual representation(s) and/or audio visual representation(s)) {real world objects}, audio content (e.g., audio representation(s) and/or audio visual representation(s)), haptic content, or a combination thereof.”; Gagnier, ¶¶ [0089]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Gagnier to include wherein the reaction comprises moving the virtual agent and conforming a shape of the virtual agent to a shape of the real-world object during movement. The digital personal assistants that are reactive can provide the illusion of being sentient, “which may enhance the user's emotional connection with the digital personal assistant,” as recognized by Gagnier. (Gagnier, ¶¶ [0026]).

Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor and Acharya as applied to claim 1 above, and further in view of Gibbs (U.S. Pat. App. Pub. No. 2017/0206095, hereinafter Gibbs).

Regarding claim 21, the rejection of claim 1 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. However, Spoor and Acharya fail to expressly recite wherein the reaction is further dependent on a specific behavior type of the virtual agent.
Gibbs teaches “a virtual agent that is represented by an animated form and/or has a simulated personality.” (Gibbs, ¶ [0001]). Regarding claim 21, Gibbs teaches wherein the reaction is further dependent on a specific behavior type of the virtual agent (“the virtual agent has a simulated personality, emotions and/or mood. Sometimes, its responses are affected by the simulated personality and are not entirely dependent on the current context or on what the user just asked.”; Gibbs, ¶¶ [0023]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Gibbs to include wherein the reaction is further dependent on a specific behavior type of the virtual agent. The virtual agent described in Gibbs can “improve and simplify the interaction between electronic devices and people” by taking on a more relatable visual form, as recognized by Gibbs. (Gibbs, ¶ [0002], [0016]).

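Regarding claim 22, the rejection of claim 21 is incorporated. Spoor and Acharya disclose all of the elements of the current invention as stated above. However, Spoor and Acharya fail to expressly recite wherein the reaction comprises animation of the virtual agent further comprising accessing a database to obtain the specific behavior type of the virtual agent and applying a behavior tree to determine how to animate the virtual agent.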
The relevance of Gibbs is described above with relation to claim 21. Regarding claim 22, Gibbs teaches wherein the reaction comprises animation of the virtual agent further comprising accessing a database to obtain the specific behavior type of the virtual agent (“The PEM module 210 is any hardware or software arranged to store and update the agent personality, mood and emotion {a database to obtain the specific behavior type of the virtual agent}” where the virtual agent can “detect a particular event, characteristic, trait or attitude (e.g., a smile of a user, pleasant conversation from a user, criticism, a negative attitude on the part of the user, other visual or sensed events or conditions, the interaction context, etc.) using the sensor analysis module 205, context analysis module 225, speech to text module 230, microphone, camera and/or other sensors. Based on such feedback, the PEM module 210 may update the mood and/or emotion {the reaction comprises... accessing the database}” which “cause a change in the animation and/or visual appearance of the virtual agent model.”; Gibbs, ¶¶ [0046], [0082]) and applying a behavior tree to determine how to animate the virtual agent (“the NLP/dialog generation module 235 generates a script that represents what the virtual agent will say in response to events or conditions that are detected by the virtual agent” and the script is “based on the sensor data, interaction context (e.g., as determined by the context analysis module 225) and/or detected speech (e.g., as determined by the speech to text module 230.)” and “behavior planner module 215 may access data stored at the PEM module 210...to render appropriate visual changes in the virtual agent model.” Thus, applying a behavior tree to determine visual changes to {how to animate} the virtual agent; Gibbs, ¶¶ [0057]-[0059]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya to incorporate the teachings of Gibbs to include wherein the reaction comprises animation of the virtual agent further comprising accessing a database to obtain the specific behavior type of the virtual agent and applying a behavior tree to determine how to animate the virtual agent. The virtual agent described in Gibbs can “improve and simplify the interaction between electronic devices and people” by taking on a more relatable visual form, as recognized by Gibbs. (Gibbs, ¶ [0002], [0016]).

Claims 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Spoor, Acharya, and Bui as applied to claim 6 above, and further in view of Anorga and Gibbs.

Regarding claim 23, the rejection of claim 6 is incorporated. Spoor, Acharya, and Bui disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, and Bui fail to expressly recite wherein the at least one semantic element comprises an intent identifier associated in a database with a command, and wherein the reaction comprises animating the virtual agent so as to exhibit an apparent movement that evokes carrying out the command.
The relevance of Anorga is described above with relation to claim 10. Regarding claim 23, Anorga teaches wherein the at least one semantic element comprises an intent identifier associated in a database with a command (The system can “search eligible content data items for characteristics {intent identifier} that are related to the commanded characteristic {command}” and where the content characteristics “and the corresponding collection of content data items can be stored in accessible storage {intent database}”; Anorga, ¶¶ [0155], [0050]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya and by the verbal input digital image modification systems of Bui, to incorporate the teachings of Anorga to include wherein the at least one semantic element comprises an intent identifier associated in a database with a command. “A technical effect of described techniques and features is a reduction in the consumption of system processing resources, such as display and search processing and power consumption, utilized by a system that does not provide one or more of the described techniques or features,” as recognized by Anorga. (Anorga, ¶ [0029]). However, Spoor, Acharya, Bui, and Anorga fail to expressly recite wherein the reaction comprises animating the virtual agent so as to exhibit an apparent movement that evokes carrying out the command.
The relevance of Gibbs is described above with relation to claim 21. Regarding claim 23, Gibbs teaches wherein the reaction comprises animating the virtual agent so as to exhibit an apparent movement that evokes carrying out the command (“The virtual agent may implement a wide variety of different behaviors. Behaviors may include various types of actions that cause a change in the animation and/or visual appearance of the virtual agent model” where “the selections of behaviors performed by the behavior planner module 215 may be based on input from any other suitable module in the virtual agent system 200. For example, interaction context data is received from the context analysis module 225 and used to help selected suitable behaviors”; Gibbs, ¶¶ [0082], [0085]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya, by the verbal input digital image modification systems of Bui, and by the interface elements for directed display of content data items of Anorga, to incorporate the teachings of Gibbs to include wherein the reaction comprises animating the virtual agent so as to exhibit an apparent movement that evokes carrying out the command. The virtual agent described in Gibbs can “improve and simplify the interaction between electronic devices and people” by taking on a more relatable visual form, as recognized by Gibbs. (Gibbs, ¶ [0002], [0016]).

Regarding claim 24, the rejection of claim 23 is incorporated. Spoor, Acharya, Bui, and Anorga disclose all of the elements of the current invention as stated above. However, Spoor, Acharya, Bui, and Anorga fail to expressly recite wherein the reaction further comprises preceding the apparent movement with an animation of the virtual agent so as to exhibit a perceived acknowledgement of the command.
The relevance of Gibbs is described above with relation to claim 21. Regarding claim 24, Gibbs teaches wherein the reaction further comprises preceding the apparent movement with an animation of the virtual agent so as to exhibit a perceived acknowledgement of the command (“the candidate planner module 215 may prioritize and perform the former and not the latter, since an immediate verbal response to the user’s question may be considered more important than generation of a smile. Alternatively, the candidate planner 215 may determine that the smile should come immediately before or after the speech animation,” where the animation of a smile is a perceived acknowledgment of the user’s question {the command}; Gibbs, ¶¶ [0101]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the augmented reality virtual personal assistant of Spoor as modified by the augmented reality virtual assistant application of Acharya, by the verbal input digital image modification systems of Bui, and by the interface elements for directed display of content data items of Anorga, to incorporate the teachings of Gibbs to include wherein the reaction further comprises preceding the apparent movement with an animation of the virtual agent so as to exhibit a perceived acknowledgement of the command. The virtual agent described in Gibbs can “improve and simplify the interaction between electronic devices and people” by taking on a more relatable visual form, as recognized by Gibbs. (Gibbs, ¶ [0002], [0016]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Non-Patent Literature to Schmeil (A. Schmeil and W. Broll, "MARA - A Mobile Augmented Reality-Based Virtual Assistant," 2007 IEEE Virtual Reality Conference, 2007, pp. 267-270, doi: 10.1109/VR.2007.352497) discloses a mobile augmented reality based virtual assistant.
Non-Patent Literature to Wang (I. Wang, J. Smith, and J. Ruiz. "Exploring Virtual Agents for Augmented Reality." In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). ACM, New York, NY, USA, 281:1–281:12. https://doi.org/10.1145/3290605.3300511 Published: 2 May 2019.) discloses various systems and methods for virtual agents in augmented reality systems.

 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about 





/Sean E Serraguard/Patent Examiner, Art Unit 2657                                                                                                                                                                                                        
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657