Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 9, and 15 are independent.
This Application was published as U.S. 20220165260.
            Apparent priority: 24 November 2020.
Claim Objections
Claim 8 is objected to because of informalities that may be addressed with the following suggested amendments: 
8. The method according to claim 1, being implemented in a cloud infrastructure. 
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter.
Claim 15 is directed to “15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable and executable by a computer to cause the computer to perform a method, comprising:….”
The phrase “computer program product” appears in the Specification as follows: “[0010] According to yet another embodiment of the present invention, computer program product comprising a computer readable storage medium having program instructions embodied therewith, includes A computer program product comprising a computer readable storage medium having program instructions embodied therewith ….” 
No definition is provided for “computer program product.”  Accordingly, the phrase “computer readable storage medium” is interpreted under its broadest reasonable interpretation and as such includes transitory wave media, which are machine readable and yet non-statutory. The broadest reasonable interpretation of the Claim would then include non-statutory embodiments, and the Claim as a whole is directed to non-statutory subject matter.
Claims 16-20 that depend from Claim 15 do not add structure and are non-statutory as well.
To overcome the rejection, see suggested amendment: “15. A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions readable and executable by a computer to cause the computer to perform a method, comprising:….”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-6, 8, 9, 11, 13, 15, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Venkataraman (U.S. 20170249382) in view of Jang (U.S. 20200372906).
In order to expedite prosecution, the broadly stated “augmented voice command” is interpreted as a combined or compound voice command. As explained in the 35 USC 102 rejections below, however, this interpretation is not required by the current language of the Claim.
Regarding Claim 1, Venkataraman teaches:
1. A method to generate an augmented voice command, [Venkataraman, Figure 1 shows a combined command of “Show me action movies … 104” and “with Tom Cruise 106.”  “[0001] It is becoming ubiquitous for searches to be carried out by devices that detect a voice or textual input. For example, if a user types out the phrase "show me a list of action movies" into a search engine, a search might be performed for a list of action movies. These devices, however, are not able to effectively distinguish between where one search string ends, and a next search string begins. For example, devices are not able to effectively discern that the string "Show me a list of action movies. What is the weather?" includes two separate search commands.”]
comprising: 
identifying a plurality of sounds from a respective plurality of transducers to a smart device; [Venkataraman, Figure 1, “microphone 102” is the transducer that is receiving the “plurality of sounds.” “[0036] In some embodiments, the media guidance application may detect first phrase 104 and the second phrase 106 through any known user input interface of a user equipment (described further below with respect to FIG. 4), such as a microphone (e.g., microphone 102) if the phrases were spoken, or a keyboard or touch screen if the phrases were typed…..”] [Venkataraman does not teach a plurality of transducers/microphones.  However, the “user equipment 100” shown in Figure 1 is a smart phone or a tv remote that generally have at least two microphones for noise cancelation.  See [0068] and [0085] including:  “[0068] … a personal digital assistant (PDA), a mobile telephone, a portable video player, a portable music player, a portable gaming machine, a smart phone ….”  “[0085] A user may send instructions to control circuitry 404 using user input interface 410. User input interface 410 may be any suitable user interface, such as a remote control, mouse, trackball, keypad, keyboard, touch screen, touchpad, stylus input, joystick, voice recognition interface, or other user input interfaces….” ]
generating a visualization of the sounds using an augmented reality device, wherein one or more of the sounds can be selected using the visualization; and [Venkataraman does not mention the phrases augmented reality or virtual reality.  However, the “visualization” of Venkataraman is generated on Figures 2 and 3 which appear like smart TV screens and are taught to include “3D” media format which is a type of virtual/augmented reality:  “[0085] … In some embodiments, display 412 may be a 3D display, and the interactive media guidance application and any suitable content may be displayed in 3D. A video card or graphics card may generate the output to the display 412. The video card may offer various functions such as accelerated rendering of 3D scenes and 2D graphics, MPEG-2/MPEG-4 decoding, TV output, or the ability to connect multiple monitors….”] [Venkataraman also teaches “voice recognition” and “translate the first phrase to a first string of word … 604” and the same for the second and third phrases in Figure 6 and teaches:  “[0085] …  User input interface 410 may be any suitable user interface, such as a remote control, mouse, trackball, keypad, keyboard, touch screen, touchpad, stylus input, joystick, voice recognition interface, or other user input interfaces….”] [Venkataraman does not teach display of the recognized text to the user for his selection; the selection is done by the machine itself.]
generating the augmented voice command for the smart speaker device, wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device. [Venkataraman, Figures 6-8 with the culmination being in Figure 8 (Cont.), “818: Determine that each of the first string, the second string, and the third string are part of a first conversation” and “determine that the first string and the second string are part of a second conversation that the third string is not a part of 832” or “determining that the second string and the third string are part of a third conversation that the first string is not a part of, 826.”  “8. The method of claim 1, wherein receiving the first phrase, the second phrase, and the third phrase comprises receiving a voice command from a user comprising the first phrase, the second phrase, and the third phrase.”]

Venkataraman does not teach visualization of sounds or user selection using the visualization.
Jang teaches: 
identifying a plurality of sounds from a respective plurality of transducers to a smart device; [Jang, Figure 1, “microphone 114” receiving the sounds “Hey, can you turn the lights on? 104.”  “[0028] The microphone 114 is configured to capture the verbal command 104 and generate an audio signal 124 that corresponds to the verbal command 104…..”  “[0101] The virtual assistant device 110 may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field….”]
generating a visualization of the sounds using an augmented reality device, [Jang, Figure 1, “Automatic Speech Recognizer 130” Figure 2, showing the results of the “Automatic Speech Recognizer 130” leading to “Actions 150.” Figure 6 showing a “Visualization” of the different commands/actions as 150A, 150B, 150C.  “[0061] FIG. 6 is an illustrative example of a wearable device 600. The wearable device 600 includes a screen 120A that enables a user to teach the virtual assistant device 110 which action is to be performed in response to receiving a verbal command. According to one implementation, the screen 120A corresponds to the screen 120. In the illustrated example of FIG. 6, the wearable device 600 can be a virtual reality headset, an augmented reality headset, or a mixed reality headset….”] wherein one or more of the sounds can be selected using the visualization; and [Jang, Figure 6, showing the 3 commands and the “user 102” making a selection of one of the visualized options.]
generating the augmented voice command for the smart speaker device, wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device. [Jang, Figure 6, teaches associating the voice command with the selected option:  “[0062] The screen 120A displays the prompt 352 illustrated in FIG. 4. The user 102 can select the second action 150B (e.g. "music plays") in response to receiving the prompt 352, and the virtual assistant device 110 can store the vector 342 associated with the verbal command 304 in the database 118 and can associate the second action 150B with the vector. As a result, after receiving the user selection, the virtual assistant device 110 (e.g., the action initiator 138) can initiate the second action 150B in response to receiving the verbal command 304 again or receiving a verbal command that produces a vector that is similar to the vector 342.”]
Venkataraman and Jang pertain to execution of voice commands using VR devices and it would have been obvious to combine the selection feature of Jang with the system of Venkataraman which performs the selections automatically and by the machine in order to provide the user with more control.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 3, Venkataraman teaches:
3. The method of claim 1, 
wherein each of the plurality of transducers comprises a speaker, and the smart device comprises a smart speaker device, and [Venkataraman, Figure 1, “user equipment 100” teaches the smart device of the Claim and:  “[0051] In some embodiments, after resolving that the first degree exceeds the second degree, the media guidance application may proceed to execute a first search corresponding first phrase 104, and to execute a second, separate search corresponding to second phrase 106. The search results may separately populate in search results 108, which may be generated for display through a display of user equipment 100. The display will be described below with respect to FIG. 4. Additionally, or alternatively, the search results may be output verbally through speakers that are incorporated in, or connected to, user equipment 100. The speakers will be described below with respect to FIG. 4.”  Figure 4, “Speakers 414.”  “[0085] … Speakers 414 may be provided as integrated with other elements of user equipment device 400 or may be stand-alone units. The audio component of videos and other content displayed on display 412 may be played through speakers 414. In some embodiments, the audio may be distributed to a receiver (not shown), which processes and outputs the audio via speakers 414.”]
wherein the sounds include spoken or non-spoken content that are selected in an augmented reality space. [Venkataraman, Figure 1, teaches input as voice or text both:  “[0001] It is becoming ubiquitous for searches to be carried out by devices that detect a voice or textual input…..”  The presentation to the user is in a “multimedia” format that includes voice and text: “[0066] …As referred to herein, the term "multimedia" should be understood to mean content that utilizes at least two different content forms described above, for example, text, audio, images, video, or interactivity content forms. Content may be recorded, played, displayed or accessed by user equipment devices, but can also be part of a live performance.”]
(Jang, Figure 8, “speaker 800” is a smart speaker device:  “[0017] FIG. 8 is an illustrative example of a voice-controlled speaker system that incorporates aspects of a virtual assistant device;” and Jang, Figure 6, shows selection of the input spoken commands.)

Regarding Claim 5, Venkataraman teaches:
5. The method according to claim 1, further comprising translating the voice commands to the generation of visualization of the sounds using an augmented reality device. [Jang, Figure 2 shows the conversion of “Audio Signal 124” to “Word Sequence 140” and conversion to the actions shown on Figure 6:  “[0061] FIG. 6 is an illustrative example of a wearable device 600. The wearable device 600 includes a screen 120A that enables a user to teach the virtual assistant device 110 which action is to be performed in response to receiving a verbal command. According to one implementation, the screen 120A corresponds to the screen 120. In the illustrated example of FIG. 6, the wearable device 600 can be a virtual reality headset, an augmented reality headset, or a mixed reality headset. The virtual assistant device 110 can be integrated in wearable device 600 or coupled to the wearable device 600 (e.g., in another wearable device or in a mobile device that interacts with the wearable device 600).”]
(Jang Figure 6 shows the virtual reality headset 600 and also the visualization of the sounds on the screen 120A:  “[0061] FIG. 6 is an illustrative example of a wearable device 600. The wearable device 600 includes a screen 120A that enables a user to teach the virtual assistant device 110 which action is to be performed in response to receiving a verbal command. According to one implementation, the screen 120A corresponds to the screen 120. In the illustrated example of FIG. 6, the wearable device 600 can be a virtual reality headset, an augmented reality headset, or a mixed reality headset. The virtual assistant device 110 can be integrated in wearable device 600 or coupled to the wearable device 600 (e.g., in another wearable device or in a mobile device that interacts with the wearable device 600).”)

Regarding Claim 6, Venkataraman teaches:
6. The method according to claim 1, further comprising executing the augmented voice command by a selection of one or more sounds by an augmented reality system. [Venkataraman, Figure 1, “Mission Impossible, etc.” is the result of the execution of the command of “search results for action movies by Tom Cruise.”]
(Jang, Figure 2, “Action Initiator 138” executes the command 150A.  “[0043] Thus, the system 200 enables a slightly different version of a stored command, associated with a particular action, to be executed without additional ontology design. For example, a vector (e.g., the vector 142) associated with the slightly different version of the stored command is compared to a stored vector (e.g., the stored vector 144A) associated with the stored command. If the difference between the vectors satisfies the difference constraint, the virtual assistant device 110 can perform the particular action associated with the stored command. As a result, a slightly different version of the stored command can be interpreted and executed as the stored command. For example, the virtual assistant device 110 can interpret and execute the phrase "Hey, can you turn the lights on?" as if it were the stored command "turn on the lights."”)

Regarding Claim 8, Venkataraman teaches:
8. The method according to claim 1, being implemented in a cloud infrastructure [Venkataraman, “[0107] In a fourth approach, user equipment devices may operate in a cloud computing environment to access cloud services. In a cloud computing environment, various types of computing services for content sharing, storage or distribution (e.g., video sharing sites or social networking sites) are provided by a collection of network-accessible computing and storage resources, referred to as "the cloud."….”]
(Jang does not mention cloud computing.)

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally:
Venkataraman teaches:
9. A system to generate an augmented voice command, comprising: 
a memory storing computer instructions; and [Venkataraman, Figure 4, “storage 408.”]
a processor configured to execute the computer instructions to: [Venkataraman, Figure 4, “Processing Circuitry 406.”]
identify a plurality of sounds from a respective plurality of transducers to a smart speaker device; 
generate a visualization of the sounds using an augmented reality device, wherein one or more of the sounds can be selected using the visualization; and 
generate the augmented voice command for the smart speaker device, wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device. 

Claim 11 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.

Claim 13 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.

Claim 15 is a computer program product claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally:
Venkataraman teaches:
15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable and executable by a computer to cause the computer to perform a method, [Venkataraman, “[0067] The media guidance application and/or any instructions for performing any of the embodiments discussed herein may be encoded on computer readable media. Computer readable media includes any media capable of storing data. The computer readable media may be transitory, including, but not limited to, propagating electrical or electromagnetic signals, or may be non-transitory including, but not limited to, volatile and non-volatile computer memory or storage devices such as a hard disk, floppy disk, USB drive, DVD, CD, media cards, register memory, processor caches, Random Access Memory ("RAM"), etc.”]
comprising: 
identifying a plurality of sounds from a respective plurality of transducers to a smart speaker device; 
generating a visualization of the sounds using an augmented reality device, wherein one or more of the sounds can be selected using the visualization; and 
generating the augmented voice command for the smart speaker device, 
wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device. 

Claim 17 is a computer program product claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 19 is a computer program product claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.

Claims 2, 4, 7, 10, 12, 14, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Venkataraman and Jang in view of Bar-or (U.S. 20150371664).
Regarding Claim 2, Venkataraman and Jang do not mention the use of history.
Bar-or teaches:
2. The method of claim 1, further comprising automatically selecting the one or more sounds based on a selection and de-selection history of historical sounds from one or more of the transducers. [Bar-or, Figure 2, 208 and Figure 4, 404, which pertain to resolving the input command to specified action.  Figure 3A pertains to presenting the possible commands to the user for user selection.  Bar-or teaches that the system can make a decision on its own based on user history of selection: “[0045] In FIG. 3A, two dialogs are shown because the command input resolved to two actions for two different applications. In some implementations, only one dialog is shown even if the command input resolves to two or more device-supported actions. The dialog that is shown may, for example, correspond to a most likely action to be performed. The likelihoods can be determined by how well the command input parses to particular command models, user history of selections, and other data that can be used to determine a likelihood.”]
Venkataraman, Jang, and Bar-or pertain to execution of voice commands where Venkataraman selects a command automatically and Bar-or provides for user selection, and it would have been obvious to combine the automatic resolution of commands based on user selection history from Bar-or with the system of the combination in order to provide for one method of automatic command resolution.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 4, Venkataraman and Jang do not mention the use of history.
Bar-or teaches and suggests:
4. The method of claim 1, further comprising selecting the one or more sounds based on a selection and de-selection history of historical information stored in a memory. [Bar-or, Figures 2, 3A and 4 pertaining to resolution of command based on “user history of selections” as provided in the rejection of Claim 2.  Bar-or Figure 1 shows storage of “Account Data 114” which stores user information. Bar-or does not expressly teach that user history of selection is stored.  However, in order for a “history” to be accessed, it must be stored because “history” is not contemporaneous.  Accordingly, the teachings of Bar-or either inherently teach this claimed limitation or at the least the combination of teachings of Bar-or suggests the limitation.]
Rationale for combination as provided for Claim 2.

Regarding Claim 7, Venkataraman makes the selections automatically and does not teach user selection.
Jang teaches:
7. The method according to claim 1, further comprising: 
while submitting voice command or after submitting voice command, selecting one or more spoken content visualized in the augmented reality device, and accordingly be considered in the augmented voice command for execution; and [Jang, Figures 4 or 6 presenting a list of commands to be executed to the user on the “screen 120/120A” and asking the user to “make a selection.”  “[0067] FIG. 8 is an illustrative example of a voice-controlled speaker system 800. The voice-controlled speaker system 800 can have wireless network connectivity and is configured to execute an assistant operation. The virtual assistant device 110 is included in the voice-controlled speaker system 800. The voice-controlled speaker system 800 also includes a speaker 802. During operation, in response to receiving a verbal command, the voice-controlled speaker system 800 can execute assistant operations. The assistant operations can include adjusting a temperature, playing music, turning on lights, etc. In some implementations, the virtual assistant device 110 can instruct the user 102 how to train the virtual assistant device 110 to respond to an unrecognized command. For example, virtual assistant device 110 can provide a GUI, such as the GUI 504, or verbal interactions to instruct the user 102 how to train the virtual assistant device 110.”]
selecting the one or more sounds based on a selection and de-selection history of historical sounds from one or more of the transducers saved in a historical corpus. 
Venkataraman and Jang do not mention the use of history.
Bar-or teaches and suggests:
while submitting voice command or after submitting voice command, selecting one or more spoken content visualized in the augmented reality device, and accordingly be considered in the augmented voice command for execution; and [Bar-or, Figures 3A and 3B show the presentation of commands to the user for selection.  “[0044] FIG. 3A is an illustration of a user interface 300 at the first user device in which dialogs 304 and 306 are presented in response to the command input 302. For each dialog 304 and 306, a user may either accept or deny invoking the user device action at the second user device in response to the command input by selecting either the "Yes" or "No" button, respectively.”  “[0046] In some implementations, once a user has confirmed a particular device supported action for a second user device, options specific to that action may be displayed at the first user device. For example, FIG. 3B is an illustration of another user interface 310 at the first user device in which action-specific options are displayed…dialog 320 includes a listing 322 generated in response to the command input 302 and action-specific options 324, 326, 328, 330, and 332…..”]
selecting the one or more sounds based on a selection and de-selection history of historical sounds from one or more of the transducers saved in a historical corpus. [Bar-or, see rejection of Claims 2 and 4, “[0045] …The dialog that is shown may, for example, correspond to a most likely action to be performed. The likelihoods can be determined by how well the command input parses to particular command models, user history of selections, and other data that can be used to determine a likelihood.”  Storage of the history in a memory or “historical corpus” is either inherent or suggested by the combination of teachings of Bar-or]
Rationale for combination as provided for Claim 2.

Claim 10 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claim 14 is a system claim with limitations corresponding to the limitations of Claims 6 and 7 together and is rejected under similar rationale.
14. The system according to claim 8, further comprising: 
executing the augmented voice command by a selection of one or more sounds by an augmented reality system; [Claim 6 and also all of the 3 references teach execution of the received and selected or otherwise disambiguated command.]
while submitting voice command or after submitting voice command, selecting one or more spoken content visualized in the augmented reality device, and accordingly be considered in the augmented voice command for execution; and [Claim 7.]
selecting the one or more sounds based on a selection and de-selection history of historical sounds from one or more of the transducers saved in a historical corpus. [Claim 7.]

Claim 16 is a computer program product claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 18 is a computer program product claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 20 is a computer program product claim with limitations corresponding to the limitations of Claim 14 and is rejected under similar rationale.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
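(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.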

Claims 1, 3, 5-6, 9, 11, 13, 15, 17, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Jang (U.S. 20200372906).


Regarding Claim 1, Jang teaches: 
1. A method to generate an augmented voice command, [Note that Jang does not teach an “augmented voice command.”  However, “augmented voice command” is not defined in this Claim or others.  Additionally, the last limitation states: “wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device.”  This is hardly “augmented.”  For the term to receive patentable weight, the limitation should be defined as intended and in a “limiting” manner.]
comprising: 
identifying a plurality of sounds from a respective plurality of transducers to a smart device; [Jang, Figure 1, “microphone 114” receiving the sounds “Hey, can you turn the lights on? 104.”  “[0028] The microphone 114 is configured to capture the verbal command 104 and generate an audio signal 124 that corresponds to the verbal command 104…..”  “[0101] The virtual assistant device 110 may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field….”]
generating a visualization of the sounds using an augmented reality device, [Jang, Figure 1, “Automatic Speech Recognizer 130” Figure 2, showing the results of the “Automatic Speech Recognizer 130” leading to “Actions 150.” Figure 6 showing a “Visualization” of the different commands/actions as 150A, 150B, 150C.  “[0061] FIG. 6 is an illustrative example of a wearable device 600. The wearable device 600 includes a screen 120A that enables a user to teach the virtual assistant device 110 which action is to be performed in response to receiving a verbal command. According to one implementation, the screen 120A corresponds to the screen 120. In the illustrated example of FIG. 6, the wearable device 600 can be a virtual reality headset, an augmented reality headset, or a mixed reality headset….”] wherein one or more of the sounds can be selected using the visualization; and [Jang, Figure 6, showing the 3 commands and the “user 102” making a selection of one of the visualized options.]
generating the augmented voice command for the smart speaker device, wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device. [Jang, Figure 6, teaches associating the voice command with the selected option:  “[0062] The screen 120A displays the prompt 352 illustrated in FIG. 4. The user 102 can select the second action 150B (e.g. "music plays") in response to receiving the prompt 352, and the virtual assistant device 110 can store the vector 342 associated with the verbal command 304 in the database 118 and can associate the second action 150B with the vector. As a result, after receiving the user selection, the virtual assistant device 110 (e.g., the action initiator 138) can initiate the second action 150B in response to receiving the verbal command 304 again or receiving a verbal command that produces a vector that is similar to the vector 342.”]

Regarding Claim 3, Jang teaches:
3. The method of claim 1, 
wherein each of the plurality of transducers comprises a speaker, and the smart device comprises a smart speaker device, and [Jang, Figure 8, “speaker 800” is a smart speaker device:  “[0017] FIG. 8 is an illustrative example of a voice-controlled speaker system that incorporates aspects of a virtual assistant device;”]
wherein the sounds include spoken or non-spoken content that are selected in an augmented reality space. [Jang, Figure 6, shows selection of the input spoken commands.]

Regarding Claim 5, Jang teaches:
5. The method according to claim 1, further comprising translating the voice commands to the generation of visualization of the sounds using an augmented reality device. [Jang, Figure 6, shows the virtual reality headset 600 and also the visualization of the sounds on the screen 120A:  “[0061] FIG. 6 is an illustrative example of a wearable device 600. The wearable device 600 includes a screen 120A that enables a user to teach the virtual assistant device 110 which action is to be performed in response to receiving a verbal command. According to one implementation, the screen 120A corresponds to the screen 120. In the illustrated example of FIG. 6, the wearable device 600 can be a virtual reality headset, an augmented reality headset, or a mixed reality headset. The virtual assistant device 110 can be integrated in wearable device 600 or coupled to the wearable device 600 (e.g., in another wearable device or in a mobile device that interacts with the wearable device 600).”]

Regarding Claim 6, Jang teaches:
6. The method according to claim 1, further comprising executing the augmented voice command by a selection of one or more sounds by an augmented reality system. [Jang, Figure 2, “Action Initiator 138” executes the command 150A.  “[0043] Thus, the system 200 enables a slightly different version of a stored command, associated with a particular action, to be executed without additional ontology design. For example, a vector (e.g., the vector 142) associated with the slightly different version of the stored command is compared to a stored vector (e.g., the stored vector 144A) associated with the stored command. If the difference between the vectors satisfies the difference constraint, the virtual assistant device 110 can perform the particular action associated with the stored command. As a result, a slightly different version of the stored command can be interpreted and executed as the stored command. For example, the virtual assistant device 110 can interpret and execute the phrase "Hey, can you turn the lights on?" as if it were the stored command "turn on the lights."”]
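For illustration only, the vector comparison described in Jang [0043] can be sketched as follows. Jang does not specify the distance metric, the threshold value, or any function names; the Euclidean distance, the names `euclidean_distance` and `match_action`, and the sample vectors below are all assumptions made for this sketch and are not taken from the reference.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length vectors (assumed metric, not from Jang)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_action(command_vector, stored_vectors, max_difference):
    """Return the action whose stored vector is closest to command_vector,
    provided the difference satisfies the constraint; otherwise return None.

    stored_vectors maps an action label to its stored vector (hypothetical layout).
    """
    best_action = None
    best_distance = None
    for action, vector in stored_vectors.items():
        distance = euclidean_distance(command_vector, vector)
        within_constraint = distance <= max_difference
        if within_constraint and (best_distance is None or distance < best_distance):
            best_action, best_distance = action, distance
    return best_action


# Hypothetical stored commands and a slightly different spoken version.
stored = {"turn on the lights": [1.0, 0.0], "music plays": [0.0, 1.0]}
print(match_action([0.9, 0.1], stored, max_difference=0.5))
print(match_action([0.5, 0.5], stored, max_difference=0.2))
```

In this sketch, a vector close to a stored vector is mapped to the stored command's action, while a vector that satisfies no difference constraint yields no action, which would correspond to the situation in which Jang prompts the user to select an action.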

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally:
Jang teaches:
9. A system to generate an augmented voice command, comprising: 
a memory storing computer instructions; and [Jang, Figure 10, “Memory 116.”]
a processor configured to execute the computer instructions to: [Jang, Figure 10, “processor 112.”]
…
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.

Claim 15 is a computer-readable medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally:
Jang teaches:
15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable and executable by a computer to cause the computer to perform a method, [Jang, “[0007] According to another implementation of the techniques disclosed herein, a non-transitory computer-readable medium includes instructions for teaching a virtual assistant device an action to be performed in response to receiving a command….”]
…
Claim 17 is a computer-readable medium claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 19 is a computer-readable medium claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 4, 7, 10, 12, 14, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jang in view of Bar-or (U.S. 20150371664).
Rationale for rejection remains as provided above for the combination of Venkataraman/Jang with Bar-or.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Joh (U.S. 20200193986)
Forutanpour (U.S. 20140081634) Figure 1A, “audio capture and positioning module 130.”  Figure 4, “capture, by a first augmented reality device, speech spoken by a person in a real-world scene 410.”  “[0032] Audio capture and positioning module 130 may capture audio in the vicinity of system 100. For instance, audio capture and positioning module 130 may be configured to capture speech spoken by persons (also referred to as speakers) present within a scene viewed by the user. Audio capture and positioning module 130 may include one or more microphones. Multiple microphones may be used to assist in determining the location where audio originated, such as based on a time of flight analysis. Audio capture and positioning module 130 may be configured to determine a direction or location from which sound, such as speech, originated. As such, audio capture and positioning module 130 may be used to capture audio and determine the audio's source. For instance, if multiple persons are participating in a conversation, audio capture and positioning module 130 may capture a person's speech and determine which person spoke the speech. In some embodiments, audio capture and positioning module 130 may be used to capture audio of persons not present with a scene viewed by the user. For instance, speech occurring behind the user may still be captured and its source determined.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659