DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed 5/18/2022 has been entered. Claims 1 and 11 have been amended. Claims 4, 17 and 18 have been cancelled. Claims 1-3, 5-16, 19 and 20 are pending in the current application.

Response to Arguments
Applicant’s arguments filed 5/18/2022 with respect to the amended claim 1 and similar claims have been considered but are not found persuasive in view of the ground(s) of rejection set forth in the current Office Action.  
In Remarks, applicant separately attacked each of the cited references in light of the new claim limitation of “wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event”, relying on a misinterpretation of the cited references’ teachings in relation to the claimed invention. The examiner cannot concur.
In Remarks, applicant alleged that Eubank does not disclose that the position at which an object is rendered in the SR setting is based on information obtained from image processing. This argument is unfounded in view of Eubank-provisional’s teaching at Paragraph 0047-0050 that the position of the physical object is determined based on the image processing performed by the object recognition algorithm. 
Eubank-provisional teaches applicant’s contested claim limitation: 
wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event. 
For example, Eubank-provisional teaches at Paragraphs 0047-0050 that the identifier 10 may use image data captured by the camera 4 to identify the sound object within the environment. The identifier 10 may perform an object recognition algorithm upon the image data to identify an object within the field of view of the camera. The algorithm may determine descriptive data that describes physical characteristics of an object… The parameter estimator 61 is configured to obtain 1) at least one microphone audio signal and/or 2) image data captured by at least one camera 4… The estimator 61 is configured to estimate parameters of the sound source, such as a position of the sound source as position data (e.g., location of the source)… the estimator may process the signals according to a sound source localization algorithm… the estimator may process the image data captured by the camera 4 to identify the sound object and/or the position of the sound object or source with respect to the device 1. For instance, the estimator may estimate a position of a sound object within an environment by performing object recognition upon the image data to identify an object within the field of view of the camera… The estimator is configured to produce metadata that contains at least some of the parameters that are estimated and/or data that is determined… The estimator 60 may adjust position data based on movement of an object.
Eubank-provisional teaches that integrating the AR graphical data output (dog bark spatially rendered) into the image comprises processing the visual data (e.g., the image data) to identify a position of a physical object (a dog) in the scene and arranging the AR graphical data output (dog bark spatially rendered) within the image based on the position of the physical object (e.g., the dog), wherein the physical object (the dog) is associated with the recognized non-verbal sound event (dog bark sound event).
Eubank-provisional teaches at Paragraph 0075 that the spatial mixer 30 may output a sound object in sync with presentation of image data on the display screen 23 when the display screen is presenting a VR setting that includes a dog, the dog bark may be outputted when the mouth of the dog in the VR setting moves and at Paragraph 0095 that both devices may be HMDs that are presenting a SR setting (MR) by displaying the setting on a respective display screen and outputting sounds of the setting and at Paragraph 0076 that the spatial mixer 30 may spatially render sound at a virtual sound source produced by the speakers 21 and 22 that corresponds to a physical location (or position) at which the sound (e.g., sound object) is detected within the environment in which the source device 1 is located. 
Eubank-provisional teaches at Paragraph 0020 that another example of SR is mixed reality (MR) and at Paragraphs 0022-0025 that one example of mixed reality is augmented reality (AR)…an augmented reality forest may have virtual trees and structures, but the animals may have features that are accurately reproduced from images taken of physical animals and at Paragraph 0096 that the audio receiver device 20 may retrieve image data associated with the dog bark and present the dog in the SR setting at a position within the SR setting at which the dog bark is to be spatially rendered.
In Remarks, applicant individually attacked the Takahashi ‘739 reference in an obviousness-type rejection, relying on a misinterpretation of the Takahashi ‘739 reference’s teaching in relation to the claimed invention. The examiner cannot concur.
Takahashi ‘739 teaches the claim limitation wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event (
Takahashi ‘739 teaches at Paragraph 0173 that the moving image processing unit 23 performs the emphasis process on the moving image with sound on the basis of the sound image object information so that a bounding box is displayed in the area of the selected sound image object to emphasize the sound image object and at Paragraph 0214 that the sound of the object can be emphasized and at Paragraph 0220 that the sound image object OB22 that is the car and the sound image object OB23 that is the dog are blurred. Accordingly, the emphasizing and blurring graphical outputs are spatially rendered at the positions of the sound image objects. 
Takahashi ‘739 teaches at Paragraph 0133 that it is desired to detect a dog as the image object from the moving image with sound and at Paragraph 0140 that the image object detector 51 outputs the image object information as the detection result of the image object and at Paragraph 0234 that the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound with respect to the area of the current field of view of the moving image with sound.
Takahashi ‘739 teaches at FIG. 10 that the virtual sound symbols are rendered at the positions of the sound objects (the guitars as the sound objects).
Takahashi ‘739 teaches at Paragraph 0145 that the neural network may learn using a data set of the moving image with sound in which an image object and a sound object are associated with each other in advance and at Paragraph 0186 that the acoustic event information…and the sound of the moving image with sound are input to the neural network…so as to detect the sound object. 
Takahashi ‘739 teaches at FIGS. 9-11 and Paragraphs 0233-0235 that the bird that the user may be interested in is detected as the sound image object OB41… the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 (virtual bird) representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound… the sound image object OB41 as a bird is displayed on the display image and the separated sound “chirp chirp” of the sound image object OB41 is reproduced as a reproduction sound).
Applicant’s arguments filed 5/18/2022 with respect to the amended claim 1 and similar claims have been considered but are moot in view of the new ground(s) of rejection set forth in the current Office Action based on the newly cited Takahashi ‘953 reference.  
Takahashi ‘953 teaches the claim limitation wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event (
Takahashi ‘953 teaches at FIGS. 4A-4F, 5A-5B, 10A-10D, 24A-24F, 25A-25F and 27-29 and Paragraphs 0313-0314 that the telop image TP representing the volume of the speech sound can be added to the moving image…the example is not limited to the speech sound of a person, and it is also conceivable to add a telop image TP representing an animal call or an ambient sound using characters and it is appropriate to display the telop image TP according to the position or depth of a sound source in the image and at Paragraphs 0250-0251 that an animal is the moving object 80 and the image processing apparatus 1 sets the image modes of the effect image EF according to the acquired information of the moving object 80. For example, the color, luminance, density, gradation and the like of the effect image EF to be displayed are set according to the information.
Takahashi ‘953 teaches at Paragraph 0448-0449 and Paragraph 0467 processing of superimposing the additional image such as the effect image EF on the object, synthesizing graphics, characters as the additional image, providing optical effects in these moving image editing). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Takahashi ‘953’s visual sound effect models into Eubank’s AR system so as to display the visual sound effect models in association with the positions of the sound objects, such as the animal objects identified in the physical scene. One of ordinary skill in the art would have utilized the visual sound effect models to visually characterize the sounds of the sound objects in the augmented reality scene.
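For illustration only, the contested limitation (identifying the position of a physical object from the visual data and arranging a graphical output based on that position) may be sketched as follows. This sketch is not taken from Eubank, Takahashi ‘739 or Takahashi ‘953; the names Box, Overlay, EVENT_TO_OBJECT and place_overlay, the "dog_bark" identifier, and the 10-pixel offset are all hypothetical assumptions, and the object detections are assumed to come from some upstream image-processing step that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical detection result: top-left corner, size, and class label."""
    x: int
    y: int
    w: int
    h: int
    label: str


@dataclass
class Overlay:
    """Hypothetical graphical output with its anchor position in the image."""
    text: str
    x: int
    y: int


# Assumed mapping from a sound event identifier to the associated
# physical object class and the graphic to display for that event.
EVENT_TO_OBJECT = {"dog_bark": ("dog", "Woof!")}


def place_overlay(event_id: str, detections: List[Box],
                  frame_w: int, frame_h: int) -> Optional[Overlay]:
    """Anchor the graphic for event_id at the position of its associated object."""
    entry = EVENT_TO_OBJECT.get(event_id)
    if entry is None:
        return None  # no graphic is defined for this sound event
    obj_label, text = entry
    for box in detections:
        if box.label == obj_label:
            # Center horizontally on the object, 10 px above its top edge,
            # clamped so the anchor stays inside the frame.
            x = min(max(box.x + box.w // 2, 0), frame_w - 1)
            y = min(max(box.y - 10, 0), frame_h - 1)
            return Overlay(text, x, y)
    return None  # the associated object is not visible in the image
```

In this sketch the graphic is only produced when the object associated with the sound event is found in the image, which mirrors the dependency between the two "wherein" clauses of the limitation.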


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-12, 16, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Eubank et al. US-PGPUB No. 2021/0035597 (hereinafter Eubank based on the provisional application 62/880,559’s filing date) in view of Brown US-PGPUB No. 2020/0082842 (hereinafter Brown); Visser et al. US-PGPUB No. 2018/0020312 (hereinafter Visser); Takahashi US-PGPUB No. 2021/0281739 (hereinafter Takahashi ‘739); Takahashi et al. US-PGPUB No. 2021/0201953 (hereinafter Takahashi ‘953) and Gross US-PGPUB No. 2018/0108369 (hereinafter Gross).

Re Claim 1: 
Eubank-provisional teaches a video telephony system for establishing a video telephony session between first and second user points, the video telephony session causing the generation of an image of a scene at the first user point of the video telephony session (e.g., Eubank-provisional FIGS. 1 and 4 the video telephony session between the audio source device 1---the second use point and audio receiver device 20---the first use point), the system comprising: 
a sound acquirer for acquiring audio data at the second use point of the video telecommunication session (Eubank-provisional teaches FIG. 1 “n microphone array 2” including “Microphone 3” and “Sound Object and Sound Bed Identifier 10” for acquiring audio data including Wind 18); 
a camera for acquiring visual data at the second use point (
Eubank-provisional teaches at Paragraphs 0047-0050 that the identifier 10 may use image data captured by the camera 4 to identify the sound object within the environment. The identifier 10 may perform an object recognition algorithm upon the image data to identify an object within the field of view of the camera. The algorithm may determine descriptive data that describes physical characteristics of an object… The parameter estimator 61 is configured to obtain 1) at least one microphone audio signal and/or 2) image data captured by at least one camera 4… The estimator 61 is configured to estimate parameters of the sound source, such as a position of the sound source as position data (e.g., location of the source)… the estimator may process the signals according to a sound source localization algorithm… the estimator may process the image data captured by the camera 4 to identify the sound object and/or the position of the sound object or source with respect to the device 1. For instance, the estimator may estimate a position of a sound object within an environment by performing object recognition upon the image data to identify an object within the field of view of the camera… The estimator is configured to produce metadata that contains at least some of the parameters that are estimated and/or data that is determined… The estimator 60 may adjust position data based on movement of an object.
Eubank-provisional teaches at Paragraph 0075 that the spatial mixer 30 may output a sound object in sync with presentation of image data on the display screen 23 when the display screen is presenting a VR setting that includes a dog, the dog bark may be outputted when the mouth of the dog in the VR setting moves and at Paragraph 0095 that both devices may be HMDs that are presenting a SR setting (MR) by displaying the setting on a respective display screen and outputting sounds of the setting and at Paragraph 0020 that another example of SR is mixed reality (MR) and at Paragraph 0022-0025 that one example of mixed reality is augmented reality (AR)…an augmented reality forest may have virtual trees and structures, but the animals may have features that are accurately reproduced from images taken of physical animals); 
a sound processor for recognizing a non-verbal sound event at the second use point by processing the audio data using one or more sound models to obtain a non-verbal sound event identifier for the non-verbal sound event (
Eubank-provisional teaches at Paragraph 0037 that the controller 5 may be a special-purpose processor and teaches at FIG. 1 “Sound Object and Sound Bed Identifier 10” for acquiring audio data including Wind 18 and at Paragraphs 0042-0044 that the sound object & sound bed identifier 10 identifies sound objects…spatial sound-source data of the dog bark 17 may include an audio signal that contains the bark 17 and position data of the source, e.g., the dog’s mouth of the bark 17… the sound library 9 may be a table having an entry for one or more sound objects…the metadata may include a unique index identifier that is associated with a sound object such as the dog bark 17 and at Paragraph 0046 that the identifier 10 may analyze the audio data within the spatial sound-source data to identify one or more sound characteristics of the audio data that is associated with a bark or more particularly with the specific bark 17 from that specific breed of dog…may perform a table lookup into the sound library 9 using the spatial sound-source data to identify the sound object as a matching sound object).
Eubank-provisional at least implicitly teaches the claim limitation: 
an augmented reality controller for determining and generating an augmented reality effect command corresponding to an augmented reality graphical data output semantically related to the non-verbal sound event by: inputting at least the non-verbal sound event identifier into an augmented reality effect command model; and receiving the augmented reality effect command corresponding to the augmented reality graphical data output semantically related to the non-verbal sound event from the augmented reality effect command model (
Eubank-provisional teaches at Paragraph 0055 that the identifier 10 is configured to produce a sound-object sonic descriptor 13 upon finding/selecting a matching predefined sound object’s entry from the library 9 and add metadata into the descriptor and at Paragraph 0060 that the network interface 6 is configured to obtain at least some audio data for transmission to the audio receiver device 20 and at Paragraph 0072 that the sound object engine 27 is configured to obtain a sound-object sonic descriptor 13 and to reproduce the sound object…may perform a table lookup into the sound library 28 using metadata contained within the sonic descriptor 13 such as an index identifier…the engine 27 selects the sound object associated with the entry….that may be used by the mixer to spatially render the sound object….a sound object of the dog bark reproduced by the receiver device 20 may output the reproduction of the bark to the right of the user of the receiver device 20 and at Paragraph 0075 that the spatial mixer 30 may output a sound object in sync with presentation of image data on the display screen 23 when the display screen is presenting a VR setting that includes a dog, the dog bark may be outputted when the mouth of the dog in the VR setting moves and at Paragraph 0095 that both devices may be HMDs that are presenting a SR setting (MR) by displaying the setting on a respective display screen and outputting sounds of the setting and at Paragraph 0020 that another example of SR is mixed reality (MR) and at Paragraph 0022-0025 that one example of mixed reality is augmented reality (AR)…an augmented reality forest may have virtual trees and structures, but the animals may have features that are accurately reproduced from images taken of physical animals and at Paragraph 0096 that audio receiver device 20 may retrieve image data associated with the dog bark, e.g., a dog and present the dog in the SR setting, at a position within the SR setting at which the dog bark is to be spatially rendered); and
an augmented reality environment generator for generating the augmented reality graphical data output semantically related to the non-verbal sound event by implementing the augmented reality effect command, and for integrating the augmented reality graphical data output into the image of the scene at the first use point (
Eubank-provisional teaches at Paragraph 0055 that the identifier 10 is configured to produce a sound-object sonic descriptor 13 upon finding/selecting a matching predefined sound object’s entry from the library 9 and add metadata into the descriptor and at Paragraph 0060 that the network interface 6 is configured to obtain at least some audio data for transmission to the audio receiver device 20 and at Paragraph 0072 that the sound object engine 27 is configured to obtain a sound-object sonic descriptor 13 and to reproduce the sound object…may perform a table lookup into the sound library 28 using metadata contained within the sonic descriptor 13 such as an index identifier…the engine 27 selects the sound object associated with the entry….that may be used by the mixer to spatially render the sound object….a sound object of the dog bark reproduced by the receiver device 20 may output the reproduction of the bark to the right of the user of the receiver device 20 and at Paragraph 0075 that the spatial mixer 30 may output a sound object in sync with presentation of image data on the display screen 23 when the display screen is presenting a VR setting that includes a dog, the dog bark may be outputted when the mouth of the dog in the VR setting moves and at Paragraph 0095 that both devices may be HMDs that are presenting a SR setting (MR) by displaying the setting on a respective display screen and outputting sounds of the setting and at Paragraph 0020 that another example of SR is mixed reality (MR) and at Paragraph 0022-0025 that one example of mixed reality is augmented reality (AR)…an augmented reality forest may have virtual trees and structures, but the animals may have features that are accurately reproduced from images taken of physical animals and at Paragraph 0096 that the audio receiver device 20 may retrieve image data associated with the dog bark and present the dog in the SR setting at a position within the SR setting at which the dog bark is to be spatially rendered).
wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event (
Eubank-provisional teaches at Paragraphs 0047-0050 that the identifier 10 may use image data captured by the camera 4 to identify the sound object within the environment. The identifier 10 may perform an object recognition algorithm upon the image data to identify an object within the field of view of the camera. The algorithm may determine descriptive data that describes physical characteristics of an object… The parameter estimator 61 is configured to obtain 1) at least one microphone audio signal and/or 2) image data captured by at least one camera 4… The estimator 61 is configured to estimate parameters of the sound source, such as a position of the sound source as position data (e.g., location of the source)… the estimator may process the signals according to a sound source localization algorithm… the estimator may process the image data captured by the camera 4 to identify the sound object and/or the position of the sound object or source with respect to the device 1. For instance, the estimator may estimate a position of a sound object within an environment by performing object recognition upon the image data to identify an object within the field of view of the camera… The estimator is configured to produce metadata that contains at least some of the parameters that are estimated and/or data that is determined… The estimator 60 may adjust position data based on movement of an object.
Eubank-provisional teaches that integrating the AR graphical data output (dog bark spatially rendered) into the image comprises processing the visual data (e.g., the image data) to identify a position of a physical object (a dog) in the scene and arranging the AR graphical data output (dog bark spatially rendered) within the image based on the position of the physical object (e.g., the dog), wherein the physical object (the dog) is associated with the recognized non-verbal sound event (dog bark sound event).
Eubank-provisional teaches at Paragraph 0088 that the identifier 10 may use image data in lieu of the sound characteristics to identify an object associated with the sound source. 
Eubank-provisional teaches at Paragraph 0075 that the spatial mixer 30 may output a sound object in sync with presentation of image data on the display screen 23 when the display screen is presenting a VR setting that includes a dog, the dog bark may be outputted when the mouth of the dog in the VR setting moves and at Paragraph 0095 that both devices may be HMDs that are presenting a SR setting (MR) by displaying the setting on a respective display screen and outputting sounds of the setting and at Paragraph 0076 that the spatial mixer 30 may spatially render sound at a virtual sound source produced by the speakers 21 and 22 that corresponds to a physical location (or position) at which the sound (e.g., sound object) is detected within the environment in which the source device 1 is located. 
Eubank-provisional teaches at Paragraph 0020 that another example of SR is mixed reality (MR) and at Paragraphs 0022-0025 that one example of mixed reality is augmented reality (AR)…an augmented reality forest may have virtual trees and structures, but the animals may have features that are accurately reproduced from images taken of physical animals and at Paragraph 0096 that the audio receiver device 20 may retrieve image data associated with the dog bark and present the dog in the SR setting at a position within the SR setting at which the dog bark is to be spatially rendered.
). 
Takahashi ‘953 teaches the claim limitation wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event (
Takahashi ‘953 teaches at FIGS. 4A-4F, 5A-5B, 10A-10D, 24A-24F, 25A-25F and 27-29 and Paragraphs 0313-0314 that the telop image TP representing the volume of the speech sound can be added to the moving image…the example is not limited to the speech sound of a person, and it is also conceivable to add a telop image TP representing an animal call or an ambient sound using characters and it is appropriate to display the telop image TP according to the position or depth of a sound source in the image and at Paragraphs 0250-0251 that an animal is the moving object 80 and the image processing apparatus 1 sets the image modes of the effect image EF according to the acquired information of the moving object 80. For example, the color, luminance, density, gradation and the like of the effect image EF to be displayed are set according to the information.
Takahashi ‘953 teaches at Paragraph 0448-0449 and Paragraph 0467 processing of superimposing the additional image such as the effect image EF on the object, synthesizing graphics, characters as the additional image, providing optical effects in these moving image editing). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Takahashi ‘953’s visual sound effect models into Eubank’s AR system so as to display the visual sound effect models in association with the positions of the sound objects, such as the animal objects identified in the physical scene. One of ordinary skill in the art would have utilized the visual sound effect models to visually characterize the sounds of the sound objects in the augmented reality scene.
 
Brown teaches the claim limitation: an augmented reality controller for determining and generating an augmented reality effect command corresponding to an augmented reality graphical data output semantically related to the non-verbal sound event by: inputting at least the non-verbal sound event identifier into an augmented reality effect command model; and receiving the augmented reality effect command corresponding to the augmented reality graphical data output semantically related to the non-verbal sound event from the augmented reality effect command model; and an augmented reality environment generator for generating the augmented reality graphical data output semantically related to the non-verbal sound event by implementing the augmented reality effect command, and for integrating the augmented reality graphical data output into the image of the scene at the first use point
 (Brown teaches at Paragraph 0063 identifying at least the non-verbal sound event by identifying a source or at least a category of sound source to which the source belongs; for example, the analysis could identify whether the sound originates from a radio or a car or electronic device or vehicle. Brown teaches at Paragraph 0052 that generated image elements may be indicative of the volume of the one or more sounds generated by the one or more sound sources…the generated image elements may be indicative of a classification of a type of sound such as an electronic device or an alarm…..Properties of the image element may be varied (controlled) in order to communicate different characteristics of the (identified) sound. Example properties of image elements to indicate a sound source (non-verbal sound event identifier) include animations such as a flashing effect or motion. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information….in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound).
Even if “semantically related” has a specific meaning, Visser teaches that the augmented reality graphical data output comprises an image of an object semantically associated with the non-verbal sound event (Visser teaches at FIGS. 4A and 5 and Paragraphs 0070-0071 that the processor may operate as the virtual sound source generation circuitry…insert a virtual bird 430 where the bird sound 410 is generated…and may insert a virtual monkey 434 where the monkey sound 414 is generated…virtual objects 430 and 434 may be inserted into a scene using mixed reality processing techniques).
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Brown’s augmented reality sound effect overlay features so as to provide an augmented reality graphical data output that represents the sound sources to the HMD user. One of ordinary skill in the art would have been motivated to provide the augmented reality visual effect overlay in order to represent the sound sources in the environment of the HMD user.
However, Brown, Visser and Eubank-provisional do not specifically teach the claim limitation: the one or more sound models being defined by parameters of a deep neural network architecture. 
In the same field of endeavor, Gross teaches the claim limitation that the one or more sound models being defined by parameters of a deep neural network architecture (Gross teaches at Paragraph 0016 that audio recognition system 210 may detect and record sounds, e.g., an animal, in vehicle 220. Audio recognition system 210 may be trained using machine learning neural network and at Paragraph 0027 that first neural network 350 and second neural network 360 may include one input layer, at least one hidden layer and one output layer and at Paragraph 0028 that process 400 may be utilized to train neural networks to achieve sound classification and recognition and at Paragraph 0030 that a set of different sounds may be provided for training, validation and testing of neural network learning…the set of sound may include child laughing or crying, dog barking or cat meowing and at Paragraph 0032 process 400 may involve processor 310 and second neural network 360 classifying the recorded sounds into a number of categories….the second neural network 360 may include multiple nodes and may adjust weight and bias factors (parameters) according to a back-propagation algorithm). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Gross’s trained multi-layered deep neural network, which classifies sounds into a number of categories, into Brown’s sound processing unit 1120 so as to classify the detected sounds according to the one or more sound models (e.g., the radio sound model, the car sound model). One of ordinary skill in the art would have utilized machine learning for the one or more sound classification models. 
In the same field of endeavor, Takahashi ‘739 teaches the claim limitation that the one or more sound models being defined by parameters of a deep neural network architecture (Takahashi ‘739 teaches at Paragraph 0135-0136 that acoustic event information are input to the neural network constituting the image object detector 51 and at Paragraph 0143 that the sound image object detector 53 is the neural network that takes…the sound object information as inputs and at Paragraph 0145 that the neural network may learn using a data set of the moving image with sound in which an image object and a sound object are associated with each other in advance and at Paragraph 0186 that the acoustic event information…and the sound of the moving image with sound are input to the neural network…so as to detect the sound object). 
Moreover, Takahashi ‘739 teaches the claim limitation that wherein integrating the augmented reality graphical data output into the image comprises processing the visual data to identify a position of a physical object in the scene and arranging the augmented reality graphical data output within the image based on the position of the physical object, wherein the physical object is associated with the recognized non-verbal sound event (
Takahashi ‘739 teaches at Paragraph 0173 that the moving image processing unit 23 performs the emphasis process on the moving image with sound on the basis of the sound image object information so that a bounding box is displayed in the area of the selected sound image object to emphasize the sound image object and at Paragraph 0214 that the sound of the object can be emphasized and at Paragraph 0220 that the sound image object OB22 that is the car and the sound image object OB23 that is the dog are blurred. Accordingly, the emphasizing and blurring graphical outputs are spatially rendered at the positions of the sound image objects. 
Takahashi ‘739 teaches at Paragraph 0133 it is desired to detect a dog as the image object from the moving image with sound and at Paragraph 0140 that the image object detector 51 outputs the image object information as the detection result of the image object and at Paragraph 0234 that the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound with respect to the area of the current field of view of the moving image with sound. 
Takahashi ‘739 teaches at FIG. 10 that the virtual sound symbols are rendered at the positions of the sound objects (the guitars as the sound objects). 
Takahashi ‘739 teaches at Paragraph 0145 that the neural network may learn using a data set of the moving image with sound in which an image object and a sound object are associated with each other in advance and at Paragraph 0186 that the acoustic event information…and the sound of the moving image with sound are input to the neural network…so as to detect the sound object. 
Takahashi ‘739 teaches at FIGS. 9-11 and Paragraph 0233-0235 that the bird that the user may be interested in is detected as the sound image object OB41….the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 (virtual bird) representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound….the sound image object OB41 as a bird is displayed on the display image and the separated sound “chirp chirp” of the sound image object OB41 is reproduced as a preproduction sound). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Takahashi ‘739’s trained neural network, which classifies sounds into a number of categories, into Brown’s sound processing unit 1120 so as to classify the detected sounds according to the one or more sound models (e.g., the radio sound model, the car sound model). One of ordinary skill in the art would have utilized machine learning for the one or more sound classification models. 

Re Claim 19: 
The claim 19 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the non-verbal sound event comprises a human articulated impersonation of an animal sound, wherein the augmented reality graphical data output comprises a graphical depiction of an animal the subject of the impersonation, and the augmented reality effect command corresponds to a graphical data output comprising a depiction of the animal, the augmented reality environment generator being operable to generate the augmented reality graphical data output depicting the animal by implementing said augmented reality effect command.
Takahashi ‘739 further teaches the claim limitation that the non-verbal sound event comprises a human articulated impersonation of an animal sound, wherein the augmented reality graphical data output comprises a graphical depiction of an animal the subject of the impersonation, and the augmented reality effect command corresponds to a graphical data output comprising a depiction of the animal, the augmented reality environment generator being operable to generate the augmented reality graphical data output depicting the animal by implementing said augmented reality effect command (Takahashi ‘739 teaches at FIGS. 9-11 and Paragraph 0233-0235 that the bird that the user may be interested in is detected as the sound image object OB41….the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 (virtual bird) representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound….the sound image object OB41 as a bird is displayed on the display image and the separated sound “chirp chirp” of the sound image object OB41 is reproduced as a preproduction sound. Takahashi ‘739 teaches at FIG. 11 and Paragraph 0251-0254 that the bark is selected as the trigger, a still image is captured at a timing when the dog bark “bowwow” is detected as the separated sound of the sound object OB62 that is the dog).

Re Claim 11: 
The claim 11 is in parallel with the claim 1 in a method form. The claim 11 is subject to the same rationale of rejection as the claim 1.  
Re Claim 20: 
The claim 20 encompasses the same scope of invention as that of the claim 11 except additional claim limitation that the non-verbal sound event comprises a human articulated impersonation of an animal sound, wherein determining and generating an augmented reality effect command comprises determining and generating an augmented reality effect command corresponding to an augmented reality graphical data output depicting an animal associated with the animal sound the subject of the human-articulated impersonation, wherein the augmented reality effect command corresponds to the depicting of the animal, and wherein the augmented reality graphical data output comprises a graphical depiction of the animal the subject of the impersonation.
Takahashi ‘739 further teaches the claim limitation that the non-verbal sound event comprises a human articulated impersonation of an animal sound, wherein determining and generating an augmented reality effect command comprises determining and generating an augmented reality effect command corresponding to an augmented reality graphical data output depicting an animal associated with the animal sound the subject of the human-articulated impersonation, wherein the augmented reality effect command corresponds to the depicting of the animal, and wherein the augmented reality graphical data output comprises a graphical depiction of the animal the subject of the impersonation (Takahashi ‘739 teaches at FIGS. 9-11 and Paragraph 0233-0235 that the bird that the user may be interested in is detected as the sound image object OB41….the moving image processing unit 23 performs a superimposition process of superimposing a mark MK11 (virtual bird) representing the sound image object OB41 and an arrow mark MK12 indicating the direction in which the sound image object OB41 is located on the moving image with sound….the sound image object OB41 as a bird is displayed on the display image and the separated sound “chirp chirp” of the sound image object OB41 is reproduced as a preproduction sound. Takahashi ‘739 teaches at FIG. 11 and Paragraph 0251-0254 that the bark is selected as the trigger, a still image is captured at a timing when the dog bark “bowwow” is detected as the separated sound of the sound object OB62 that is the dog).

Re Claim 12: 
The claim 12 is in parallel with the claim 1 in the form of a non-transitory storage medium. The claim 12 is subject to the same rationale of rejection as the claim 1. The claim 12 further recites a non-transitory storage medium storing computer executable instructions which, when executed by a computer, cause the computer to perform a method in accordance with the claim 11. 
Brown further teaches the claim limitation of a non-transitory storage medium storing computer executable instructions which, when executed by a computer, cause the computer to perform a method in accordance with claim 11 (Brown FIG. 9 and Paragraph 0072 and Paragraph 0062 “audio processing can be conducted by a general purpose CPU or graphics processing unit”). 

Re Claim 2: 
The claim 2 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that implementing the augmented reality effect command comprises overlaying a graphical symbol corresponding to the non-verbal sound event identifier over a portion of an image in the augmented reality environment. 
Brown further teaches the claim limitation that the augmented reality effect comprises an overlay of a graphical symbol corresponding to the sound event identity over a portion of an image in the augmented reality environment (Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated the augmented reality effect overlay features of Brown’s augmented reality sound effect overlay so as to provide an augmented reality graphical data output representing the sound sources to the HMD user. One of ordinary skill in the art would have been motivated to provide the augmented reality visual effect overlay in order to represent the sound sources in the environment of the HMD user. 
Re Claim 3: 
The claim 3 encompasses the same scope of invention as that of the claim 2 except additional claim limitation that an image localization stage operable to identify a position of an object in the image and on the basis of which to place the overlay.
Brown further teaches the claim limitation: that an image localization stage operable to identify a position of an object in the image and on the basis of which to place the overlay (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 5: 
The claim 5 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the sound processor is operable to determine a sound event identity on the basis of comparison with the one or more sound models.
Brown further teaches the claim limitation: that the sound processor is operable to determine a sound event identity on the basis of comparison with the one or more sound models (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 6: 
The claim 6 encompasses the same scope of invention as that of the claim 1 except additional claim limitation of a sound localization processor for processing the audio data to obtain a sound event location identifier, corresponding to the non-verbal sound event identifier, indicating a direction of receipt of the acquired audio data.
Brown further teaches the claim limitation: of a sound localization processor for processing the audio data to obtain a sound event location identifier, corresponding to the non-verbal sound event identifier, indicating a direction of receipt of the acquired audio data (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 7: 
The claim 7 encompasses the same scope of invention as that of the claim 3 except additional claim limitation that the augmented reality environment generator is operable to generate an augmented reality graphical data output based on a direction of receipt of the audio data.
Brown further teaches the claim limitation that the augmented reality environment generator is operable to generate an augmented reality graphical data output based on a direction of receipt of the audio data (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 8: 
The claim 8 encompasses the same scope of invention as that of the claim 4 except additional claim limitation that the augmented reality environment generator is operable to generate an augmented reality graphical data output comprising an augmented reality effect in a position in the augmented reality environment based on a direction of receipt of the audio data.
Brown further teaches the claim limitation that the augmented reality environment generator is operable to generate an augmented reality graphical data output comprising an augmented reality effect in a position in the augmented reality environment based on a direction of receipt of the audio data (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 9: 
The claim 9 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the augmented reality controller is operable to determine an augmented reality effect with reference to one or more augmented reality effect models, the augmented reality effect being determined on the basis of comparison with the one or more augmented reality effect models.
Brown further teaches the claim limitation that the augmented reality controller is operable to determine an augmented reality effect with reference to one or more augmented reality effect models, the augmented reality effect being determined on the basis of comparison with the one or more augmented reality effect models (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location).
Re Claim 10: 
The claim 10 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the augmented reality effect command has a semantic correspondence with the non-verbal sound event identifier.
Brown further teaches the claim limitation that the augmented reality effect command has a semantic correspondence with the non-verbal sound event identifier (Brown teaches at Paragraph 0061-0064 that the sound processing unit 1120 is configured to analyze the sound information relating to the one or more sounds and the audio processing unit 1120 is configured to analyze the sound information received by the sound input unit 1110. Such an analysis is performed to determine the direction of the sound source relative to a current orientation of the HMD, the volume of the sound, or any other property of the sound and the analysis performed may also be able to identify a source, or at least a category of sound source to which the source belongs, for example, the analysis could identify whether the sound originates from a radio or a car or more general categories such as electronic device, or vehicle. 
Brown teaches at FIG. 6 and Paragraph 0049-0051 that the sound source 740 is visible to the user in the display 800, highlighted by an area 810 surrounding the sound source 740…alternatively, the object may be displayed to the user as an overlay on the virtual content that is currently being displayed to the user…an image element such as a simple icon, an exclamation mark or other symbol/image that may be used to identify an object may be displayed to indicate a detected sound or sound source. 
Brown teaches at Paragraph 0064 that the image generating unit 1130 is configured to generate one or more image elements that indicate properties of analyzed sound information…in order to determine and/or generate appropriate image elements and their intended display position for representing the sound and the direction of the source of the sound and at Paragraph 0065 that the image output unit 1140 is configured to output display images for display to a user of a HMD, the images comprising the generated image elements as an image overlay. The image output unit 1140 is configured to apply an image overlay to an existing video stream for output to the HMD. 
Brown teaches at Paragraph 0052 the generated image elements can be indicative of a classification of a type of sound such as music or a news alert, a sound from electronic device or an alarm. Example properties of image elements used to indicate a sound source and its properties include color, intensity, size, shape, animation such as a flashing effect or motion and display location). 
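The correspondence described above, between a classified sound category and the properties of the image element displayed for it, may be illustrated by the following minimal sketch. The category names, the property values, the fallback element and the function name are assumptions chosen for illustration only and are not taken from Brown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageElement:
    """Display properties of an overlay element (hypothetical fields)."""
    symbol: str
    color: str
    flashing: bool


# Hypothetical table: sound category -> overlay element properties.
ELEMENT_BY_CATEGORY = {
    "alarm": ImageElement(symbol="!", color="red", flashing=True),
    "vehicle": ImageElement(symbol="car", color="yellow", flashing=False),
    "electronic device": ImageElement(symbol="speaker", color="blue", flashing=False),
}

# Fallback used when the category is not in the table (an assumption).
DEFAULT_ELEMENT = ImageElement(symbol="?", color="white", flashing=False)


def select_image_element(category: str) -> ImageElement:
    """Return the overlay element that corresponds to a sound category."""
    return ELEMENT_BY_CATEGORY.get(category, DEFAULT_ELEMENT)
```

In this sketch, the element returned for "alarm" differs in symbol, color and animation from the element returned for "vehicle", which is the sense in which the displayed effect corresponds to the identified category of sound.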
Re Claim 16: 
Claim 16 encompasses the same scope of invention as that of claim 1 except for the additional claim limitation that the sound acquirer is configured to acquire the audio data from audio captured by a microphone.
Brown further teaches the claim limitation that the sound acquirer is configured to acquire the audio data from audio captured by a microphone (Brown teaches at Paragraph 0044 that processing is performed by the HMD to identify the direction from which the sound originated relative to the position of the user 700 and at Paragraph 0047 that the indicated or detected direction can be relative to a current orientation of the head-mountable display device. If the detection of direction is by a microphone array or other directional sound detector at the HMD, then the direction of the sound source relative to the HMD can be directly obtained by such a detector). 

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Eubank et al. US-PGPUB No. 2021/0035597 (hereinafter Eubank based on the provisional application 62/880,559’s filing date) in view of Brown US-PGPUB No. 2020/0082842 (hereinafter Brown); Visser et al. US-PGPUB No. 2018/0020312 (hereinafter Visser); Takahashi ‘739 US-PGPUB No. 2021/0281739 (hereinafter Takahashi ‘739); Takahashi et al. US-PGPUB No. 2021/0201953 (hereinafter Takahashi ‘953); Gross US-PGPUB No. 2018/0108369 (hereinafter Gross) and Cahill et al. US-PGPUB No. 2016/0277863 (hereinafter Cahill). 
Re Claim 13: 
Claim 13 encompasses the same scope of invention as that of claim 1 except for the additional claim limitation that the augmented reality effect command model comprises a machine learned model.
In the same field of endeavor, Gross teaches the claim limitation that the augmented reality effect command model comprises a machine learned model (Gross teaches at Paragraph 0016 that audio recognition system 210 may detect and record sounds, e.g., an animal, in vehicle 220, and that audio recognition system 210 may be trained using a machine learning neural network; at Paragraph 0027 that first neural network 350 and second neural network 360 may include one input layer, at least one hidden layer and one output layer; at Paragraph 0028 that process 400 may be utilized to train neural networks to achieve sound classification and recognition; at Paragraph 0030 that a set of different sounds may be provided for training, validation and testing of neural network learning…the set of sounds may include child laughing or crying, dog barking or cat meowing; and at Paragraph 0032 that process 400 may involve processor 310 and second neural network 360 classifying the recorded sounds into a number of categories…the second neural network 360 may include multiple nodes and may adjust weight and bias factors (parameters) according to a back-propagation algorithm). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Gross’s trained multi-layered deep neural network, which classifies sounds into a number of categories, into Brown’s sound processing unit 1120 so as to classify the detected sounds according to the one or more sound models (e.g., the radio sound model, the car sound model). One of ordinary skill in the art would have utilized a machine learned model as the one or more sound classification models. 
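The layered classification relied upon above (one input layer, at least one hidden layer and one output layer, with weight and bias parameters) may be illustrated by the following minimal sketch of a forward pass. The two-element feature vector, the fixed weight and bias values and the function names are assumptions made for illustration; the two labels are drawn from the example sounds listed above. Training by back-propagation is not shown.

```python
import math

LABELS = ["dog barking", "child crying"]

# Hypothetical fixed parameters; in practice these would be learned.
W_HIDDEN = [[1.0, -1.0], [-1.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUTPUT = [[2.0, 0.0], [0.0, 2.0]]
B_OUTPUT = [0.0, 0.0]


def dense(weights, bias, inputs):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert output-layer scores into probabilities that sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features):
    """Run input -> hidden -> output and return (label, probabilities)."""
    hidden = relu(dense(W_HIDDEN, B_HIDDEN, features))
    probs = softmax(dense(W_OUTPUT, B_OUTPUT, hidden))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs
```

The label returned by classify plays the role of a sound category, which a downstream unit could then use to select a display effect.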
In the same field of endeavor, Takahashi ‘739 teaches the claim limitation that the augmented reality effect command model comprises a machine learned model (Takahashi ‘739 teaches at Paragraph 0135-0136 that acoustic event information is input to the neural network constituting the image object detector 51; at Paragraph 0143 that the sound image object detector 53 is the neural network that takes…the sound object information as inputs; at Paragraph 0145 that the neural network may learn using a data set of the moving image with sound in which an image object and a sound object are associated with each other in advance; and at Paragraph 0186 that the acoustic event information…and the sound of the moving image with sound are input to the neural network…so as to detect the sound object). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Takahashi ‘739’s trained neural network, which classifies sounds into a number of categories, into Brown’s sound processing unit 1120 so as to classify the detected sounds according to the one or more sound models (e.g., the radio sound model, the car sound model). One of ordinary skill in the art would have utilized a machine learned model as the one or more sound classification models. 
However, Cahill further teaches the claim limitation that the augmented reality effect command model comprises a machine learned model (Cahill teaches at Paragraph 0049-0056 that the classification module 408 includes a GMM-based machine learning model in which the acoustic dimension of the sound event signature can be analyzed by each GMM to generate the supplemental data regarding the one or more event classifications, which may be provided by an alert message via a user interface, and that the classification module 408 outputs the image frame with AR overlays depicting an acoustic heat map and/or the metadata for the event…different types of highlights, e.g., colors, images and symbols, and animations, e.g., blinking text and flashing symbols, and other effects may be utilized to denote event regions). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Cahill’s augmented reality overlay features so as to provide an augmented reality graphical data output representing the sound sources to the HMD user. One of ordinary skill in the art would have been motivated to provide the augmented reality visual effect overlay in order to represent the sound sources in the environment of the HMD user. 
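The GMM-based classification relied upon above, in which a sound event feature is analyzed by each Gaussian mixture model, may be illustrated by the following minimal sketch. The one-dimensional feature, the class names, the mixture parameters and the function names are assumptions made for illustration and are not taken from Cahill.

```python
import math

# Hypothetical per-class mixtures: lists of (weight, mean, variance).
MODELS = {
    "glass break": [(0.5, 8.0, 1.0), (0.5, 10.0, 1.0)],
    "engine": [(0.7, 1.0, 0.5), (0.3, 2.0, 0.5)],
}


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def gmm_log_likelihood(x, components):
    """Log of the weighted sum of component densities at x."""
    return math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in components))


def classify_event(feature):
    """Score the feature under each class GMM and return the best class."""
    return max(MODELS, key=lambda name: gmm_log_likelihood(feature, MODELS[name]))
```

The class returned by classify_event corresponds to an event classification, which could then be reported to the user or used to select an overlay for the event region.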

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Eubank et al. US-PGPUB No. 2021/0035597 (hereinafter Eubank based on the provisional application 62/880,559’s filing date) in view of Brown US-PGPUB No. 2020/0082842 (hereinafter Brown); Visser et al. US-PGPUB No. 2018/0020312 (hereinafter Visser); Takahashi ‘739 US-PGPUB No. 2021/0281739 (hereinafter Takahashi ‘739); Takahashi et al. US-PGPUB No. 2021/0201953 (hereinafter Takahashi ‘953); Gross US-PGPUB No. 2018/0108369 (hereinafter Gross) and Clark et al. US-PGPUB No. 2019/0221035 (hereinafter Clark). 

Re Claim 14: 
Claim 14 encompasses the same scope of invention as that of claim 1 except for the additional claim limitation that the sound acquirer is configured to acquire the audio data from a sound generation module. 
However, Clark further teaches the claim limitation that the sound acquirer is configured to acquire the audio data from a sound generation module (
Clark teaches at Paragraph 0096-0099 that the physical object 1110 can generate sounds in the real world environment 100 detected by the VR application 345…also may detect other real world sounds such as the sound of airplanes, cars…e.g., howling of a wolf can be selected to mask a dog bark, a sound of a dragon breathing fire can be selected to mask a sound of an airplane…The VR application 345 can access audio data, e.g., digital sound tracks (sound models) from the audio library 440…In response to detecting a dog bark…the VR application 345 can manipulate images of the virtual object 1210 being presented in the VR environment 1010 to depict the virtual object 1210 howling. The VR application 345 can select a sound (model) from the audio library 440 that correlates to the identified sound in the VR environment 1010. 
Clark teaches at Paragraph 0100 that the VR application 345 can determine a virtual object to represent the source of the detected sound…if the detected sound is a sound of a bird, the VR application 345 can present a bird flying in the VR environment 1010…if the detected sound is a sound of a plane, the VR application 345 can present a plane flying in the VR environment 1010 by selectively amplifying and/or applying sound effects to the detected sound. 
Clark teaches at Paragraph 0101-0102 that the VR application 345 can selectively control the volume of the generated sounds across a plurality of audio channels to produce audio stereo imaging effects that cause the user to perceive the generated sounds as being emanated at a spatial location where the physical object 1110 is located…that if the physical object 1110 begins barking and moves into the real world environment 100 while continuing to bark…the VR application 345 can selectively adjust a volume of the sound of the wolf howling as the sound pressure level continues to increase. If the sound pressure level of the barking decreases, the VR application 345 can selectively decrease a volume of the sound of the wolf howling…similarly, the sound pressure level detected for a plane flying overhead may begin at a low volume, increase as the plane approaches the real world environment 100 and decrease after the plane passes. The VR application 345 can selectively adjust the volume of the moving steam locomotive/train based on the changes in the sound pressure level of the detected sound of the plane…Also, the VR application 345 can produce audio stereo imaging effects so that the sound of the moving locomotive/train is perceived by the user as being emanated from a same spatial direction where the plane is located and selectively control the volume of the generated sounds…the audio stereo imaging effects can cause the sound to be perceived by the user to be emanating in the VR environment from an object that is moving from left to right. 
Clark teaches at Paragraph 0104 that the VR application 345 can generate a sound of a rattle snake and increase the volume of that sound…the VR application 345 can manipulate (control/command) the image of the rattle snake to depict the rattle snake getting ready to strike or striking). 
As Clark’s VR application 345 is configured as an AR application in FIG. 7 in the mixed reality environment, Clark implicitly teaches the claim limitation: 
An augmented reality controller for determining and generating an augmented reality effect command by: inputting at least the non-verbal sound event identifier into an augmented reality effect command model; and receiving the augmented reality effect command from the augmented reality effect command model (The VR application 345 can be configured as an AR application in a see-through HMD and is mapped to the claimed augmented reality effect command model that generates the augmented reality effects/commands/controls. 
Clark teaches at Paragraph 0095 that the virtual object 1210 can be configured to be manipulated (controlled/commanded) by the VR application 345 to walk, run, jump, fly, etc. in the VR environment. 
Clark teaches at Paragraph 0096-0099 that the physical object 1110 can generate sounds in the real world environment 100 detected by the VR application 345…also may detect other real world sounds such as the sound of airplanes, cars…e.g., howling of a wolf can be selected to mask a dog bark, a sound of a dragon breathing fire can be selected to mask a sound of an airplane…The VR application 345 can access audio data, e.g., digital sound tracks (sound models) from the audio library 440…In response to detecting a dog bark…the VR application 345 can manipulate images of the virtual object 1210 being presented in the VR environment 1010 to depict the virtual object 1210 howling. The VR application 345 can select a sound (model) from the audio library 440 that correlates to the identified sound in the VR environment 1010. 
Clark teaches at Paragraph 0100 that the VR application 345 can determine a virtual object to represent the source of the detected sound…if the detected sound is a sound of a bird, the VR application 345 can present a bird flying in the VR environment 1010…if the detected sound is a sound of a plane, the VR application 345 can present a plane flying in the VR environment 1010 by selectively amplifying and/or applying sound effects to the detected sound. 
Clark teaches at Paragraph 0101-0102 that the VR application 345 can selectively control the volume of the generated sounds across a plurality of audio channels to produce audio stereo imaging effects that cause the user to perceive the generated sounds as being emanated at a spatial location where the physical object 1110 is located…that if the physical object 1110 begins barking and moves into the real world environment 100 while continuing to bark…the VR application 345 can selectively adjust a volume of the sound of the wolf howling as the sound pressure level continues to increase. If the sound pressure level of the barking decreases, the VR application 345 can selectively decrease a volume of the sound of the wolf howling…similarly, the sound pressure level detected for a plane flying overhead may begin at a low volume, increase as the plane approaches the real world environment 100 and decrease after the plane passes. The VR application 345 can selectively adjust the volume of the moving steam locomotive/train based on the changes in the sound pressure level of the detected sound of the plane…Also, the VR application 345 can produce audio stereo imaging effects so that the sound of the moving locomotive/train is perceived by the user as being emanated from a same spatial direction where the plane is located and selectively control the volume of the generated sounds…the audio stereo imaging effects can cause the sound to be perceived by the user to be emanating in the VR environment from an object that is moving from left to right. 
Clark teaches at Paragraph 0104 that the VR application 345 can generate a sound of a rattle snake and increase the volume of that sound…the VR application 345 can manipulate (control/command) the image of the rattle snake to depict the rattle snake getting ready to strike or striking). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have detected the sounds relating to the various objects in the physical environment and to have provided, by the virtual/augmented reality application, the corresponding augmented reality effects in response to the detected sounds of the various objects. One of ordinary skill in the art would have been motivated to detect the sounds based on the sound models in the sound library/database and to provide the corresponding AR effects/controls in response to the detected sounds. 
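The volume control relied upon above, in which the volume of a generated sound follows the sound pressure level of the detected sound and is distributed across audio channels so that it appears to come from the direction of the source, may be illustrated by the following minimal sketch. The linear mapping, the 40 dB and 90 dB limits, the constant-power panning law, the azimuth convention and the function names are assumptions made for illustration and are not taken from Clark.

```python
import math


def masking_gain(spl_db, floor_db=40.0, ceiling_db=90.0):
    """Map a detected sound pressure level to a gain between 0 and 1.

    The gain rises as the detected level rises and falls as it falls.
    The floor and ceiling values are hypothetical.
    """
    fraction = (spl_db - floor_db) / (ceiling_db - floor_db)
    return min(1.0, max(0.0, fraction))


def stereo_gains(azimuth_deg):
    """Constant-power pan: -90 is fully left, +90 is fully right."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def channel_volumes(spl_db, azimuth_deg):
    """Return (left, right) volumes for the generated masking sound."""
    gain = masking_gain(spl_db)
    left, right = stereo_gains(azimuth_deg)
    return gain * left, gain * right
```

Calling channel_volumes repeatedly as the detected level and direction change would raise the generated sound as the source approaches, lower it as the source recedes, and shift it between channels as the source moves from left to right.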
Re Claim 15: 
Claim 15 encompasses the same scope of invention as that of claim 14 except for the additional claim limitation that the computer system comprises the sound generation module and the sound generation module is configured to generate a sound for a virtual sound environment.
However, Clark further teaches the claim limitation that the computer system comprises the sound generation module and the sound generation module is configured to generate a sound for a virtual sound environment (
Clark teaches at Paragraph 0096-0099 that the physical object 1110 can generate sounds in the real world environment 100 detected by the VR application 345…also may detect other real world sounds such as the sound of airplanes, cars…e.g., howling of a wolf can be selected to mask a dog bark, a sound of a dragon breathing fire can be selected to mask a sound of an airplane…The VR application 345 can access audio data, e.g., digital sound tracks (sound models) from the audio library 440…In response to detecting a dog bark…the VR application 345 can manipulate images of the virtual object 1210 being presented in the VR environment 1010 to depict the virtual object 1210 howling. The VR application 345 can select a sound (model) from the audio library 440 that correlates to the identified sound in the VR environment 1010. 
Clark teaches at Paragraph 0100 that the VR application 345 can determine a virtual object to represent the source of the detected sound…if the detected sound is a sound of a bird, the VR application 345 can present a bird flying in the VR environment 1010…if the detected sound is a sound of a plane, the VR application 345 can present a plane flying in the VR environment 1010 by selectively amplifying and/or applying sound effects to the detected sound. 
Clark teaches at Paragraph 0101-0102 that the VR application 345 can selectively control the volume of the generated sounds across a plurality of audio channels to produce audio stereo imaging effects that cause the user to perceive the generated sounds as being emanated at a spatial location where the physical object 1110 is located…that if the physical object 1110 begins barking and moves into the real world environment 100 while continuing to bark…the VR application 345 can selectively adjust a volume of the sound of the wolf howling as the sound pressure level continues to increase. If the sound pressure level of the barking decreases, the VR application 345 can selectively decrease a volume of the sound of the wolf howling…similarly, the sound pressure level detected for a plane flying overhead may begin at a low volume, increase as the plane approaches the real world environment 100 and decrease after the plane passes. The VR application 345 can selectively adjust the volume of the moving steam locomotive/train based on the changes in the sound pressure level of the detected sound of the plane…Also, the VR application 345 can produce audio stereo imaging effects so that the sound of the moving locomotive/train is perceived by the user as being emanated from a same spatial direction where the plane is located and selectively control the volume of the generated sounds…the audio stereo imaging effects can cause the sound to be perceived by the user to be emanating in the VR environment from an object that is moving from left to right. 
Clark teaches at Paragraph 0104 that the VR application 345 can generate a sound of a rattle snake and increase the volume of that sound…the VR application 345 can manipulate (control/command) the image of the rattle snake to depict the rattle snake getting ready to strike or striking). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have detected the sounds relating to the various objects in the physical environment and to have provided, by the virtual/augmented reality application, the corresponding augmented reality effects in response to the detected sounds of the various objects. One of ordinary skill in the art would have been motivated to detect the sounds based on the sound models in the sound library/database and to provide the corresponding AR effects/controls in response to the detected sounds. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JIN CHENG WANG/Primary Examiner, Art Unit 2613