Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of the foreign priority application, Korean Application No. KR10-2019-0105999, filed on 08/28/2019, has been filed.
Drawings
The drawings submitted on 10/10/2019 are being considered by the examiner.
Response to Amendment
Claims 1-5, 8-15, and 18-20 are currently pending in the application. Claims 1, 11, and 19 have been amended, and claims 6-7 and 16-17 have been cancelled.
Response to Arguments
Applicant's arguments with respect to claims 1, 11, and 19 have been considered but are not persuasive for the reasons set forth below.
Applicant Argument 1: Regarding Keane, Keane is merely relied upon by the Office Action for its generic disclosure of adjusting the gain of a first transducer system relative to a second transducer system, when the second transducer system has a better speech-to-noise ratio than the first transducer system. Keane is NOT relevant to the embodied invention and appears to only have been referenced for its generic mention of the word "gain."
Examiner Response 1: The examiner agrees with the applicant regarding the use of the Keane reference. Upon further interpretation of the claims in light of the specification, the examiner found that Park et al. expressly teaches the limitation for which Keane was relied upon, so Keane is not necessary to reject the argued limitation. Accordingly, the reliance on Keane has been withdrawn, and the corresponding teaching of Park et al. is relied upon instead, as reflected in the final Office action below.
Similarly, the reliance on Rosenberg in the rejection of claim 19 has been withdrawn. The examiner had inadvertently relied on Rosenberg only for the “speaker” limitation for outputting the content, whereas Park et al. clearly teaches an output means for outputting contents, which can be a speaker.

Applicant Argument 2: Howard does NOT determine whether the intention analysis information is obtained from the text data and corresponds to the trigger sound based on the intention analysis information. 
Examiner Response 2: The examiner respectfully disagrees with the applicant's generalized assertion regarding the teaching of Howard et al. Howard et al. inherently teaches that intention analysis information is obtained from the text data and that it is determined, based on the intention analysis information, whether the audio data corresponds to the trigger sound. The rejection of the claims in the Office action clearly sets forth this teaching, and in response to the applicant's argument the examiner presents below additional paragraphs of Howard et al. that further explain the cited teaching.
Howard et al. teaches: [0004] The computer-based system uses both the acoustic characteristics of the obtained utterance and recognized text of the obtained utterance to determine whether the follow on question is directed towards the computer-based system. [0005] Specifically, the content includes a determination that the audience for the human speech is likely directed towards the automated assistant server. The classification system includes a speech recognizer, a transcription representation generator, an acoustic feature generator, a concatenation module, and a classifier to perform this determination function. The speech recognizer can obtain utterance information spoken by a user and generate a transcription of the spoken utterance from the user. The transcription representation generator can receive the transcription of the spoken utterance from the speech recognizer and output transcriptions including word embeddings. [0007] …receiving audio data corresponding to an utterance; obtaining a transcription of the utterance, generating a representation of the audio data; generating a representation of the transcription of the utterance; providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of audio data and a given representation of a transcription of an utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant; [0021] Advantageously, the technique uses neural networks for both the acoustic characteristics and the recognized text of the obtained utterance to train a neural network to produce an indication of whether the audience for the obtained utterance is likely directed towards the computer-based system.
Therefore, the above teaching makes clear that the neural network of Howard et al. is trained to recognize, from a text transcription of an utterance, the intention of the utterance. If the neural network did not recognize or understand the user's utterance or command (i.e., the intention of the utterance or command), it would not be able to carry out the action that the utterance or command is intended to produce, which is the very purpose for which the neural network is trained: “to generate an action based on understanding of a user utterance or command corresponding to an intention or intended action.”
The following paragraphs further show that Howard et al. teaches the limitation argued by the applicant: [0025] For example, phrases such as "What time is it" may be included in phrases such as "Hey mom, I'm late for school, what time is it" that the classifier server 108 may obtain. [0026] In summary, the classifier server 108 can judge the likelihood and provide an indication that the audience for the obtained utterance is likely directed to the automated assistant server 116. [0027] In response, the classifier server 108 can provide data, indicating instructions and the obtained utterance, to the automated assistant server 116 over a network such as network 114. The instructions request that the automated assistant server 116 process the obtained utterance and generate a response to the obtained utterance. [0028] In particular, the automated assistant server 116 can provide an answer to the questions and/or statements provided by the classifier server 108. For example, the automated assistant server 116 may obtain data indicating an utterance and instructions that require the automated assistant server 116 to process the utterance. The automated assistant server 116 determines that the utterance recites, "What time is it" and generates a response to the utterance. For example, the automated assistance server 116 determines the time is "6:02 PM" and generates a response 113 to provide to the classifier server 108 over a network 114. The response 113 may include the answer that recites, "The time is 6:02 PM."
Again, if the automated assistant server did not understand the question, i.e., the user's intention behind the command or utterance, it could not respond or provide an answer corresponding to the intention of the user's question.
Therefore, even though Howard et al. does not explicitly use the term “intention analysis,” Howard et al. inherently teaches that “intention analysis information is obtained from the text data and corresponds to the trigger sound,” because the intention of the user's question must be analyzed, by understanding and analyzing the question, in order to generate an appropriate response to the user's question, command, or utterance.

Further, the examiner relied on Howard et al. and Liu et al. in the rejection to show that the limitations of claims 6-7 are well known in the art.
Further, Liu et al. clearly teaches intention analysis of an utterance, request, command, or question in order to generate an appropriate response, which further confirms the inherent teaching of intention analysis in Howard et al. Liu et al. teaches: [0035] For example, automatic speech recognition (ASR) logic 205 may be configured to convert input audio into text. When needed, the automatic speech recognition (ASR) logic 205 may convert the input 204 into text that the virtual agent 202 can understand. In order to convert an audio recording to text, the ASR logic 205 may utilize an acoustic model, an ASR class-based language model, and a recognition lexicon model. [0039] For example, a natural language understanding module 223 may take the input 204 and determine what the user is asking about, e.g. is the correct response a time, a place, a meeting request, a reservation, etc. It might use a classifier that classifies the intent of the request. That classifier might be trained on social network data. [0041] The dialog module 225 may use the natural language understanding module 223 to determine what kind of information the user is looking for based on the intent. [0044] Once the dialog module 225 has all the information it needs to respond to a query or act on a request, the dialog module 225 may call on the natural language generator 224 to form the responses that answer the original inquiry or request.

Applicant Argument 3: Liu does NOT determine whether the audio data corresponds to the trigger sound based on the intention analysis information, as recited in amended claim 1.
Examiner Response 3: The examiner respectfully disagrees. Liu et al. teaches the limitation argued by the applicant: [0035] For example, automatic speech recognition (ASR) logic 205 may be configured to convert input audio into text. When needed, the automatic speech recognition (ASR) logic 205 may convert the input 204 into text that the virtual agent 202 can understand. In order to convert an audio recording to text, the ASR logic 205 may utilize an acoustic model, an ASR class-based language model, and a recognition lexicon model. [0039] For example, a natural language understanding module 223 may take the input 204 and determine what the user is asking about, e.g. is the correct response a time, a place, a meeting request, a reservation, etc. It might use a classifier that classifies the intent of the request. That classifier might be trained on social network data. [0041] The dialog module 225 may use the natural language understanding module 223 to determine what kind of information the user is looking for based on the intent. [0044] Once the dialog module 225 has all the information it needs to respond to a query or act on a request, the dialog module 225 may call on the natural language generator 224 to form the responses that answer the original inquiry or request.
The above passages inherently teach a user input that can be interpreted as the claimed “trigger sound”: the utterance or request causes (i.e., triggers) the system to process the request through natural language understanding to determine the question and the answer. Therefore, the request can be interpreted as a trigger sound.
Further, Howard et al. clearly teaches analyzing a trigger sound (request) to verify whether the request is directed toward the automated assistant for processing. Even when the request does not contain a trigger sound, the automated assistant server can still determine, based on analysis of the utterance or speech, whether the utterance is a request directed toward the automated assistant system for a response or action.
Therefore, a trigger sound can be any sound, speech, utterance, request, or question that an automatic voice recognition system or device receives and that thus causes (triggers) the system to interpret the utterance and to take an action based on the input sound, speech, utterance, or request.
Thus, the examiner maintains that Liu et al., like Howard et al., specifically teaches “determine whether the audio data corresponds to the trigger sound based on the intention analysis information.”
Therefore, the applicant's arguments with respect to the teachings of Howard et al. and Liu et al. are not persuasive, and the rejection of the limitations of claims 6-7, which are now incorporated into independent claim 1, is maintained, but without relying on the teaching of Keane.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-4, 9-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US 2016/0360384 A1) in view of Howard et al. (US 2019/0035390 A1), further in view of Liu et al. (US 2019/0189126 A1).

Regarding Claims 1 and 11, Park et al. teaches: An artificial intelligence device for providing a notification to a user using audio data, the artificial intelligence device comprising ([0005] However, since the user uses user's senses of sight and hearing to enjoy contents through the body-mounted electronic device, the user may have difficulty in detecting a dangerous situation which is caused by an external environment or a situation that needs a notification. [0008] In accordance with an aspect of the present disclosure, an electronic device is provided. The electronic device includes a sensor module, a memory, and a processor electrically connected to the sensor module and the memory. The processor is configured to reproduce a content, acquire ambient environment information through the sensor module or an external electronic device, determine at least one attribute regarding notification information corresponding to the ambient environment information, and provide the notification information in connection with the content based on the at least one attribute. [0042] Hereinafter, an electronic device according to various embodiments will be described with reference to the accompanying drawings.
As used herein, the term "user" may indicate a person who uses an electronic device or a device (e.g., an artificial intelligence electronic device) that uses an electronic device.): a memory configured to store a trigger sound (notification event detection variable or pre-stored notification data or notification event) for notifying a user and information about a notification (notification information corresponding to a notification event) corresponding to the trigger sound; at least one microphone (sensor module 504) configured to receive audio data ([0109] According to an embodiment of the present disclosure, the processor 502 may detect generation of a notification event based on ambient environment information which is collected through at least one of the sensor module 504 and the communication module 512. For example, the processor 502 may determine whether a notification event is generated or not by comparing ambient environment information detected through the sensor module 504 and a notification event detection variable stored in the memory 506. For example, the processor 502 may determine that the notification event is generated when ambient environment information is received from an external electronic device through the communication module 512. For example, the processor 502 may collect ambient environment information of the electronic device 500 by activating at least one of the sensor module 504 and the communication module 502 in response to the content being reproduced. Herein, the ambient environment information may include at least one of an ambient image, an ambient sound, a motion of an ambient object, an ambient smell, ambient temperature/humidity, ambient lighting, ambient ultraviolet rays, and information related to the external electronic device. The notification event detection variable may be data which is pre-defined to detect a notification event, and may be pre-stored in the memory 500 or received from an external server. 
The processor 502 may add, delete, or change the notification event detection variable based on input information received through the input module 508. [0111] For example, the sensor module 504 may include a microphone sensor… [0112] For example, the memory 506 may store at least one of the notification event detection variable, the notification information corresponding to the notification event, and the attribute regarding the notification information. Herein, the notification event detection variable may include reference data to be compared with the ambient environment information to detect the generation of the notification event. [0121] According to various embodiments of the present disclosure, the notification information may include at least one of ambient environment information, pre-stored notification data, or notification data which is received from the external electronic device. [0143] For example, the processor 500 may detect a notification event by comparing the characteristic point of the ambient sound and a notification event detection variable stored in the memory 506. When the characteristic point of the ambient sound and the notification event detection variable are determined to be similar to each other, the processor 502 may determine that the notification event is generated.); a processor configured to: change a volume gain of the microphone based on a noise level of the audio data received from the at least one microphone ([0160] Referring to FIG. 11, in operation 1101, the electronic device (for example, the electronic device 101, 201, or 500) may detect an output volume of notification information corresponding to a notification event. Herein, the output volume of the notification information may be determined to correspond to a volume of an ambient sound detected by the sensor module 504 or may include a volume fixed at a predetermined level. 
[0161] In operation 1103, the processor 502 may detect an output volume of a content corresponding to time when the notification event is generated. For example, when a notification event is generated by baby's crying, the processor 502 of the electronic device 500 may detect the volume of the content from the time when the baby's crying is detected from the content which is being reproduced.  [0162] In operation 1105, the electronic device may determine whether the user can recognize the notification information based on the output volume of the content detected in operation 1103 and the output volume of the notification information. [0163] When the electronic device determines that the user cannot recognize the notification information corresponding to the notification event, the electronic device may adjust the output volume of the notification information in operation 1107. For example, the processor 502 of the electronic device 500 may increase the output volume of the notification information at a predetermined rate. In this case, the processor 502 may increase the output volume of the notification information based on the output volume of the content detected in operation 1103.), and in response to the audio data received from the at least one microphone corresponding to the trigger sound, extract the notification corresponding to the trigger sound; and an outputter configured to output the notification ([0153] Referring to FIG. 10, in operation 1001, the electronic device (for example, the electronic device 101, 201, or 500) may determine an audio extraction time of a content corresponding to time when a notification event is generated. For example, when generation of a notification event corresponding to a baby's crying is detected, the processor 502 of the electronic device 500 may determine the time when the baby's crying is detected as the audio extraction time of the content.  
[0154] In operation 1003, the electronic device may extract audio data of the content corresponding to the audio extraction time. For example, when the time of detection of the baby's crying is determined as the audio extraction time of the content in operation 1001, the processor 502 of the electronic device 500 may extract the audio data of the content corresponding to the time of detection of the baby's crying. [0157] In operation 1009, when the similarity between the notification information and the audio data of the content satisfies the reference similarity, the electronic device may determine the audio extraction time of the content as an output time of the notification information. [0166] Referring to FIG. 12, in operation 1201, the electronic device (for example, the electronic device 101, 201, or 500) may output audio data of a content which is mixed with notification information of an audio form. For example, the processor 502 of the electronic device 502 may mix a baby's crying corresponding to generation of a notification event and audio data of a content, and output the content. [0169] For example, when baby's crying is detected while the content is being reproduced, the processor 502 may generate the vibration corresponding to the baby's crying three times.).
Park et al. do not explicitly teach: wherein the processor is further configured to: provide the audio data to a voice recognition model for generating text data based on the audio data and determine whether the audio data corresponds to the trigger sound based on the text data, and wherein the processor is further configured to: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information.
Howard et al. teach: provide the audio data to a voice recognition model for generating text data based on the audio data and determine whether the audio data corresponds to the trigger sound (hotword from acoustical properties corresponding to a question) based on the text data ([0005] The speech recognizer can obtain utterance information spoken by a user and generate a transcription of the spoken utterance from the user. The transcription representation generator can receive the transcription of the spoken utterance from the speech recognizer and output transcriptions including word embeddings. [0011] In some implementations, the method further comprises receiving, from a speech recognizer at a word-embedding model, recognizable text corresponding to the utterance; generating, at the word-embedding model, the transcription of the utterance from the recognizable text; and providing, from the word-embedding model, the transcription of the utterance to the classifier. [0027] “In response, the classifier server 108 can provide data, indicating instructions and the obtained utterance, to the automated assistant server 116 over a network such as network 114. The instructions request that the automated assistant server 116 process the obtained utterance and generate a response to the obtained utterance.” [0028] In particular, the automated assistant server 116 can provide an answer to the questions and/or statements provided by the classifier server 108. [0031] The classifier server 108 can detect the hotword from acoustical properties of the spoken question and process the question "what should I wear today?").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. to include the teaching of Howard et al. in order for the automated assistant server to understand and respond to a user question.
Park et al. in view of Howard et al. do not explicitly teach: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information.
Liu et al. teach: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information ([0035] For example, automatic speech recognition (ASR) logic 205 may be configured to convert input audio into text. When needed, the automatic speech recognition (ASR) logic 205 may convert the input 204 into text that the virtual agent 202 can understand. In order to convert an audio recording to text, the ASR logic 205 may utilize an acoustic model, an ASR class-based language model, and a recognition lexicon model. [0039] For example, a natural language understanding module 223 may take the input 204 and determine what the user is asking about, e.g. is the correct response a time, a place, a meeting request, a reservation, etc. It might use a classifier that classifies the intent of the request. That classifier might be trained on social network data. [0041] The dialog module 225 may use the natural language understanding module 223 to determine what kind of information the user is looking for based on the intent. [0044] Once the dialog module 225 has all the information it needs to respond to a query or act on a request, the dialog module 225 may call on the natural language generator 224 to form the responses that answer the original inquiry or request. [0069] The logic 400 may apply speech recognition and natural language understanding to determine the intent of the request at bock 404. If the request is received as audio data, the virtual agent may use ASR logic 205 and speech recognition module 221 to convert the request to text. Once in text form, the virtual agent may use the dialog module 225 and the natural language understanding module 223 to determine an intent of the request.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. in view of Howard et al. to include the teaching of Liu et al. in order to respond to a query or act on a request by a user.

Regarding Claims 2 and 12, Park et al. teaches: in response to the noise level (baby’s cry) of the audio data received from the at least one microphone being less than or equal to a predetermined noise level (the output volume of the content is greater than the output volume of the notification information by more than a predetermined ratio, i.e., the volume of the detected baby’s cry notification information is less than the output content volume), increase the volume gain (increase the output volume of the notification information) of the microphone ([0162] In operation 1105, the electronic device may determine whether the user can recognize the notification information based on the output volume of the content detected in operation 1103 and the output volume of the notification information. For example, when the output volume of the content detected in operation 1103 is greater than the output volume of the notification information by more than a predetermined ratio, the processor 502 of the electronic device 500 may determine that the user cannot recognize the notification information. When the output volume of the content detected in operation 1103 is greater (or smaller) than the output volume of the notification information by less than the predetermined ratio, the processor 502 may determine that the user can recognize the notification information. [0163] When the electronic device determines that the user cannot recognize the notification information corresponding to the notification event, the electronic device may adjust the output volume of the notification information in operation 1107. For example, the processor 502 of the electronic device 500 may increase the output volume of the notification information at a predetermined rate. In this case, the processor 502 may increase the output volume of the notification information based on the output volume of the content detected in operation 1103.).

Regarding Claim 3, Park et al. teaches: The artificial intelligence device of claim 1, wherein the outputter outputs the audio data received from the microphone and the notification together ([0166] Referring to FIG. 12, in operation 1201, the electronic device (for example, the electronic device 101, 201, or 500) may output audio data of a content which is mixed with notification information of an audio form. For example, the processor 502 of the electronic device 502 may mix a baby's crying corresponding to generation of a notification event and audio data of a content, and output the content.).

Regarding Claims 4 and 14, Park et al. teaches: The artificial intelligence device of claim 1, wherein the at least one microphone includes a plurality of microphones (beamforming microphone sensors) and the processor determines a sound source direction of an audio source from audio data received from each of the plurality of microphones, and wherein the outputter outputs information regarding the sound source direction when outputting the notification ([0129] In operation 603, the electronic device may detect ambient environment information of the electronic device. For example, the processor 502 may detect ambient environment information of the electronic device 500 through the sensor module 504. For example, the processor 502 may detect an ambient sound in a specific direction using reception beamforming technology or a beamforming microphone sensor. [0132] In operation 609, the electronic device may output the content and the notification information corresponding to the notification event based on the attribute regarding the notification information, and output the content. For example, the processor 502 may mix the content and the notification information of a vibration form based on a notification output variable, and output the content. [0174] For example, when the audio data of the content and the notification information of the audio form corresponding to the notification event are being mixed, the processor 502 of the electronic device 500 may display a direction in which the notification information is detected as shown in FIG. 14C.).


Regarding Claim 9, Park et al. teaches: The artificial intelligence device of claim 1, wherein the trigger sound includes at least one of a name of the user, a beep sound, a voice command, or a siren sound ([0149] Herein, the notification event detection variable may include a predetermined name and a predetermined title (for example, mother or baby) for causing a notification to be generated.).

Regarding Claim 10, Park et al. teaches: The artificial intelligence device of claim 1, wherein the audio data corresponding to the trigger sound is received while the outputter is outputting music ([0114] The output module 510 may output various contents to the user. Herein, the content may include an audio, a text, an image, a video, an icon, a symbol, etc. [0115] According to an embodiment of the present disclosure, the output module 510 may output the notification information corresponding to the generation of the notification event while the content is being reproduced. For example, the output module 510 may mix audio data of the content and the notification information (for example, a notification sound) corresponding to the notification event, and output the content. [0172] Referring to FIG. 13, in operation 1301, the electronic device (for example, the electronic device 101, 201, or 500) may output audio data of a content which is mixed with notification information of an audio form. For example, the processor 502 of the electronic device 500 may control to mix a doorbell sound corresponding to generation of a notification event and music of a content, and output the content.).

Regarding Claim 13, Park et al. teaches:  The method of claim 11, wherein the outputting includes outputting the audio data received from the at least one microphone (See rejection of claim 1).

Regarding Claim 19, Park et al. teaches: A device for providing a notification to a user based on artificial intelligence, the device comprising ([0008] In accordance with an aspect of the present disclosure, an electronic device is provided. The electronic device includes a sensor module, a memory, and a processor electrically connected to the sensor module and the memory. The processor is configured to reproduce a content, acquire ambient environment information through the sensor module or an external electronic device, determine at least one attribute regarding notification information corresponding to the ambient environment information, and provide the notification information in connection with the content based on the at least one attribute. [0042] Hereinafter, an electronic device according to various embodiments will be described with reference to the accompanying drawings. As used herein, the term "user" may indicate a person who uses an electronic device or a device (e.g., an artificial intelligence electronic device) that uses an electronic device.): a memory configured to store a trigger sound for notifying a user; at least microphone configured to receive audio data ([0109] According to an embodiment of the present disclosure, the processor 502 may detect generation of a notification event based on ambient environment information which is collected through at least one of the sensor module 504 and the communication module 512. For example, the processor 502 may determine whether a notification event is generated or not by comparing ambient environment information detected through the sensor module 504 and a notification event detection variable stored in the memory 506. For example, the processor 502 may collect ambient environment information of the electronic device 500 by activating at least one of the sensor module 504 and the communication module 502 in response to the content being reproduced. 
Herein, the ambient environment information may include at least one of an ambient image, an ambient sound, a motion of an ambient object, an ambient smell, ambient temperature/humidity, ambient lighting, ambient ultraviolet rays, and information related to the external electronic device. The notification event detection variable may be data which is pre-defined to detect a notification event, and may be pre-stored in the memory 500 or received from an external server. The processor 502 may add, delete, or change the notification event detection variable based on input information received through the input module 508. [0111] For example, the sensor module 504 may include a microphone sensor… [0112] For example, the memory 506 may store at least one of the notification event detection variable, the notification information corresponding to the notification event, and the attribute regarding the notification information. Herein, the notification event detection variable may include reference data to be compared with the ambient environment information to detect the generation of the notification event. [0121] According to various embodiments of the present disclosure, the notification information may include at least one of ambient environment information, pre-stored notification data, or notification data which is received from the external electronic device.); at least one speaker configured to output audio content ([0102] According to various embodiments of the present disclosure, the display 430 of the HMD may include an inputting means for receiving an input of a user's control command and an outputting means (for example, a speaker) for outputting audio data although they are not illustrated. Herein, the inputting means may include an input interface, a microphone, a camera, an ultrasonic sensor, etc. [0107] Referring to FIG. 5, the electronic device 500 may include a processor 502, a sensor module 504, a memory 506, an input module 508, and an output module 510. 
[0108] The processor 502 may reproduce a content. For example, the processor 502 may control the output module 510 to output at least one of video data or audio data corresponding to the reproduced content. [0114] The output module 510 may output various contents to the user. Herein, the content may include an audio, a text, an image, a video, an icon, a symbol, etc.); and a controller (processor) configured to: receive the audio data from the at least one microphone corresponding to the trigger sound while at least one speaker outputting the audio content ([0102] According to various embodiments of the present disclosure, the display 430 of the HMD may include an inputting means for receiving an input of a user's control command and an outputting means (for example, a speaker) for outputting audio data although they are not illustrated. [0115] According to an embodiment of the present disclosure, the output module 510 may output the notification information corresponding to the generation of the notification event while the content is being reproduced. For example, the output module 510 may mix audio data of the content and the notification information (for example, a notification sound) corresponding to the notification event, and output the content. [0169] According to various embodiments of the present disclosure, the electronic device may generate a vibration of a vibration pattern or a vibration intensity corresponding to the notification event while the content is being reproduced. 
For example, when baby's crying is detected while the content is being reproduced, the processor 502 may generate the vibration corresponding to the baby's crying three times.), and output an audio notification via the at least one speaker based on the trigger sound received by the at least microphone and learning data (volume of an ambient sound) corresponding to one or more trigger sounds ([0160] Referring to FIG. 11, in operation 1101, the electronic device (for example, the electronic device 101, 201, or 500) may detect an output volume of notification information corresponding to a notification event. Herein, the output volume of the notification information may be determined to correspond to a volume of an ambient sound detected by the sensor module 504 or may include a volume fixed at a predetermined level. [0161] In operation 1103, the processor 502 may detect an output volume of a content corresponding to time when the notification event is generated. For example, when a notification event is generated by baby's crying, the processor 502 of the electronic device 500 may detect the volume of the content from the time when the baby's crying is detected from the content which is being reproduced.  [0166] Referring to FIG. 12, in operation 1201, the electronic device (for example, the electronic device 101, 201, or 500) may output audio data of a content which is mixed with notification information of an audio form. For example, the processor 502 of the electronic device 502 may mix a baby's crying corresponding to generation of a notification event and audio data of a content, and output the content. [0172] Referring to FIG. 13, in operation 1301, the electronic device (for example, the electronic device 101, 201, or 500) may output audio data of a content which is mixed with notification information of an audio form. 
For example, the processor 502 of the electronic device 500 may control to mix a doorbell sound corresponding to generation of a notification event and music of a content, and output the content.).
Park et al. do not explicitly teach: wherein the controller is further configured to: provide the audio data to a voice recognition model for generating text data based on the audio data and determine whether the audio data corresponds to the trigger sound based on the text data, and wherein the controller is further configured to: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information.
Howard et al. teach: provide the audio data to a voice recognition model for generating text data based on the audio data and determine whether the audio data corresponds to the trigger sound (hotword from acoustical properties corresponding to a question) based on the text data ([0005] The speech recognizer can obtain utterance information spoken by a user and generate a transcription of the spoken utterance from the user. The transcription representation generator can receive the transcription of the spoken utterance from the speech recognizer and output transcriptions including word embeddings. [0011] In some implementations, the method further comprises receiving, from a speech recognizer at a word-embedding model, recognizable text corresponding to the utterance; generating, at the word-embedding model, the transcription of the utterance from the recognizable text; and providing, from the word-embedding model, the transcription of the utterance to the classifier. [0027] “In response, the classifier server 108 can provide data, indicating instructions and the obtained utterance, to the automated assistant server 116 over a network such as network 114. The instructions request that the automated assistant server 116 process the obtained utterance and generate a response to the obtained utterance.” [0028] In particular, the automated assistant server 116 can provide an answer to the questions and/or statements provided by the classifier server 108. [0031] The classifier server 108 can detect the hotword from acoustical properties of the spoken question and process the question "what should I wear today?").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. to include the teaching of Howard et al. in order for the automated assistant server to understand and respond to a user question.
Park et al. in view of Howard et al. do not explicitly teach: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information.
Liu et al. teach: acquire intention analysis information about the text data, and determine whether the audio data corresponds to the trigger sound based on the intention analysis information ([0035] For example, automatic speech recognition (ASR) logic 205 may be configured to convert input audio into text. When needed, the automatic speech recognition (ASR) logic 205 may convert the input 204 into text that the virtual agent 202 can understand. In order to convert an audio recording to text, the ASR logic 205 may utilize an acoustic model, an ASR class-based language model, and a recognition lexicon model. [0039] For example, a natural language understanding module 223 may take the input 204 and determine what the user is asking about, e.g. is the correct response a time, a place, a meeting request, a reservation, etc. It might use a classifier that classifies the intent of the request. That classifier might be trained on social network data. [0041] The dialog module 225 may use the natural language understanding module 223 to determine what kind of information the user is looking for based on the intent. [0044] Once the dialog module 225 has all the information it needs to respond to a query or act on a request, the dialog module 225 may call on the natural language generator 224 to form the responses that answer the original inquiry or request. [0069] The logic 400 may apply speech recognition and natural language understanding to determine the intent of the request at bock 404. If the request is received as audio data, the virtual agent may use ASR logic 205 and speech recognition module 221 to convert the request to text. Once in text form, the virtual agent may use the dialog module 225 and the natural language understanding module 223 to determine an intent of the request.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. in view of Howard et al. to include the teaching of Liu et al. in order to respond to a query or act on a request by a user.

Regarding Claim 20, Park et al. teaches: The device of claim 19, wherein the trigger sound corresponds to a name of the user of the device ([0149] Herein, the notification event detection variable may include a predetermined name and a predetermined title (for example, mother or baby) for causing a notification to be generated.). 

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. in view of Howard et al., further in view of Liu et al., and further in view of Ahn (US 2018/0255015 A1).
Regarding Claims 5 and 15, Park et al. teaches:  generate the vibration corresponding to the notification event (baby's crying) three times ([0169] According to various embodiments of the present disclosure, the electronic device may generate a vibration of a vibration pattern or a vibration intensity corresponding to the notification event while the content is being reproduced. For example, when baby's crying is detected while the content is being reproduced, the processor 502 may generate the vibration corresponding to the baby's crying three times.).
Park et al. in view of Howard et al., further in view of Liu et al., do not specifically teach: control the outputter to stop outputting the notification when a predetermined amount of time passes after the notification starts to be output through the outputter.
Ahn teaches: “control the outputter to stop outputting the notification when a predetermined amount of time passes after the notification starts to be output through the outputter” ([0023] In another embodiment, the controller may sequentially terminate the output of the plurality of notification icons, in response to a lapse of a preset time. [0213] Specifically, the icons may be output at the same time as those events occur. At this time, vibration or notification sound notifying the occurrence of the events may be output together with the icons.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. in view of Howard et al., further in view of Liu et al., to include the teaching of “control the outputter to stop outputting the notification when a predetermined amount of time passes after the notification starts to be output through the outputter,” according to the teaching of Ahn above, in order to terminate the notification upon the lapse of a preset time.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. in view of Howard et al., further in view of Liu et al., and further in view of Park et al. (US 2014/0300466 A1), hereinafter referred to as Park1.
Regarding Claims 8 and 18, Park et al. teaches: determine whether the audio data corresponds to the trigger sound based on matching the captured ambient sound with the stored trigger event in the database (See rejection of claim 1).
Park et al. in view of Howard et al., further in view of Liu et al., do not explicitly teach: provide the audio data to a situation recognition model for generating situation information, and determine whether the audio data corresponds to the trigger sound based on the situation information.
Park1 teaches: provide the audio data to a situation recognition model for generating situation information, and determine whether the audio data corresponds to the trigger sound based on the situation information ([0012] Accordingly, an aspect of the present disclosure is to provide an apparatus and a method for preventing an accident in a portable terminal by identifying and/or determining a dangerous object and/or a dangerous state through an analysis of environmental situation information obtained from a sensor and notifying a user using the mobile terminal. [0030] The control unit 120 drives the sensor unit 100 in an execution state of an application or a waiting state, and/or according to a user input. The control unit 120 obtains environmental situation information using the sensor unit 100, and detects a dangerous object and/or dangerous state by analyzing the environmental situation information. The control unit 120 extracts a control service or notice service mapped on the dangerous object and/or dangerous state if the detected dangerous object and/or dangerous state is determined, wherein the dangerous object and/or dangerous state may inflict an injury on a user, and the control unit 120 executes the extracted control service or notice service. [0034] Further, the control unit 120 may detect a dangerous object by analyzing a sound obtained from the sound detector 106, such as a horn from vehicles including a car, a bicycle, a train, a subway train, and an ambulance, and sounds such as a fire emergency bell, a burglar alarm, a shouting sound, and an animal's howling sound. The control unit 120 may detect dangerous states such as an impact, a collision, and a drop of the portable terminal, by analyzing a movement of the portable terminal obtained from the motion sensor 104. The control unit 120 may detect a dangerous state of the surroundings of the portable terminal by analyzing a temperature value obtained from the temperature sensor 108. 
The control unit 120 may detect an approach of a person and/or object if the person and/or object are detected by the proximity sensor. The control unit 120 may determine that the environmental situation is harmful to a person if a pollutant concentration detected by the environmental sensor is higher than a predetermined value. [0035] The control unit 120 may determine levels corresponding to the dangerous object and/or dangerous state, and may differentiate a level of notifying and alarming to a user according to the levels of dangerous object and/or dangerous state. [0036] Namely, the control unit 120 may give an urgent alarm to a user, or may switch off the portable terminal, if the level of danger is determined to be very high. Alternatively, the control unit 120 may output a notice displayed on a screen, such as a moving images or graphics, an alarm sound, a vibration, and switching on a light, if the level of danger is relatively low, or in other words, if the user may easily recognize a danger. [0043] In particular, the storage unit 130, according to an embodiment of the present disclosure, may include an environmental situation information database 131, a notice service database 132, and a control service database 133. The environmental situation information database may include predictable environmental situation information stored in a manufacturing process of the portable terminal, and data of dangerous states and dangerous objects.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Park et al. in view of Howard et al., further in view of Liu et al., to include the teaching of “provide the audio data to a situation recognition model for generating situation information, and determine whether the audio data corresponds to the trigger sound based on the situation information,” according to the teaching of Park1 above, in order to detect a dangerous object and/or dangerous state by analyzing the environmental situation information.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Taite et al. (US 9609419 B2) teach: (Abstract) A notification decision module (304) determines whether to notify a user of the event. A notification module (306) notifies the user of the event based on determination. The notification decision module obtains identity of a speaker of a spoken word. The notification decision module determines whether the speaker is associated with the user. The notification decision module determines to notify the user of the event based on whether the speaker is associated with the user.
Pate et al. (US 10206043 B2) teach: (Abstract) The method involves receiving an ambient audio signal from a local environment and a remote environment. An initial audio signal is detected from within the received ambient audio signal by a processor circuit (120). Determination is made that the initial audio signal corresponds to a type of audio source from a set of types of audio sources. Determination is made that the initial audio signal meets a metric for pass-through to a user of a wearable audio device (100) using the processor circuit. The initial audio signal is isolated using the processor circuit based on filtering the ambient audio signal in response to determining that the initial audio signal meets the metric. Playback of the isolated initial audio signal is adjusted for the user through a speaker (110).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878.  The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656