Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2019-0102695, filed on 08/21/2019.
Drawings
The drawing submitted on 08/30/2019 is being considered by the examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a communication unit configured to” in claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 5-7, and 12-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jang et al. (US 2013/0073293 A1).

Regarding Claim 1, Jang et al. teach: An artificial intelligence apparatus (first electronic device 100) for recognizing an utterance voice of a user, the artificial intelligence apparatus comprising: a communication unit (Fig. 5, communication unit 110) configured to communicate with at least one external artificial intelligence apparatus (other electronic devices (for example, the tablet PC 10a)) which obtains first sound data (data including the time the voice command was entered along with the voice command, received by other electronic devices (for example, the tablet PC 10a)) including the utterance voice (voice command) of the user to generate a first speech recognition result from the first sound data ([0078] In addition, the communication unit 110 generally includes one or more components allowing radio communication between the electronic device 100 and a communication system or a network in which the electronic device is located. [0110] The voice recognition unit 182 can carry out voice recognition upon voice signals input through the microphone 122 of the electronic device 100 or the remote control 10 and/or the mobile terminal shown in FIG. 1; the voice recognition unit 182 can then obtain at least one recognition candidate corresponding to the recognized voice. [0127] Referring to FIGS. 6 and 7, the first electronic device 100 receives a user's voice command in the device environment as shown in FIG. 6 (S100). For example, the TV 100 receives a voice command saying "next channel" from the user. Other electronic devices (for example, the tablet PC 10a) than the first electronic device 100, which are connected to the first electronic device over a network, may receive the user's voice command. [0129] Likewise, the other electronic devices connected to the first electronic device 100 via the network may perform the voice recognition process in response to the voice command.
For purposes of illustration, the voice command received by the other electronic devices is the same as the voice command received by the first electronic device 100. [0130] Thereafter, the controller 180 of the first electronic device 100 receives a result of the voice recognition for the same voice command as the voice command received from at least one of the other electronic devices connected to the first electronic device 100 through the network (S120). [0133] The voice recognition result received from the other electronic devices includes information on time that the user's voice command was entered. For instance, when the first electronic device 100 receives a first voice command at a first time and the second electronic device 10a receives the first voice command at a second time, there might be a tiny difference in the time of recognizing the voice command in consideration of a distance difference between the two devices.); a microphone configured to obtain second sound data (time and voice command information received by the first electronic device 100) including the utterance voice ([0123] A microphone included in the TV 100 or tablet PC 10a can function as an input means that receives the user's voice command. According to an embodiment, the input means includes a microphone included in the remote controller 50 for controlling the TV 100 or included in the user's mobile phone 10. The remote controller 50 and the mobile phone 10 can perform near-field wireless communication with the TV 100 or the tablet PC 10a.); and a processor (controller 180) configured to: receive first speech recognition results (voice recognition result received from the other electronic devices) from each of the at least one external artificial intelligence apparatus ([0129] Likewise, the other electronic devices connected to the first electronic device 100 via the network may perform the voice recognition process in response to the voice command. 
[0130] Thereafter, the controller 180 of the first electronic device 100 receives a result of the voice recognition for the same voice command as the voice command received from at least one of the other electronic devices connected to the first electronic device 100 through the network (S120). [0133] The voice recognition result received from the other electronic devices includes information on time that the user's voice command was entered. For instance, when the first electronic device 100 receives a first voice command at a first time and the second electronic device 10a receives the first voice command at a second time, there might be a tiny difference in the time of recognizing the voice command in consideration of a distance difference between the two devices. ); generate a second speech recognition result from the second sound data ([0128] The controller 180 of the first electronic device 100 performs a voice recognition process in response to the received voice command (S110). For purposes of illustration, the voice command received by the other electronic devices is the same as the voice command received by the first electronic device 100.); generate a final speech recognition result (voice command execution with a selected device) for the utterance voice by using the first speech recognition results and the second speech recognition result ([0132] Accordingly, the first electronic device 100 and the second electronic device 10a as shown in FIG. 6 need share the voice recognition results by exchanging the results therebetween.); and perform a control (perform a function) corresponding to the final speech recognition result ([0134] Accordingly, in sharing the voice recognition results between a plurality of devices, time information received from the devices may be taken into consideration. 
[0136] The controller 180 of the first electronic device 100 selects a device to perform the voice command based on the voice recognition result shared with the other electronic devices (S130). [0140] Accordingly, when the first electronic device 100 is selected to perform the voice command, the controller 180 may enable the first electronic device 100 to directly perform a function corresponding to the voice command.).

Regarding Claim 5, Jang et al. teach:  The artificial intelligence apparatus according to claim 1, wherein the processor is configured to: determine weights ( noise or interference, a magnitude (gain value) of the recognized voice signal, voice recognition ratio of each device, type of content or application in execution by each device upon voice recognition, and remaining power) for each of the first speech recognition results and the second speech recognition result; and generate the final speech recognition result based on the weights ([0088] In addition, the microphone 122 can receive sounds via a microphone in a phone call mode, a recording mode, a voice recognition mode, and the like, and can process such sounds into audio data. The microphone 122 may also implement various types of noise canceling (or suppression) algorithms to cancel or suppress noise or interference generated when receiving and transmitting audio signals. [0110] The voice recognition unit 182 can carry out voice recognition upon voice signals input through the microphone 122 of the electronic device 100 or the remote control 10 and/or the mobile terminal shown in FIG. 1; the voice recognition unit 182 can then obtain at least one recognition candidate corresponding to the recognized voice. For example, the voice recognition unit 182 can recognize the input voice signals by detecting voice activity from the input voice signals, carrying out sound analysis thereof, and recognizing the analysis result as a recognition unit. And the voice recognition unit 182 can obtain the at least one recognition candidate corresponding to the voice recognition result with reference to the recognition dictionary and the translation database stored in the memory 160. 
[0135] The voice command result received from the other electronic devices may include a magnitude (gain value) of the recognized voice signal, voice recognition ratio of each device, type of content or application in execution by each device upon voice recognition, and remaining power. [0136] The controller 180 of the first electronic device 100 selects a device to perform the voice command based on the voice recognition result shared with the other electronic devices (S130). ).
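
For illustration only, the weight-based limitation of claim 5 can be pictured as a weighted vote over the per-device recognition results. The sketch below is an assumption made for readability: neither the claims nor Jang et al. specify this combination rule, and the function name `fuse_results`, the example transcripts, and the numeric weights are hypothetical.

```python
from collections import defaultdict


def fuse_results(results):
    """Pick a final transcript by weighted vote.

    results: list of (transcript, weight) pairs, one pair per device.
    The weights are hypothetical; the record does not define how they are set.
    """
    if not results:
        raise ValueError("at least one recognition result is required")
    totals = defaultdict(float)
    for transcript, weight in results:
        # Accumulate the weight of every device that produced this transcript.
        totals[transcript] += weight
    # Return the transcript with the largest accumulated weight.
    return max(totals, key=totals.get)


# Two devices agree on "next channel"; one device hears "next panel".
final = fuse_results([("next channel", 0.5), ("next panel", 0.3), ("next channel", 0.4)])
```

Under these assumed weights, the transcript shared by two devices accumulates 0.9 against 0.3 and is selected.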

Regarding Claim 6, Jang et al. teach:  The artificial intelligence apparatus according to claim 5, wherein the processor is configured to: determine an environment variable (noise or interference generated when receiving and transmitting audio signals, a difference in input time between two devices) corresponding to an utterance time point of the user by using the second sound data; and determine the weights based on the environment variable ([0088] In addition, the microphone 122 can receive sounds via a microphone in a phone call mode, a recording mode, a voice recognition mode, and the like, and can process such sounds into audio data. The microphone 122 may also implement various types of noise canceling (or suppression) algorithms to cancel or suppress noise or interference generated when receiving and transmitting audio signals.  [0133] The voice recognition result received from the other electronic devices includes information on time that the user's voice command was entered. For instance, when the first electronic device 100 receives a first voice command at a first time and the second electronic device 10a receives the first voice command at a second time, there might be a tiny difference in the time of recognizing the voice command in consideration of a distance difference between the two devices. [0134] Accordingly, in sharing the voice recognition results between a plurality of devices, time information received from the devices may be taken into consideration. For instance, when a difference in input time between two devices is within a predetermined interval, the controller 180 may determine that the user voice commands have been input at the same time. In contrast, when the difference in input time is more than the predetermined interval, the controller 180 may determine that the voice command input at the first time has been reentered at the second time. 
The controlling method for an electronic device according to the embodiments of the present disclosure may apply to the former situation.  [0135] The voice command result received from the other electronic devices may include a magnitude (gain value) of the recognized voice signal, voice recognition ratio of each device, type of content or application in execution by each device upon voice recognition, and remaining power.  [0136] The controller 180 of the first electronic device 100 selects a device to perform the voice command based on the voice recognition result shared with the other electronic devices (S130).).
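
For illustration only, the input-time comparison quoted from [0134] of Jang et al. can be sketched as a threshold test on the difference between two input times. The function name `classify_inputs`, the return labels, and the 0.5 second default interval are assumptions; Jang et al. refer only to "a predetermined interval" without giving a value.

```python
def classify_inputs(first_time_s, second_time_s, interval_s=0.5):
    """Compare the input times reported by two devices.

    Returns "same" when the difference is within the interval (treated as
    one utterance heard by both devices) and "reentered" when it exceeds
    the interval. The 0.5 s default is an assumed placeholder value.
    """
    if abs(second_time_s - first_time_s) <= interval_s:
        return "same"
    return "reentered"


near = classify_inputs(10.0, 10.2)   # difference of about 0.2 s
far = classify_inputs(10.0, 12.0)    # difference of 2.0 s
```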

Regarding Claim 7, Jang et al. teach: The artificial intelligence apparatus according to claim 6, wherein the environment variable includes at least one of a noise level, a noise type, an utterance level, or positional relation (distance difference between the two devices) (See rejection of claim 6 for noise in [0088] and a distance difference between the two devices in [0133]).

Regarding Claim 12, Jang et al. teach:  The artificial intelligence apparatus according to claim 7, wherein the processor is configured to: determine a distance from each of the artificial intelligence apparatuses to the user based on the positional relation; and increase a weight for each of the artificial intelligence apparatuses as the distance is shorter ([0017] wherein the controller is configured to: identify, for each electronic device included in the group of related electronic devices, a distance from the user; and select the voice command performing device based on the identified distances from the user.  [0133] The voice recognition result received from the other electronic devices includes information on time that the user's voice command was entered. For instance, when the first electronic device 100 receives a first voice command at a first time and the second electronic device 10a receives the first voice command at a second time, there might be a tiny difference in the time of recognizing the voice command in consideration of a distance difference between the two devices. [0163] For instance, in the embodiment described in connection with FIG. 12, voice recognition results shared between the first electronic device 100 and the second electronic device 10a include gains of the received voice signals. [0164] The controller 180 of the first electronic device 100 compares a first gain of a voice signal received by the first electronic device 100 with a second gain received from the second electronic device 10a, and selects one having a smaller gain as performing the voice commands (S133). [0165] Since a distance d1 between the second electronic device 10a and the user is shorter than a distance d2 between the first electronic device 100 and the user, the first electronic device 100 may select the second electronic device 10a as an electronic device conducting the voice commands.).
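
For illustration only, the limitation "increase a weight ... as the distance is shorter" can be pictured with normalized inverse-distance weights. This is one possible monotonic mapping chosen as an assumption; neither the claims nor Jang et al. prescribe it, and the function name `distance_weights` and the example distances are hypothetical.

```python
def distance_weights(distances_m):
    """Map {device: distance to user in metres} to weights that sum to 1.

    Shorter distances receive larger weights (inverse-distance rule,
    an assumed choice made for this sketch).
    """
    if not distances_m:
        raise ValueError("at least one device distance is required")
    if any(d < 0 for d in distances_m.values()):
        raise ValueError("distances must be non-negative")
    eps = 1e-6  # avoids division by zero when a distance is exactly 0
    inverse = {name: 1.0 / (d + eps) for name, d in distances_m.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical layout: the tablet is closer to the user than the TV.
weights = distance_weights({"tv": 4.0, "tablet": 1.0})
```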

Regarding Claim 13, Jang et al. teach:  The artificial intelligence apparatus according to claim 7, wherein the processor is configured to: receive a first timestamp for a time point of receiving the first sound data from each of the at least one external artificial intelligence apparatus; obtain a second timestamp for a time point of receiving the second sound data; calculate a reception time difference in the at least one external artificial intelligence apparatus based on the first timestamp and the second timestamp; determine a location of the user based on the reception time difference; and determine the positional relation based on the determined location of the user (see rejection of claim 12).
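
For illustration only, the reception time difference recited in claim 13 can be converted into a difference in path length from the user to the two devices. The sketch assumes synchronized device clocks and a speed of sound of about 343 m/s in air; the function name `range_difference_m` is hypothetical. A single time difference constrains the user's position only to a curve of constant range difference, so fixing a location would require further measurements that this sketch does not attempt.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 degrees C


def range_difference_m(first_timestamp_s, second_timestamp_s):
    """Return how much farther the user is from the external device
    (first timestamp) than from the local device (second timestamp).

    A positive value means the sound reached the external device later.
    Assumes both timestamps come from synchronized clocks.
    """
    return SPEED_OF_SOUND_M_S * (first_timestamp_s - second_timestamp_s)


# The external device hears the utterance 10 ms after the local device.
extra_path = range_difference_m(0.010, 0.0)
```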

Regarding Claim 14, Jang et al. teach:   A method for recognizing an utterance voice of a user, the method comprising: receiving first speech recognition results from each of at least one external artificial intelligence apparatus which obtains first sound data including the utterance voice of the user to generate the first speech recognition result from the first sound data; obtaining second sound data including the utterance voice; generating a second speech recognition result from the second sound data; generating a final speech recognition result for the utterance voice by using the first speech recognition results and the second speech recognition result; and performing a control corresponding to the final speech recognition result (See rejection of claim 1).

Regarding Claim 15, Jang et al. teach: A recording medium recorded with a program to perform a method for recognizing an utterance voice of a user, wherein the method includes: receiving first speech recognition results from each of at least one external artificial intelligence apparatus which obtains first sound data including the utterance voice of the user to generate the first speech recognition result from the first sound data; obtaining second sound data including the utterance voice; generating a second speech recognition result from the second sound data; generating a final speech recognition result for the utterance voice by using the first speech recognition results and the second speech recognition result; and performing a control corresponding to the final speech recognition result (See rejection of Claim 1).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Jang et al. in view of Jeong et al. (US 2019/0206403 A1).

Regarding Claim 2, Jang et al. teach:  wherein the at least one external artificial intelligence apparatus generates the first speech recognition result from the first sound data corresponding to each of the at least one external artificial intelligence apparatus, and the processor is configured to generate the second speech recognition result from the second sound data (see rejection of claim 1) and a predetermined language model ([0098] Furthermore, the memory 160 may include an audio model, a recognition dictionary, a translation database, a predetermined language model, and a command database which are necessary for the operation of the present disclosure.). 
Jang et al. do not teach: wherein the at least one external artificial intelligence apparatus generates the first speech recognition result from the first sound data by using a first speech recognition model corresponding to each of the at least one external artificial intelligence apparatus, and the processor is configured to generate the second speech recognition result from the second sound data by using a second speech recognition model.
However, generating speech recognition results from sound data by using a speech recognition model is very well known in the art.
For example, Jeong et al. teach: generating speech recognition results from sound data by using a speech recognition model ([0207] The sound apparatus 200 may recognize the control command(s) based on the user voice utterance from the voice data using voice recognition (1060). [0208] The controller 240 may extract a voice feature from the voice data and compare the voice feature with a previously stored voice model. The controller 240 may determine the meaning of the user voice utterance by comparing the voice feature of the voice data with the voice model.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jang et al. to include the teaching of Jeong et al. above in order to recognize a control command based on the user voice utterance specific to a device by using a device-specific voice model.

Regarding Claim 3, Jang et al. do not teach: The artificial intelligence apparatus according to claim 2, wherein the first speech recognition model and the second speech recognition model include an artificial neural network and are learned by using a machine learning algorithm or a deep learning algorithm.
Jeong et al. teach: speech recognition models include an artificial neural network and are learned by using a machine learning algorithm or a deep learning algorithm ([0224] The user apparatus 300 may include voice recognition applications that can process the voice data to recognize the contents of the user voice utterance. For example, the voice recognition applications may determine the meaning of the user voice utterance by extracting the voice feature from the voice data and comparing the voice feature with the previously stored voice model. The voice recognition applications may, for example, also recognize user voice utterances using machine learning or deep learning.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jang et al. to include the teaching of Jeong et al. above in order to recognize a control command based on the user voice utterance specific to a device by using a device-specific voice model.

Regarding Claim 4, Jang et al. teach: [0125] Hereinafter, a method for controlling an electronic device according to an embodiment of the present disclosure is described with reference to the drawings. Specifically, examples are described where in a system environment involving a plurality of electronic devices, one electronic device conducts a user's voice command. 
Jang et al. do not teach:  The artificial intelligence apparatus according to claim 3, wherein each of the first speech recognition model and the second speech recognition model is learned by using training data corresponding to an application environment.
Jeong et al. teach: speech recognition model is learned by using training data corresponding to an application environment (ambient noise) (See Jeong et al. teaching: [0278] The AI server apparatus 400 may communicate with the receiving apparatus 100 and/or the sound apparatus 200 and may receive the voice data from the receiving apparatus 100 and/or the sound apparatus 200. In addition, the AI server apparatus 400 may perform machine learning and/or deep learning on the voice recognition based on the voice data received from the receiving apparatus 100 and/or the sound apparatus 200. [0280] The AI server apparatus 400 may recognize the control command of the user by performing the voice recognition based on machine learning on the voice data. For example, the AI server apparatus 400 may recognize the plurality of control commands of the user from the voice data using the voice recognition, and may determine a target of the plurality of control commands and the order of the plurality of control commands. In addition, the AI server apparatus 400 may generate a sequence of the control commands based on the target of the plurality of control commands and the order of the plurality of control commands. [0281] The AI server apparatus 400 may process the voice data to recognize the contents of the user voice utterance. For example, the AI server apparatus 400 may extract an interval for the voice recognition from the voice data received from the receiving apparatus 100 and/or the sound apparatus 200, and remove noise included in the voice data. In addition, the AI server apparatus 400 may extract the voice feature from the voice data and compare the voice feature to the previously stored voice model. The AI server apparatus 400 may determine the meaning of the user voice utterance by comparing the voice feature of the voice data with the voice model.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jang et al. to include the teaching of Jeong et al. above in order to recognize a control command based on the user voice utterance specific to a device by using a device-specific voice model.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Jang et al. in view of Bryan et al. (US 2018/0350381 A1).
Regarding Claim 8, Jang et al. teach:  [0088] In addition, the microphone 122 can receive sounds via a microphone in a phone call mode, a recording mode, a voice recognition mode, and the like, and can process such sounds into audio data. The microphone 122 may also implement various types of noise canceling (or suppression) algorithms to cancel or suppress noise or interference generated when receiving and transmitting audio signals.
Jang et al. do not teach:  The artificial intelligence apparatus according to claim 7, wherein the noise type includes at least one of media, a vibration, a conversation, a wind, or a daily life noise.
Bryan et al. teach: wherein the noise type includes at least one of media, a vibration, a conversation, a wind, or a daily life noise ([0003] When using these electronic devices, the user also has the option of using headphones, earbuds, or headset to receive his or her speech. However, a common complaint with these hands-free modes of operation is that the speech captured by the microphone port or the headset includes environmental noise such as wind noise, secondary speakers in the background or other background noises. This environmental noise often renders the user's speech unintelligible and thus, degrades the quality of the voice communication. [0004] Noise suppression algorithms are commonly used to enhance speech quality in modern mobile phones, telecommunications, and multimedia systems. Such techniques remove unwanted background noises caused by acoustic environments, electronic system noises, or similar. Noise suppression may greatly enhance the quality of desired speech signals and the overall perceptual performance of communication systems. [0021] FIG. 1 depicts near-end user using an exemplary electronic device 10 in which an embodiment of the invention may be implemented. The electronic device (or mobile device) 10 may be a mobile communications handset device such as a smart phone or a multi-function cellular phone. The sound quality improvement techniques using double talk detection and acoustic echo cancellation described herein can be implemented in such a user audio device, to improve the quality of the near-end audio signal. In the embodiment in FIG. 1, the near-end user is in the process of a call with a far-end user (not shown) who is using another communications device. [0034] In order to provide directional noise robustness, the BSS 33 included in system 30 accounts for the change in the geometry of the microphone placement relative to the unwanted noisy sounds. 
The BSS 33 improves separation of the speech and noise in the signals by removing noise from the voicebeam signal and removing voice from the noisebeam signal.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jang et al. to include the teaching of Bryan et al. above in order to recognize a control command based on the user voice utterance by improving noise reduction ([0001]).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Jang et al. in view of Bryan et al. and further in view of Soto (US 2020/0213729 A1).
Regarding Claim 9, Jang et al. in view of Bryan et al. do not teach: The artificial intelligence apparatus according to claim 8, wherein the processor is configured to: increase weights for a TV, a radio, and a speaker if the noise type includes the media; increase weights for a refrigerator and a washing machine if the noise type includes the vibration; and increase weights for a clearer, an air purifier, a fan, and an air conditioner if the noise type includes the wind.
Soto teaches: in response to classifying noise in the detected sounds, the gain applied to the sound data during processing can be adjusted up or down to improve voice detection ([0136] In operation, NMDs (network microphone devices) can be exposed to a variety of different types of noise, such as traffic, appliances (e.g., fans, sinks, refrigerators, etc.), construction, interfering speech, etc. To better analyze captured audio input in the presence of such noise, it can be useful to classify noises in the audio input. Different noise sources will produce different sounds, and those different sounds will have different associated sound metadata (e.g., frequency response, signal levels, etc.). The sound metadata associated with different noise sources can have a signature that differentiates one noise source from another. Accordingly, by identifying the different signatures, different noise sources can be classified by analyzing the sound metadata. [0137] For example, in response to classifying noise in the detected sounds, the gain applied to the sound data during processing can be adjusted up or down to improve voice detection. In one example, an NMD may detect that a dishwasher is running based on classifying noise in the detected sound data. In response, the NMD may increase the gain or otherwise raise the volume level of audio played back via the NMD. When the NMD detects that the dishwasher is no longer running (e.g., by no longer identifying the classified noise in the detected sound data), the gain levels can be reduced such that playback resumes the previous volume level.).
Therefore, in view of Soto’s teaching above, “increase weights for a TV, a radio, and a speaker if the noise type includes the media; increase weights for a refrigerator and a washing machine if the noise type includes the vibration; and increase weights for a clearer, an air purifier, a fan, and an air conditioner if the noise type includes the wind” would be obvious.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Jang et al. in view of Bryan et al. to include the teaching of Soto above in order to improve voice detection in the detected sound data by classifying noise and adjusting the gain applied to the sound data during processing according to the classified noise.
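
For illustration only, the weight adjustment recited in claim 9 can be sketched as a lookup from noise type to the appliances whose weights are increased. The function name, the additive step of 0.1, and the starting weights are hypothetical assumptions; they are not taken from the application, Jang et al., Bryan et al., or Soto. The device names follow the claim language as quoted above.

```python
# Hypothetical mapping from noise type to appliances, following the groupings quoted in claim 9.
NOISE_TO_DEVICES = {
    "media": ("TV", "radio", "speaker"),
    "vibration": ("refrigerator", "washing machine"),
    "wind": ("clearer", "air purifier", "fan", "air conditioner"),
}


def increase_weights(weights, noise_types, step=0.1):
    """Return a copy of `weights` with the weight of each device tied to a detected noise type raised by `step`.

    `step` is an assumed additive increment; the claim does not state how much a weight is increased.
    """
    updated = dict(weights)
    for noise in noise_types:
        # Unknown noise types (e.g. conversation, daily life noise) leave the weights unchanged.
        for device in NOISE_TO_DEVICES.get(noise, ()):
            updated[device] = updated.get(device, 0.0) + step
    return updated
```

Calling `increase_weights({"TV": 1.0, "fan": 1.0}, ["media"])` under these assumptions raises the TV weight and leaves the fan weight untouched.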

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Jang et al. in view of Soto (US 2020/0213729 A1).
Regarding Claim 10, Jang et al. teach: The artificial intelligence apparatus according to claim 7, wherein the processor is configured to determine the noise from the second sound data ([0088] In addition, the microphone 122 can receive sounds via a microphone in a phone call mode, a recording mode, a voice recognition mode, and the like, and can process such sounds into audio data. The microphone 122 may also implement various types of noise canceling (or suppression) algorithms to cancel or suppress noise or interference generated when receiving and transmitting audio signals.).
Jang et al. do not teach: The artificial intelligence apparatus according to claim 7, wherein the processor is configured to determine the noise type from the second sound data by using a noise classification model.
Soto teaches: in response to classifying noise in the detected sounds, the gain applied to the sound data during processing can be adjusted up or down to improve voice detection ([0136] In operation, NMDs (network microphone devices) can be exposed to a variety of different types of noise, such as traffic, appliances (e.g., fans, sinks, refrigerators, etc.), construction, interfering speech, etc. To better analyze captured audio input in the presence of such noise, it can be useful to classify noises in the audio input. Different noise sources will produce different sounds, and those different sounds will have different associated sound metadata (e.g., frequency response, signal levels, etc.). The sound metadata associated with different noise sources can have a signature that differentiates one noise source from another. Accordingly, by identifying the different signatures, different noise sources can be classified by analyzing the sound metadata. [0137] For example, in response to classifying noise in the detected sounds, the gain applied to the sound data during processing can be adjusted up or down to improve voice detection. In one example, an NMD may detect that a dishwasher is running based on classifying noise in the detected sound data. In response, the NMD may increase the gain or otherwise raise the volume level of audio played back via the NMD. When the NMD detects that the dishwasher is no longer running (e.g., by no longer identifying the classified noise in the detected sound data), the gain levels can be reduced such that playback resumes the previous volume level.).
Therefore, in view of Soto’s teaching above, “to determine the noise type from any sound data by using a noise classification model” would be obvious.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Jang et al. to include the teaching of Soto above in order to improve voice detection in the detected sound data by classifying noise and adjusting the gain applied to the sound data during processing according to the classified noise.

Regarding Claim 11: The artificial intelligence apparatus according to claim 10, wherein the noise classification model includes an artificial neural network and is learned by using a machine learning algorithm or a deep learning algorithm (See Soto [0147] Analyzing the sound metadata can include comparing one or more features of the sound metadata with known noise reference values or a sample population data with known noise. For example, any features of the sound metadata such as signal levels, frequency response spectra, etc. can be compared with noise reference values or values collected and averaged over a sample population. In some embodiments, analyzing the sound metadata includes projecting the frequency response spectrum onto an eigenspace corresponding to aggregated frequency response spectra from a population of NMDs (as described in more detail below with respect to FIGS. 10-13). In at least some embodiments, projecting the frequency response spectrum onto an eigenspace can be performed as a pre-processing step to facilitate downstream classification. In various embodiments, any number of different techniques for classification of noise using the sound metadata can be used, for example machine learning using decision trees, or Bayesian classifiers, neural networks, or any other classification techniques. Alternatively or additionally, various clustering techniques may be used, for example K-Means clustering, mean-shift clustering, expectation-maximization clustering, or any other suitable clustering technique.). 
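
For illustration only, the comparison of sound metadata features with known noise reference values described in Soto [0147] can be sketched as a nearest-reference classifier. This is a deliberately simple stand-in and is not a neural network; the feature choice (signal level and dominant frequency), the reference values, and the function name are hypothetical assumptions and do not come from Soto or the application.

```python
import math

# Hypothetical reference feature vectors: (signal level in dB, dominant frequency in Hz).
REFERENCES = {
    "fan": (55.0, 120.0),
    "dishwasher": (60.0, 300.0),
    "speech": (65.0, 1000.0),
}


def classify_noise(features, references=REFERENCES):
    """Return the label whose reference vector is closest to `features` by Euclidean distance."""
    return min(references, key=lambda label: math.dist(features, references[label]))
```

In practice the features would likely need scaling so that one dimension does not dominate the distance; the sketch omits that step for brevity.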
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Labsky et al. (US 9384736 B2), teaches providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modify the initial UI response after receiving secondary recognition results. Combining multiple recognizers can be especially beneficial in embedded environments, such as with automotive assistance systems, tablet computers, and cell phones. A local recognizer at a client device can be advantageously combined with more powerful remote recognizers at a server.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878.  The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656