Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. JP2018-002163, filed on 01/01/2018.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/28/2020 is being considered by the examiner.
Drawings
The drawing submitted on 07/01/2020 is considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “output control unit” in claims 1-14 and 16-18; and “a speech input unit”, “an output control unit”, “a speech output unit”, and “an utterance intention analysis unit” in claim 15.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification ([0027], [0030], [0045]: speech input unit 101, output control unit 110, speech output unit 123, utterance semantic analysis unit 104) as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.




Claims 1, 3-6, and 9-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ohmura (EP 3279791 A1, used as a translated version of WO 2016/158792).

Regarding Claims 1, 16, and 18, Ohmura teaches: An information processing apparatus comprising (Fig. 2, information processing device 1, [0022]): an output control unit (Fig. 2, output control unit 10f of control unit 10) configured to execute volume control of system utterance (response volume) on a basis of a combination of a user distance, the user distance being a distance (position of the user with respect to the information processing device 1) from the information processing apparatus to a user ([0024] The speech recognition unit 10a recognizes a voice of a user collected by the microphone 12 of the information processing device 1, converts the voice into a character string, and acquires a speech text. [0025] By using natural language processing or the like the semantic analysis unit 10b performs semantic analysis on the speech text acquired by the speech recognition unit 10a. A result of the semantic analysis is output to the response generation unit 10c. [0026] The response generation unit 10c generates a response to the speech from the user on the basis of the semantic analysis result. [0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. For example, in the case where the user is in the vicinity of the information processing device 1, the level calculation unit 10d may calculate a low allowable output level since the user can hear an output response even when the audio volume of the output response is lowered. For example, the position of the user with respect to the information processing device 1 may be acquired by the ranging sensor 15 provided in the information processing device 1.), and user utterance volume, and the user utterance volume being calculated on a basis of user utterance input by the information processing apparatus ([0033] The level calculation unit 10d may calculate the allowable output level in accordance with appearance of the user. For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user, or on the basis of behavior such as covering his/her mouth with hand. [0036] Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0037] For example the decision unit 10e decides the response output method on the basis of the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e.).

Regarding Claim 3, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes control to decrease a volume level of the system utterance in a case where the user utterance volume is lower than ordinary volume corresponding to the user distance (See rejection of claim 1, [0033]).

Regarding Claim 4, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes the volume control of the system utterance depending on a volume level of an ambient sound other than the user utterance and executes control to make a volume level of the system utterance higher than the volume level of the ambient sound (See rejection of claim 1 and [0030] The level calculation unit 10d may calculate the allowable output level in accordance with existence of an external sound source… In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source.).

Regarding Claim 5, Ohmura teaches: The information processing apparatus according to claim 4, wherein the output control unit executes control to maintain a difference (response output is raised) between the volume level of the system utterance and the volume level of the ambient sound to be approximately constant (See rejection of claim 1 and [0030] The level calculation unit 10d may calculate the allowable output level in accordance with existence of an external sound source… In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source. Note: it is inherent, based on the above teaching, that the difference between the external sound volume and the raised response volume is kept constant so that the response is not washed out by the external sound (if it is higher than the response volume) and the user can hear the response. Further, Ohmura also teaches in [0033]: “In addition, in the case where the user explicitly designates the allowable output level…the level calculation unit 10d may calculate the allowable output level on the basis of the designated allowable output level.” Therefore, it is also inherent that the user ([0029]), or the system automatically based on environmental factors ([0028]), sets a higher response output volume level to overcome the environmental noise volume level so that the response can be heard.).

Regarding Claim 6, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit controls a volume level of the system utterance in response to a user request (See rejection of claim 1 and [0029] The allowable output levels according to time slots may be set in advance by a user or a system. [0033] “In addition, in the case where the user explicitly designates the allowable output level by himself/herself through a voice command, gesture, device operation…the level calculation unit 10d may calculate the allowable output level on the basis of the designated allowable output level.”).

Regarding Claim 9, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes the volume control of the system utterance corresponding to a time zone (See rejection of claim 1 and [0029] The level calculation unit 10d may calculate the allowable output level in accordance with a time slot. For example, in the case of night…the level calculation unit 10d calculates a low allowable output level.).

Regarding Claim 10, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes control to output contents of the system utterance to a display unit in a case where a volume control value of the system utterance reaches a predefined maximum or minimum allowable value (See rejection of claim 1 and [0027] The level calculation unit 10d calculates an allowable output level of a response on the basis of a current surrounding environment…For example, the allowable output level is calculated to be high in an environment in which output using voice is preferable, and the allowable output level is calculated to be low in an environment in which output using voice is not preferable, but output using display is preferable. In addition, in the environment in which output using display is preferable, the allowable output level is calculated to be more lower if it is desirable to limit a display content, select a display device, or limit brightness in accordance with a surrounding environment of the user.  [0029] For example, in the case of night, it is necessary to respect neighboring houses and people who are sleeping. Therefore, the level calculation unit 10d calculates a low allowable output level. The allowable output levels according to time slots may be set in advance by a user or a system. [0031] The level calculation unit 10d may calculate the allowable output level in accordance with the surrounding of a user who is a target (in other words, user environment). For example, when a person (including a baby) is sleeping near the user, such a situation is in an environment in which output using voice is not preferable. Therefore, the level calculation unit 10d calculates a low allowable output level such that the output method is switched to “display”.).

Regarding Claim 11, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit acquires context information (context) of a space (may acquire information on surroundings of the user) where the user is present and executes the volume control of the system utterance based on the context information (context) (See rejection of claim 10 and [0031] The level calculation unit 10d may calculate the allowable output level in accordance with the surrounding of a user who is a target (in other words, user environment). For example, when a person (including a baby) is sleeping near the user, such a situation is in an environment in which output using voice is not preferable. Therefore, the level calculation unit 10d calculates a low allowable output level such that the output method is switched to “display”. [0032] The microphone 12, camera 14, and the like provided on the information processing device 1 may acquire surroundings of the user. [0049] Specifically, the information processing device 1 may acquire existence of external sound source, a position, a state or the like of a person near the information processing device 1 (including user) by using the microphone 12, the camera 14, or the ranging sensor 15.).

Regarding Claim 12, Ohmura teaches: The information processing apparatus according to claim 11, wherein the context information (context) includes at least one of a type of sound (sound source as a TV, a radio, a music loudspeaker, construction noise and the like) detected from a space where the user is present, a number of persons in the space where the user is present (sound source includes conversation between people), or atmosphere of the space where the user is present (See rejection of claim 11 and [0030]).

Regarding Claim 13, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit acquires a reference value (parameters) that is an optimal volume level of the system utterance corresponding to the user distance ([0034]), the user utterance volume ([0033]), and ambient volume ([0030]) from a storage unit (storage unit 17) to execute the volume control based on the reference value (See rejection of claim 12 and [0036] The level calculation unit 10d calculates an allowable output level that is appropriate to a current surrounding environment on the basis of at least one or more of the above described factors. Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e. [0045] The storage unit 17 stores programs for causing the respective structural elements in the information processing device 1 to function. In addition, the storage unit 17 stores various parameters and various thresholds. The various parameters are used when the level calculation unit 10d calculates an allowable output level. The various thresholds are used when the decision unit 10e decides an output method in accordance with the allowable output level. [0049] Specifically, the information processing device 1 may acquire existence of external sound source, a position, a state or the like of a person near the information processing device 1 (including user) by using the microphone 12, the camera 14, or the ranging sensor 15.).

Regarding Claim 14, Ohmura teaches: The information processing apparatus according to claim 13, wherein the reference value (a feature (visually impaired, hearing-impaired or the like)) is a user-specific reference value (See rejection of claim 13 and [0035] For example, in the case of a hearing-impaired user, an elder, or a person who asks again many times…a response is output by display instead of voice…On the other hand, in the case of a user with bad eyesight or a user without glasses…a response is output by voice instead of display. For example, information on a physical characteristic of a user that is used in the case where the accessibility is considered may be acquired from the storage unit 17. [0045] The storage unit 17 stores programs for causing the respective structural elements in the information processing device 1 to function. In addition, the storage unit 17 stores various parameters and various thresholds. The various parameters are used when the level calculation unit 10d calculates an allowable output level. The various thresholds are used when the decision unit 10e decides an output method in accordance with the allowable output level. In addition, the storage unit 17 stores registration information of users. The registration information of a user includes…a feature (visually impaired, hearing-impaired, or the like), connection information regarding a communication terminal held by the user, or the like.).

Regarding Claims 15 and 17, Ohmura teaches: An information processing system comprising: a user terminal (Fig. 2, [0022] information processing device 1); and a data processing server (Fig. 2, storage unit 17 ([0045]) or predetermined server on the network ([0026], [0039], [0049])), wherein the user terminal includes a speech input unit (a microphone 12) configured to input user utterance ([0024] The speech recognition unit 10a recognizes a voice of a user collected by the microphone 12 of the information processing device 1.), an output control unit (Fig. 2, output control unit 10f) configured to execute volume control of system utterance ([0026] The response generation unit 10c generates a response to the speech from the user on the basis of the semantic analysis result. [0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. [0036] Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0037] For example the decision unit 10e decides the response output method on the basis of the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e.), and a speech output unit (Fig. 2, loudspeaker 13) configured to output the system utterance ([0041] The loudspeaker 13 has a function of converting the sound signal into a voice and outputting the voice under the control of the output control unit 10f.), the data processing server (Fig. 2, control unit 10) includes an utterance intention analysis unit (semantic analysis unit 10b) configured to analyze intention of the user utterance received from the user terminal ([0025] By using natural language processing or the like the semantic analysis unit 10b performs semantic analysis on the speech text acquired by the speech recognition unit 10a.), the user terminal outputs the system utterance depending on the intention of the user utterance through the speech output unit ([0026] The response generation unit 10c generates a response to the speech from the user on the basis of the semantic analysis result. [0041] The loudspeaker 13 has a function of converting the sound signal into a voice and outputting the voice under the control of the output control unit 10f.), and the output control unit of the user terminal executes volume control of the system utterance on a basis of a combination of a user distance and a user utterance volume, the user distance being a distance from the user terminal to a user, and the user utterance volume being calculated on a basis of user utterance input through the speech input unit ([0033] The level calculation unit 10d may calculate the allowable output level in accordance with appearance of the user. For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user, or on the basis of behavior such as covering his/her mouth with hand. [0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. For example, in the case where the user is in the vicinity of the information processing device 1, the level calculation unit 10d may calculate a low allowable output level since the user can hear an output response even when the audio volume of the output response is lowered. For example, the position of the user with respect to the information processing device 1 may be acquired by the ranging sensor 15 provided in the information processing device 1. [0036] Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0037] For example the decision unit 10e decides the response output method on the basis of the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e. [0041] The loudspeaker 13 has a function of converting the sound signal into a voice and outputting the voice under the control of the output control unit 10f.).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Ohmura in view of Basye et al. (US 2016/0379638 A1).

Regarding Claim 2, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes control to lower a volume level of the system utterance in a case where the user utterance volume is lower than ordinary volume corresponding to the user distance (See rejection of claim 1 and [0033] The level calculation unit 10d may calculate the allowable output level in accordance with appearance of the user. For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user, or on the basis of behavior such as covering his/her mouth with hand. [0030] The level calculation unit 10d may calculate the allowable output level in accordance with existence of an external sound source… In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source.).
Ohmura does not explicitly teach: wherein the output control unit executes control to increase a volume level of the system utterance in a case where the user utterance volume is higher than ordinary volume corresponding to the user distance.
Basye et al. teach: wherein the output control unit executes control to increase a volume level of the system utterance in a case where the user utterance volume is higher than ordinary volume corresponding to the user distance ([0049] The speech quality may be based on paralinguistic metrics that describe some quality/feature other than the specific words spoken. Paralinguistic features may include acoustic features such as speech tone/pitch, rate of change of pitch (first derivative of pitch), speed, prosody/intonation, resonance, energy/volume, hesitation, phrasing, nasality, breath, whether the speech includes a cough, sneeze, laugh or other non-speech articulation (which are commonly ignored by ASR systems), detected background audio/noises, distance between the user and a device, etc. [0064] Specifically, if a user shouts, in an excited manner, “PLAY SOME MUSIC!!” the speech quality detector 220 may send an indicator to the command processor that the speech had a quality of excitement and the NLU module 260 may send the command processor 290 text and semantic indicators that the utterance included a request to play music. The command processor 290 may then select a music title to play based on the quality of excitement and may thus select a rock song or similar up-tempo song from a user's catalog. In another example, if a user whispers “play some music,” the speech quality detector 220 may send an indicator to the command processor that the speech was whispered and the NLU module 260 may send the command processor 290 text and semantic indicators that the utterance included a request to play music. The command processor 290 may then select a music title to play based on the quality of being whispered and may thus select a mellow or calm song from a user's catalog. Similar selections of actions by different command processors 290 outside the domain of music are also envisioned. 
As another example, volume of output may be decreased as a result of whispered input speech, or volume increased as a result of excited speech, or the like. As another example, volume of output may be increased if a user is determined to be a long distance away from a device, thus ensuring that the output is loud enough for the user to hear at the user's distance. [0096] The microphone 104 may be configured to capture speech including an utterance. The device 110 (using microphone 104, ASR module 250, etc.) may be configured to determine audio data corresponding to the utterance.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ohmura to include the teaching of Basye et al. above in order to ensure that the output is loud enough for the user to hear at the user's distance.
Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Ohmura in view of Nandy et al. (US 2019/0180740 A1).

Regarding Claim 7, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes volume control of the system utterance (See rejection of claim 1).
Ohmura does not teach: The information processing apparatus according to claim 1, wherein the output control unit executes volume control of music that is a system output other than the system utterance and executes control to make a volume level of the system utterance higher than a volume level of the music.
Nandy et al. teach: wherein the output control unit executes volume control of music that is a system output other than the system utterance and executes control to make a volume level of the system utterance higher than a volume level of the music ([0054] As an example, the user 104 may issue a command of “Please schedule lunch with Fred for next Tuesday,” as the user 104 is having a telephone conversation with their contact named Fred. The remote system 120 may return an instruction to the user device 108 to schedule the appointment on the user's 104 calendar for the requested date, and further send back TTS audio response data which includes machine-generated words responsive to the speech command, such as “We have scheduled your lunch appointment with Fred.” [0055] Upon receiving the TTS response audio data, the vehicle computing device 110 may output the machine-generated words represented by the TTS response audio data on loudspeakers of the vehicle. In some examples, the vehicle computing device 110 may simultaneously be outputting sound represented by call audio data received from the contact's user device. In such examples, the vehicle computing device 110 may be configured to mute, or attenuate (e.g., lower volume) the call audio data while the TTS response audio data is output by the vehicle computing device loudspeakers. Further, the voice-enabled device 106 may be configured with components for performing echo cancellation. Using these echo cancellation components, the voice-enabled device 106 may cancel, or filter out, the sound corresponding to the TTS response audio data to prevent call audio data sent to the contact's user device from including the sound corresponding to the TTS response audio data. 
The techniques described above with respect to simultaneously communicating and outputting call audio data along with TTS response audio data may similarly be applied to situations where, instead of call audio data, music audio data (or other audio data) is being streamed. For instance, the techniques may similarly apply when music audio data is streamed from the user device 108 to the voice-enabled device 106 using the A2DP network 142, and sent to the vehicle computing device 110 to be output by the vehicle computing device 110 while speech utterances 112 are detected after a wake word are communicated, and TTS response audio data is communicated to be output by the vehicle computing device 110.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Ohmura to include the teaching of Nandy et al. above in order to simultaneously communicate and output music or call audio data along with TTS response audio data.

Regarding Claim 8, Ohmura teaches: The information processing apparatus according to claim 1, wherein the output control unit executes volume control of the system utterance (See rejection of claim 1).
Ohmura does not teach: wherein the output control unit executes volume control of a volume level of the system utterance, a volume level of ordinary music, and a volume level of BGM music, and executes control to make the volume level of the system utterance higher than the volume level of the ordinary music and to make the volume level of the ordinary music higher than the volume level of the BGM music.
Nandy et al. teach: wherein the output control unit executes volume control of a volume level of the system utterance, a volume level of ordinary music, and a volume level of BGM music, and executes control to make the volume level of the system utterance higher than the volume level of the ordinary music (mute or attenuate music while outputting TTS response) and to make the volume level of the ordinary music higher than the volume level of the BGM music (while the music is muted or attenuated, echo (BGM (background music) and TTS response) is filtered out) ([0054] As an example, the user 104 may issue a command of “Please schedule lunch with Fred for next Tuesday,” as the user 104 is having a telephone conversation with their contact named Fred. The remote system 120 may return an instruction to the user device 108 to schedule the appointment on the user's 104 calendar for the requested date, and further send back TTS audio response data which includes machine-generated words responsive to the speech command, such as “We have scheduled your lunch appointment with Fred.” [0055] Upon receiving the TTS response audio data, the vehicle computing device 110 may output the machine-generated words represented by the TTS response audio data on loudspeakers of the vehicle. In some examples, the vehicle computing device 110 may simultaneously be outputting sound represented by call audio data received from the contact's user device. In such examples, the vehicle computing device 110 may be configured to mute, or attenuate (e.g., lower volume) the call audio data while the TTS response audio data is output by the vehicle computing device loudspeakers. Further, the voice-enabled device 106 may be configured with components for performing echo cancellation.
Using these echo cancellation components, the voice-enabled device 106 may cancel, or filter out, the sound corresponding to the TTS response audio data to prevent call audio data sent to the contact's user device from including the sound corresponding to the TTS response audio data. The techniques described above with respect to simultaneously communicating and outputting call audio data along with TTS response audio data may similarly be applied to situations where, instead of call audio data, music audio data (or other audio data) is being streamed. For instance, the techniques may similarly apply when music audio data is streamed from the user device 108 to the voice-enabled device 106 using the A2DP network 142, and sent to the vehicle computing device 110 to be output by the vehicle computing device 110 while speech utterances 112 are detected after a wake word are communicated, and TTS response audio data is communicated to be output by the vehicle computing device 110. Note: echo cancellation will filter out the attenuated BGM and the TTS response so that they are not input to the voice-enabled device.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Ohmura to include the teaching of Nandy et al. above in order to simultaneously communicate and output music or call audio data along with TTS response audio data.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Yoshida et al. (US 2006/0074684 A1), teaches: An on-vehicle acoustic control system determines which one of sounds of an audio device and a navigation device should be generated with priority, when both devices are requested to generate respective sounds. The control system further detects a user's physical condition based on an interaction with the user, a picture of the user and biometric information of the user. The control system generates sound in the order of determined priority, and varies the manner of sound generation based on the user's physical condition.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571) 270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/
Primary Examiner, Art Unit 2656