Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. JP2018-002163, filed on 01/01/2018.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/28/2020 is being considered by the examiner.
Drawings
The drawing submitted on 07/01/2020 is considered by the examiner.
Response to Amendment
Claims 1-5 and 7-18 are currently pending in the application; among them, claims 1 and 15-18 are independent claims. Claims 1-5, 7-13, and 15-18 have been amended, and claim 6 has been cancelled.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-18 have been considered but are moot in view of the new grounds of rejection, since the amended limitations were not previously presented or rejected.

 

Claim Interpretation
The interpretation of claims 1-5 and 7-18 under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, has been withdrawn based on applicant’s response in the “Remarks” and on the amendment of claims 1-18 dated 05/09/2022.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-5, and 7-18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Independent claim 1 recites the limitation “receive a second user utterance” in line 13; claim 15 recites “a second user utterance” in line 6 and “receive the second user utterance of the user” in line 14; claim 16 recites “receiving a second user utterance” in line 15; claim 17 recites “a second user utterance” in line 7 and “the second user utterance from the user” in line 28; and claim 18 recites “receiving a second user utterance” in line 17. These limitations are not supported by the disclosure as claimed. By reciting “a first user utterance” and “a second user utterance,” the above claims are worded in a vague and confusing way that leaves unclear whether the “second utterance” is from a second user or from the same first user, and there is no support for either reading. The specification describes several embodiments, each presenting a different situational example in which a user inputs speech, and each situation is described in its own embodiment. However, support for the claimed limitation, whether interpreted as an utterance from a second user or as a second utterance from the same user, is not found in any of the embodiments (see MPEP 2163 I: To satisfy the written description requirement, a patent specification must describe the claimed invention in sufficient detail that one skilled in the art can reasonably conclude that the inventor had possession of the claimed invention. See, e.g., Moba, B.V. v. Diamond Automation, Inc., 325 F.3d 1306, 1319, 66 USPQ2d 1429, 1438 (Fed. Cir. 2003)).
For examination purposes, the examiner will interpret the limitation “receiving a second user utterance” as “receiving a second utterance from or of the user,” even though there is no support for the limitation. This interpretation is consistent with other limitations as claimed, such as “receive the second user utterance of the user” in claims 15 and 17.
Claims 2-5 and 7-14 are rejected based on their dependency on base claim 1.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5 and 7-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Independent claims 1 and 15-18 recite the limitation “receiving a second user utterance,” which is vague and confusing because it can be interpreted in two ways: 1) the utterance is a second utterance from the same user, or 2) the utterance is from a second user. The limitation is therefore indefinite for failing to particularly point out and distinctly claim the subject matter (see MPEP 2111.01 I: Under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. The ordinary and customary meaning of a term may be evidenced by a variety of sources, including the words of the claims themselves, the specification, drawings, and prior art. However, the best source for determining the meaning of a claim term is the specification - the greatest clarity is obtained when the specification serves as a glossary for the claim terms. In re Zletz, 893 F.2d 319, 321, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989); Chef America, Inc. v. Lamb-Weston, Inc., 358 F.3d 1371, 1372, 69 USPQ2d 1857 (Fed. Cir. 2004)).
By reciting “a first user utterance” and “a second user utterance,” the above claims are worded in a vague and confusing way that leaves unclear whether the “second utterance” is from a second user or from the same first user, and the disclosure provides no support for either reading in light of which the examiner could interpret the limitation.
For examination purposes, the examiner will interpret the limitation “receiving a second user utterance” as “receiving a second utterance from or of the user,” even though there is no support for the limitation. This interpretation is consistent with other limitations as claimed, such as “receive the second user utterance of the user” in claims 15 and 17; although it is not supported by the specification, it is the examiner’s best guess based on the scope of the invention.
Claims 2-5 and 7-14 are rejected based on their dependency on base claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-5, and 9-18 are rejected under 35 U.S.C. 103 as being unpatentable over  Ohmura (EP 3279791 A1).

Regarding Claims 1, 16, and 18, Ohmura teaches: An information processing apparatus comprising (Fig.2, information processing device 1, [0022]) : a central processing unit (CPU) (Fig.2, control unit 10) configured to([0023] The control unit 10 controls respective structural elements of the information processing device 1. The control unit 10 is implemented by a microcontroller including a central processing unit (CPU),… In addition, as illustrated in FIG. 2, the  control unit 10 according to the embodiment also functions as a speech recognition unit 10a, a semantic analysis unit 10b, a response generation unit 10c, a level  calculation unit 10d, a decision unit 10e, and an output control unit 10f.) : calculate a user distance of a user wherein the user distance is a distance (position of the user with respect to the information processing device 1, i.e. user is in the vicinity of the information processing device acquired by the ranging sensor 15 provided in the information processing device 1) from the information processing apparatus to the user  ([0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. For example, in the case where the user is in the vicinity of the information processing device 1, the level calculation unit 10d may calculate a low allowable output level since the user can hear an output response even when the audio volume of the output response is lowered. For example, the position of the user with respect to the information processing device 1 may be acquired by the ranging sensor 15 provided in the information processing device 1.); calculate a user utterance volume of the user based on a first user utterance input( [0033] For example, when the user speaks in low voices it is expected that the voices UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. 
Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user,…); calculate an ambient volume of an ambient sound around the user ([0027] The level calculation unit 10d calculates an allowable output level of a response on the basis of a current surrounding environment…For example, the allowable output level is calculated to be high in an environment in which output using voice is preferable… [0030] The level calculation unit 10d may calculates the allowable output level in accordance with existence of an external sound source…In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source. Note: It is inherent that, in order to calculate an allowable output level higher than the ambient sound around the user (external sound source), the level calculation unit 10d has to calculate the ambient sound level, so that the output level can be raised above the ambient sound level and the output is not washed out by the external sound.); control volume of a system utterance based on a combination of the calculated user distance, the calculated user utterance volume, and the calculated ambient volume ([0030] The level calculation unit 10d may calculates the allowable output level in accordance with existence of an external sound source… [0033] The level calculation unit 10d may calculate the allowable output level in accordance with appearance of the user. [0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. 
[0036]  Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0049] Specifically the information processing device 1, may acquire existence of external sound source, a position, a state or the like of a person near the information processing device 1(including user) by using the microphone 12, the camera 14, or the ranging sensor 15. [0051] Specifically, the level calculation unit 10d calculates an allowable output level indicating whether it is an environment in which response output using voice is preferable (allowed), on the basis of various factors in a system usage environment (such as existence of external sound source, user environment, user behavior, or position of user). ); and execute semantic analysis on the received user utterance input ([0025] By using natural language processing or the like, the semantic analysis unit 10b performs semantic analysis on the speech text acquired by the speech recognition unit 10a. A result of the semantic analysis is output to the response generation unit 10c. [0026] The response generation unit 10c generates a response to the speech from the user on the basis of the semantic analysis result.); and change the controlled volume of the system utterance based on the semantic analysis on the received user utterance input ([0028] In addition, various factors are used to determine the surrounding environment (in other words, system usage environment). Therefore, the level calculation unit 10d determines a current surrounding environment in accordance with at least one or more of the various  factors (to be described below) and calculate an appropriate allowable output level.  [0036]  Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. 
[0037] For example the decision unit 10e decides the response output method on the basis on the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e.).
Ohmura does not explicitly teach: receive a second user utterance (interpreted as receiving a second utterance from the same user at a different point in time) input based on the ambient sound and the controlled volume of the system utterance; execute semantic analysis on the received second user utterance input; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance input.
However, it is obvious that the system will function for every subsequent user input in the same way as for the user’s first input, i.e., for “a second user utterance (interpreted as receiving a second utterance from the same user at a different point in time) input based on the ambient sound and the controlled volume of the system utterance; execute semantic analysis on the received second user utterance input; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance input” as shown above, based on the system’s analysis of the environmental factors associated with the user and the user utterance in that specific environmental situation, i.e., the system usage environment ([0028]). Further, a second utterance of the user can also be mapped to Ohmura’s teaching of other situational system responses in another embodiment, e.g., a first utterance in a loud voice, "what will the weather be like tomorrow?", and a second utterance at a different point in time in a different environment, in which the user speaks the same or a different command in a low or lower voice.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Ohmura to include the teaching of “a second user utterance input based on the ambient sound and the controlled volume of the system utterance; execute semantic analysis on the received second user utterance input; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance input” in order to provide an appropriate response output in accordance with a current surrounding environment ([0007]).
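
For illustration only, the weighted combination of per-factor levels cited above from Ohmura [0036] can be sketched as follows. This is a minimal sketch under stated assumptions, not Ohmura's implementation: the names `Factor` and `final_allowable_level`, the 0.0 to 1.0 level scale, the example weights, and the normalization by total weight are all hypothetical and do not appear in the reference.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """One environmental factor (hypothetical structure, not from Ohmura)."""
    name: str
    level: float   # per-factor allowable output level, assumed 0.0 (quiet) to 1.0 (loud)
    weight: float  # assumed relative importance of this factor


def final_allowable_level(factors):
    """Combine per-factor levels by a weighted sum.

    Dividing by the total weight is an added assumption so the result
    stays on the same 0.0 to 1.0 scale as the inputs.
    """
    total_weight = sum(f.weight for f in factors)
    if total_weight <= 0:
        raise ValueError("at least one factor must have a positive weight")
    return sum(f.level * f.weight for f in factors) / total_weight


# Hypothetical readings taken when a first utterance is received.
first = [
    Factor("user_distance", 0.2, 1.0),
    Factor("user_utterance_volume", 0.4, 1.0),
    Factor("ambient_volume", 0.9, 2.0),
]
# The same calculation is simply repeated with fresh readings
# when a later (second) utterance is received.
second = [
    Factor("user_distance", 0.2, 1.0),
    Factor("user_utterance_volume", 0.1, 1.0),
    Factor("ambient_volume", 0.3, 2.0),
]

level_first = final_allowable_level(first)
level_second = final_allowable_level(second)
```

Under these assumed inputs, the level computed for the second utterance is lower than for the first, which mirrors the reasoning above that the same processing applies to each successive user input.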

Regarding Claim 2, Ohmura does not explicitly teach: The information processing apparatus according to claim 1, wherein the CPU is further configured to increase the volume level of the system utterance based on the user utterance volume is higher (higher voice volume) than a reference volume (user usual voice volume) corresponding to the user distance.
However, the above limitation would have been obvious based on the Ohmura teaching below (See rejection of claim 1 and [0028] Therefore, the level calculation unit 10d determines a current surrounding environment in accordance with at least one or more of the various factors and calculate an appropriate allowable output level. [0033] For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user…Note: it is obvious that a higher or louder user voice volume would be determined similarly, by comparing with the user’s usual voice volume of a speech. [0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. [0036] The level calculation unit 10d calculates an allowable output level that is appropriate to a current surrounding environment on the basis of at least one or more of the above described factors. Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0049] Specifically, the information processing device 1 may acquire existence of an external sound source, a position, state, or the like of a person near the information processing device 1 (including user) by using the microphone 12, the camera 14, or the ranging sensor 15. [0054] As described above, the information processing device 1 (voice UI agent function) according to the embodiment outputs a response by voice in the case of a high allowable output level. 
For example, in the case where an allowable output level is higher than a first threshold, the decision unit 10e of the information processing device 1 decides to use an output method that outputs a response by voice of a usual audio volume toward every direction from the loudspeaker 13. ).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Ohmura to include the teaching of “the CPU is further configured to increase the volume level of the system utterance based on the user utterance volume is higher (higher voice volume) than a reference volume (user usual voice volume) corresponding to the user distance,” considering all the factors surrounding the user, in order to provide an appropriate response output in accordance with a current surrounding environment ([0007]).

Regarding Claim 3, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to  decrease a volume level of the system utterance in a case where the user utterance volume is lower (low voice volume)  than ordinary volume (user usual voice volume) corresponding to the user distance (See rejection of claim 2, and [0033]).

Regarding Claim 4, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to increase the system utterance higher than the volume level of the ambient sound (See rejection of claim 2 and [0030] The level calculation unit 10d may calculates the allowable output level in accordance with existence of an external sound source…In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source. ).

Regarding Claim 5, Ohmura teaches: The information processing apparatus according to claim 4, wherein the CPU is further configured to maintain a difference (response output is raised to prevent the response from being washed out) between the volume of the system utterance and the ambient volume to approximately constant (See rejection of claim 2 and [0030] The level calculation unit 10d may calculates the allowable output level in accordance with existence of an external sound source…In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source. Note: it is obvious based on the above teaching that the difference between the external sound volume and the raised response volume would be kept approximately constant, so that the response is not washed out by the external sound (if the external sound is louder than the response) and the user can hear the response.).
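
For illustration only, keeping the response a roughly constant amount above the ambient sound, as discussed in the note above, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `system_volume_db`, the decibel scale, the 6 dB margin, and the 30 dB and 80 dB output limits are hypothetical values chosen for the example and are not taken from Ohmura or from the claims.

```python
def system_volume_db(ambient_db, margin_db=6.0, min_db=30.0, max_db=80.0):
    """Return an output volume that stays a fixed margin above the ambient level.

    All parameters are assumed example values. The result is clamped to the
    assumed output range, so the margin is only approximately constant near
    the limits.
    """
    target = ambient_db + margin_db
    return max(min_db, min(max_db, target))


quiet_room = system_volume_db(10.0)    # clamped up to the assumed minimum
living_room = system_volume_db(50.0)   # ambient plus the assumed margin
noisy_room = system_volume_db(90.0)    # clamped down to the assumed maximum
```

Within the assumed limits the difference between the output and the ambient level equals the margin; outside them the output saturates at the minimum or maximum value.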


Regarding Claim 9, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to control the volume of the system utterance corresponding to a time zone (See rejection of claim 1 and [0029] The level calculation unit 10d may calculate the allowable output level in accordance with a time slot. For example, in the case of night…the level calculation unit 10d calculates a low allowable output level.).

Regarding Claim 10, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to control a display device to output contents of the system utterance based on the volume of the system utterance reaches one of a maximum allowable value or a minimum allowable value (See rejection of claim 1 and [0027] The level calculation unit 10d calculates an allowable output level of a response on the basis of a current surrounding environment…For example, the allowable output level is calculated to be high in an environment in which output using voice is preferable, and the allowable output level is calculated to be low in an environment in which output using voice is not preferable, but output using display is preferable. In addition, in the environment in which output using display is preferable, the allowable output level is calculated to be more lower if it is desirable to limit a display content, select a display device, or limit brightness in accordance with a surrounding environment of the user. [0029] For example, in the case of night, it is necessary to respect neighboring houses and people who are sleeping. Therefore, the level calculation unit 10d calculates a low allowable output level. The allowable output levels according to time slots may be set in advance by a user or a system. [0031] The level calculation unit 10d may calculate the allowable output level in accordance with the surrounding of a user who is a target (in other words, user environment). For example, when a person (including a baby) is sleeping near the user, such a situation is in an environment in which output using voice is not preferable. Therefore, the level calculation unit 10d calculates a low allowable output level such that the output method is switched to “display”.).

Regarding Claim 11, Ohmura teaches: The information processing apparatus according to claim 1, wherein CPU is further configured to: acquire context information of a space around the user (acquire information on current situation surrounding environment of the user); and control the volume control of the system utterance based on the acquired context information of the space(See rejection of claim 10 and  [0028] Therefore, the level calculation unit 10d determines a current surrounding environment in accordance with at least one or more of the various  factors  and calculate an appropriate allowable output level. [0031] The level calculation unit 10d may calculate the allowable output level in accordance with the surrounding of a user who is a target (in other words, user environment). For example, when a person (including a baby) is sleeping near the user, such a situation is in an environment in which output using voice is not preferable. Therefore, the level calculation unit 10d calculates a low allowable output level such that the output method is switched to “display”. [0032] The microphone 12, camera 14, and the like provided on the information processing device 1, may acquire surroundings of the user. [0049] Specifically the information processing device 1, may acquire existence of external sound source, a position, a state or the like of a person near the information processing device 1(including user) by using the microphone 12, the camera 14, or the ranging sensor 15. ).

Regarding Claim 12, Ohmura teaches: The information processing apparatus according to claim 11, wherein the context information of the space (information on current situation surrounding environment of the user) includes at least one of a type of sound (sound source as a TV, a radio, a music loudspeaker, construction noise and the like) detected from the space, a number of persons in the space, or atmosphere of the space (See rejection of claim 11 and [0030-0031]).

Regarding Claim 13, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to:  acquire a reference value (parameter)  corresponding to each of the user distance ([0034] ), the user utterance volume ([0033]), and the ambient volume ([0030]), from a storage device (storage unit 17), wherein the acquired reference value is an optimal volume( allowable output level) of the system utterance; and control the volume of the system utterance  based on the acquired reference value (See rejection of claim 12 and [0036] The level calculation unit 10d calculates an allowable output level that is appropriate to a current surrounding environment on the basis of at least one or more of the above describes factors. Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e. [0045] The storage unit 17, stores programs for causing the respective structural elements in the information processing device 1 to function. In addition, the storage unit 17 stores various parameters and various thresholds. The various parameters are used when the level calculation unit 10d calculates an allowable output level. The various thresholds are used when the decision unit 10e decides an output method in accordance with the allowable output level. [0049] Specifically the information processing device 1, may acquire existence of external sound source, a position, a state or the like of a person near the information processing device 1(including user) by using the microphone 12, the camera 14, or the ranging sensor 15.).

Regarding Claim 14, Ohmura teaches: The information processing apparatus according to claim 13, wherein the reference value (a feature (visually impaired, hearing-impaired, or the like)) is a user-specific reference value (See rejection of claim 13 and [0035] For example, in the case of a hearing-impaired user, an elder, or a person who asks again many times…a response is output by display instead of voice…On the other hand, in the case of a user with bad eyesight or a user without glasses…a response is output by voice instead of display. For example, information on a physical characteristic of a user that is used in the case where the accessibility is considered may be acquired from the storage unit 17. [0045] The storage unit 17 stores programs for causing the respective structural elements in the information processing device 1 to function. In addition, the storage unit 17 stores various parameters and various thresholds. The various parameters are used when the level calculation unit 10d calculates an allowable output level. The various thresholds are used when the decision unit 10e decides an output method in accordance with the allowable output level. In addition, the storage unit 17 stores registration information of users. The registration information of a user includes…a feature (visually impaired, hearing-impaired, or the like), connection information regarding a communication terminal held by the user, or the like.).

Regarding Claim 15 and 17, Ohmura teaches: An information processing system comprising (Fig 2, information processing device 1) : a user terminal (Fig. 5, terminal 3, i.e. smartphone or mobile phone [0062]); and a data processing server (Fig.2, control unit 10 or Fig.5, block 1X ), wherein the user terminal includes: a speech input device (Fig.2 microphone 12 or Fig.5, microphone of smartphone or mobile phone) configured to input a first user utterance of a user and second user utterance of the user([0003] it is possible to  execute a process corresponding to an instruction made by voice of a user, by using an application of a voice UI  installed in a smartphone, a tablet terminal, or the like. Note: It is obvious that any input speech after initial i.e. (command to setup volume output level of the system output) or at different point of time (command to find weather condition) could be consider first or second utterance of the user and the UI of smartphone will receive the input accordingly. [033] In addition, in the case where the user explicitly designates the allowable output level by himself/herself through a voice command, gesture, device operation (such as operation of a hardware button (not illustrated), operation of a remote controller (not illustrated), and the like, the level calculation unit 10d may calculate the allowable output level on the basis of the designated allowable output level.); a first central processing unit (CPU) (CPU of mobile or smart phone) configured to control volume of a system utterance (it is inherent that CPU of the smartphone in communication through the install UI in the smartphone or mobile phone will control volume of the system utterance under  the control of response generation unit 10c, level calculation unit 10d, decision unit 10e, output control unit 10f. 
[0033] In addition, in the case where the user explicitly designates the allowable output level by himself/herself through a voice command, gesture, device operation (such as operation of a hardware button (not illustrated), operation of a remote controller (not illustrated), and the like), the level calculation unit 10d may calculate the allowable output level on the basis of the designated allowable output level.); and a speech output device (speaker of the mobile or smartphone or loudspeaker 13) configured to output the system utterance based on the controlled volume ([0033] For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices. [0036] Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0037] For example, the decision unit 10e decides the response output method on the basis of the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e. [0041] The loudspeaker 13 has a function of converting the sound signal into a voice and outputting the voice under the control of the output control unit 10f.
[0062] In addition, the decision unit 10e may decide to use a method that also outputs a response from a display screen of the mobile phone terminal, the smart phone, the wearable terminal, or the like held by the user, which enables to output responses by both voice and display.), and the data processing server (Fig. 2, control unit 10) includes: a second CPU (Fig. 2, control unit 10 including a CPU) configured to ([0023] The control unit 10 is implemented by a microcontroller including a central processing unit (CPU),…): receive a user distance of the user and the first utterance of the user from the user terminal, wherein the user distance is a distance from the user terminal to the user ([0034] For example, the position of the user with respect to the information processing device 1 may be acquired by the ranging sensor 15 provided in the information processing device 1. [0043] The ranging sensor 15 has a function of measuring a distance between a user and the information processing device 1 and distances between people around the user and the information processing device 1. [0088] The environment recognition unit 10i recognizes a surrounding environment. For example, the environment recognition unit 10i recognizes positions of surrounding users and users around the device (for example, facial recognition) on the basis of an image of surroundings captured by the camera 14 and sensor data obtained by the ranging sensor 15. Note: Information processing device 1 includes the smartphone or the mobile phone terminal as shown in Fig. 5, which incorporates the functionality of control unit 10 through the UI app installed on the smartphone or mobile phone, i.e.,
[0003] it is possible to execute a process corresponding to an instruction made by voice of a user, by using an application of a voice UI installed in a smartphone, a tablet terminal, or the like.); calculate a user utterance volume of the received first user utterance ([0033] For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user,…); analyze user intention of the received first user utterance (semantic analysis unit 10b) ([0025] By using natural language processing or the like, the semantic analysis unit 10b performs semantic analysis on the speech text acquired by the speech recognition unit 10a. [0026] The response generation unit 10c generates a response to the speech from the user on the basis of the semantic analysis result. For example, in the case where the speech from the user requests “tomorrow’s weather", the response generation unit 10c acquires information on "tomorrow’s weather" from a weather forecast server on a network, and generates a response.); calculate an ambient volume of an ambient sound around the user ([0030] The level calculation unit 10d may calculate the allowable output level in accordance with existence of an external sound source…In addition, in the case of an important response, the level calculation unit 10d may calculate a high allowable calculation level although there is an external sound source, such that audio volume of the response output is raised to prevent the response from being washed out by the external sound source.
Note: It is inherent that, in order to calculate an allowable output level higher than the ambient sound around the user (in accordance with existence of an external sound source), the level calculation unit 10d has to calculate the ambient sound level so that the output level can be raised above it and the output sound is not washed out by the external sound.)); control the first CPU of the user terminal to control the volume of the system utterance based on a combination of the user distance, the calculated user utterance volume, and the calculated ambient volume; control the speech output device of the user terminal to output the system utterance at the controlled volume based on the analyzed user intention of the first user utterance ([0003] it is possible to execute a process corresponding to an instruction made by voice of a user, by using an application of a voice UI installed in a smartphone, a tablet terminal, or the like. [0028] In addition, various factors are used to determine the surrounding environment (in other words, system usage environment). Therefore, the level calculation unit 10d determines a current surrounding environment in accordance with at least one or more of the various factors and calculates an appropriate allowable output level. [0033] The level calculation unit 10d may calculate the allowable output level in accordance with appearance of the user. For example, when the user speaks in low voices, it is expected that the voice UI responds in low voices too, and it is determined that the user is in an environment in which a loud voice is not appropriate. Therefore, the level calculation unit 10d calculates a low allowable output level. Note that, it may be determined whether the voice is low by comparing with a usual voice volume of a speech from the user, or on the basis of behavior such as covering his/her mouth with hand.
[0034] The level calculation unit 10d may calculate the allowable output level in accordance with a position of the user with respect to the information processing device 1. For example, in the case where the user is in the vicinity of the information processing device 1, the level calculation unit 10d may calculate a low allowable output level since the user can hear an output response even when the audio volume of the output response is lowered. For example, the position of the user with respect to the information processing device 1 may be acquired by the ranging sensor 15 provided in the information processing device 1. [0036] Alternatively, the level calculation unit 10d may calculate a final allowable output level by summing weighted allowable output levels calculated for the respective factors. [0037] For example, the decision unit 10e decides the response output method on the basis of the allowable output level calculated by the level calculation unit 10d. [0038] The output control unit 10f performs control such that a response generated by the response generation unit 10c is output in accordance with the response output method decided by the decision unit 10e. [0041] The loudspeaker 13 has a function of converting the sound signal into a voice and outputting the voice under the control of the output control unit 10f. [0062] In addition, the decision unit 10e may decide to use a method that also outputs a response from a display screen of the mobile phone terminal, the smart phone, the wearable terminal, or the like held by the user, which enables to output responses by both voice and display.).
Ohmura does not explicitly teach: receive the second user utterance (interpreted as receive a second utterance of the same user at a different point of time) from the user terminal based on the ambient sound and the output system utterance at the controlled volume; execute semantic analysis on the received second user utterance; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance; and control the speech output device of the user terminal to output the system utterance at the changed volume.
However, it is obvious that the system will function for all subsequent user inputs as it does for the first user input. That is, based on Ohmura's teaching and the scope of the invention, the system will automatically adjust the output level from the previous setting, determined for the previous environment, to a level suited to the current environment, i.e., “receive the second user utterance (interpreted as receive a second utterance of the same user at a different point of time) from the user terminal based on the ambient sound and the output system utterance at the controlled volume; execute semantic analysis on the received second user utterance; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance; and control the speech output device of the user terminal to output the system utterance at the changed volume” as shown above, based on the system's analysis of the environmental factors associated with the user and the user utterance in that specific environmental situation, i.e., the system usage environment ([0028]). Further, a second utterance of the user can also be mapped based on Ohmura’s teaching of other situational system responses from another embodiment, e.g., a first utterance in a loud voice, "what will the weather be like tomorrow?", and a second utterance at a different point of time in a different environment, in which the user speaks the same or a different command in a low or lower voice, and the system adjusts the output level from high, based on the previous environmental factors, to low, based on the current environmental factors. Further, the user's smartphone terminal will be controlled through the installed app UI associated with the control unit 10 (See Fig. 2 and Fig. 5).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, for Ohmura to include the teaching of “receive the second user utterance from the user terminal based on the ambient sound and the output system utterance at the controlled volume; execute semantic analysis on the received second user utterance; and change the controlled volume of the system utterance based on the semantic analysis on the received second user utterance; and control the speech output device of the user terminal to output the system utterance at the changed volume” in order to provide appropriate response output in accordance with a current surrounding environment ([0007]).
 
Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Ohmura further in view of Nandy et al. (US 2019/0180740 A1).

Regarding Claim 7, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to control volume of the system utterance (See rejection of claim 1).
Ohmura does not teach: The information processing apparatus according to claim 1, wherein the CPU controls volume of a music and controls volume of the system utterance higher than the volume of the music.
Nandy et al. teach: wherein the CPU controls volume of a music and controls volume of the system utterance higher than the volume of the music ([0055] In such examples, the vehicle computing device 110 may be configured to mute, or attenuate (e.g., lower volume) the call audio data while the TTS response audio data is output by the vehicle computing device loudspeakers. The techniques described above with respect to simultaneously communicating and outputting call audio data along with TTS response audio data may similarly be applied to situations where, instead of call audio data, music audio data (or other audio data) is being streamed. For instance, the techniques may similarly apply when music audio data is streamed from the user device 108 to the voice-enabled device 106 using the A2DP network 142, and sent to the vehicle computing device 110 to be output by the vehicle computing device 110 while speech utterances 112 are detected after a wake word are communicated, and TTS response audio data is communicated to be output by the vehicle computing device 110.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, for Ohmura to include the teaching of Nandy et al. above in order to simultaneously communicate and output music or call audio data along with TTS response audio data.

Regarding Claim 8, Ohmura teaches: The information processing apparatus according to claim 1, wherein the CPU is further configured to: control volume of the system utterance (See rejection of claim 1).
Ohmura does not teach: wherein the CPU is further configured to: control a volume of an ordinary music, and a volume of a background (BGM) music; control the volume level of the system utterance higher than the volume of the ordinary music (mute or attenuate music while outputting TTS response); and control the volume of the ordinary music higher than the volume of the BGM music (filter out echo (includes BGM (background music) or sounds from TTS response) while outputting TTS response).
Nandy et al. teach: wherein the CPU is further configured to: control a volume of an ordinary music, and a volume of a background (BGM) music; control the volume level of the system utterance higher than the volume of the ordinary music (mute or attenuate music while outputting TTS response); and control the volume of the ordinary music higher than the volume of the BGM music (filter out echo (includes BGM (background music) or sounds from TTS response) while outputting TTS response) ([0055] Upon receiving the TTS response audio data, the vehicle computing device 110 may output the machine-generated words represented by the TTS response audio data on loudspeakers of the vehicle. In some examples, the vehicle computing device 110 may simultaneously be outputting sound represented by call audio data received from the contact's user device. In such examples, the vehicle computing device 110 may be configured to mute, or attenuate (e.g., lower volume) the call audio data while the TTS response audio data is output by the vehicle computing device loudspeakers. Further, the voice-enabled device 106 may be configured with components for performing echo cancellation. Using these echo cancellation components, the voice-enabled device 106 may cancel, or filter out, the sound corresponding to the TTS response audio data to prevent call audio data sent to the contact's user device from including the sound corresponding to the TTS response audio data. The techniques described above with respect to simultaneously communicating and outputting call audio data along with TTS response audio data may similarly be applied to situations where, instead of call audio data, music audio data (or other audio data) is being streamed.
For instance, the techniques may similarly apply when music audio data is streamed from the user device 108 to the voice-enabled device 106 using the A2DP network 142, and sent to the vehicle computing device 110 to be output by the vehicle computing device 110 while speech utterances 112 are detected after a wake word are communicated, and TTS response audio data is communicated to be output by the vehicle computing device 110. Note: echo cancellation will filter out sounds from the attenuated (e.g., lower volume) BGM and the generated TTS response that are input back to the voice-enabled device as echo while being output by the vehicle computing device 110.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, for Ohmura to include the teaching of Nandy et al. above in order to simultaneously communicate and output music or call audio data along with TTS response audio data.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Yoshida et al.(US 2006/0074684 A1) teach: An on-vehicle acoustic control system determines which one of sounds of an audio device and a navigation device should be generated with priority, when both devices are requested to generate respective sounds. The control system further detects a user's physical condition based on an interaction with the user, a picture of the user and biometric information of the user. The control system generates sound in the order of determined priority, and varies the manner of sound generation based on the user's physical condition.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656