Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 101
Applicant’s amendment of claim 20 to recite a non-transitory computer readable medium suffices to obviate the 35 U.S.C. 101 rejection of that claim.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-8, 13-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Stevans (20180108343, hereinafter Ste) in view of Brown (20150186156).

Regarding claims 1, 19, and 20
Ste teaches:
An information processing device, terminal, method, and coded instructions comprising: a memory; and processing circuitry (Ste: ¶ 64; Claim 17: system operable by a processor that executes instructions stored in a memory) configured to
infer an action purpose of a user (Ste: ¶ 23; Fig 2: action system 202 determines a user intent based on a transcription of a user utterance and performs an action inferred or otherwise determined based on an interpretation of the utterance) based on a result of sensing by one or more sensors (Ste: ¶ 4, 22-24; Fig 2: user utterance determined based on sound captured by one or more microphones); and 
control, based on the inferred action purpose, output of a voice to the user by an audio output unit (Ste: ¶ 4, 22-24; Fig 2: action system 202 determines an appropriate action to a user  
wherein the memory stores a plurality of target modes of speaking according to the action purpose of the user (Ste: Abstract; ¶ 22-24, 29, 45, 48: a particular user intent is determined from a recognized phrase, thereby signaling a particular user action purpose, intent, etc., which invokes a particular assistant having a particular voice output, including particular prosody characteristics thereof; the system comprises plural assistants with associated voice models, said assistants being stored in memory and invoked based on a recognized user action purpose, intent, etc.), 
each of the plurality of target modes of speaking corresponding to each action purpose of the user (id.: a particular action purpose, intent, etc., as signaled by particular user speech, selects a voice output based thereon; e.g., a user desiring voice navigation data with a particular resilience against car noise would invoke a voice trained on a first language model, whereas a user seeking knowledge of Latino culture would invoke a voice trained on a second language model),
each of the plurality of target modes of speaking defining at least one of volume, pitch or speed of the voice output by the audio output unit (Ste: ¶ 36, 55; Table 1: parameters include speed, formality, arousal, etc.), and the processing circuitry is further configured to identify, from among the plurality of target modes of speaking stored in the memory, a first target mode of speaking that corresponds to the inferred action purpose of the user, the first target mode of speaking defining at least one of first volume, first pitch or first speed of the voice output by the audio output unit (Ste: ¶ 22-29, 33-38, 43-48, 55, 56; Table 1: a particular utterance signaling a user action purpose, intent, etc. activates a particular assistant, said assistant comprising a voice model including at least speed), 
receive a sound collection result of collecting speaking of the user (Ste: ¶ 22-29; Fig 3: e.g. generation of transcriptions based on user utterances), 
control the output of the voice such that at least one of volume, pitch or speed of the voice output to the user by the audio output unit is gradually changed from the at least one of the second volume, the second pitch or the second speed of the identified mode of speaking of the user to the at least one of the first volume, the first pitch or the first speed of the identified first target mode of speaking (Ste: ¶ 22-29, 33-38, 43-48, 55, 56; Table 1: a particular utterance signaling a user action purpose, intent, etc. activates a particular assistant, said assistant comprising a voice model including at least speed; a second utterance operates to signal a second action purpose, intent, etc., and to instantiate a second assistant bearing second voice characteristics such as speed).

Ste strongly suggests gradually altering a particular output voice of the system over time, inasmuch as Ste discusses voice “morphing” of a particular output voice based on a variety of user parameters, including speed, frequency, spectral, pitch, etc. parameters (Ste: ¶ 29, 44, 45: morphing is considered the changing of a voice over time, i.e., such that the voice is gradually changed or adapted based on particular parameters), and also teaches the utility of first, second, etc. voice assistants operative to output speech to a user using first, second, etc. voices. However, Ste does not explicitly teach a voice assistant or voice response system operable to identify a mode of speaking of the user based on the received sound collection result, the identified mode of speaking of the user defining at least one of second volume, second pitch or second speed, and to thereby control the voice output such that the voice is gradually changed from the at least one of the second volume, the second pitch or the second speed of the identified mode of speaking of the user to the at least one of the first volume, the first pitch or the first speed of the identified first target mode of speaking.

In a related field of endeavor, Brown teaches a system and method for a virtual assistant comprising a memory and processing circuitry (Brown: Fig 2: processor(s) 202, memory 204, etc.), wherein modes of interaction, including the manner of speech with which a user has interacted with the assistant, are analyzed and used to adapt the assistant to the user context, the user's modes of interaction with the assistant, etc. In this way, the assistant seeks to adapt audible and other modalities to a user by adapting particular audible, voice, etc. parameters such as speed, volume, cadence, inflection, etc. (Brown: ¶ 99-101, 202, 203), and thereby operates to mimic parameters representative of a user's interaction with the assistant, i.e., speed of speaking, accent, etc. It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the voice mimicry taught by Brown to adapt the morphing over time of a first, second, etc. voice assistant such as that taught by Ste. The average skilled practitioner would have been motivated to do so for at least the purpose of emulating human-to-human interactions, mimicking the volume level of a spoken command word or phrase that initiated the interaction, etc., and would have expected predictable results therefrom. 

Regarding claim 3
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein at least one of the one or more sensors senses sound occurring in a place where the user is located, and the processing circuitry causes acoustic characteristics of the voice output by the audio output unit to be 
  
Regarding claim 4
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry causes acoustic characteristics of the voice output by the audio output unit to be changed further according to a topic corresponding to the voice output to the user (Ste: ¶ 26, 30-38, 43-45, 65, 69: the meaning of a user utterance determines parameters of the voice output, such as a bank teller assistant voice for banking operations, an Abraham Lincoln voice for Civil War history, an educational voice for helping children with homework, etc.).

Regarding claim 5
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry controls the output of the voice to the user by the audio output unit further on the basis of whether or not the action purpose of the user is business (Ste: ¶ 38: data providers, such as a business, determine and output particular voices based on parameters desired by and/or inherent to said business). While Ste in view of Brown does not explicitly teach controlling an output voice and/or parameters thereof based on the number of users, Examiner has taken official notice, which Applicant has failed to timely and explicitly traverse; it is thus accepted as 

Regarding claim 6
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry causes a frequency of sensing by at least one of the one or more sensors to be further changed on the basis of the result of inference by the processing circuitry. Examiner has taken official notice, which Applicant has failed to timely and explicitly traverse; it is thus accepted as Applicant’s Prior Art (APA, please see MPEP 2144.03) that adjusting a frequency of sensing by adjusting a sampling rate for a sensor would have been an obvious inclusion. The average skilled practitioner would have been motivated to do so for the purpose of increasing or decreasing a sensitivity, frequency range, etc. of a sensor and would have expected only predictable results therefrom.

Regarding claim 7
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry causes a topic corresponding to the voice output by the audio output unit to be further changed on the basis of the result of inference by the processing circuitry (Ste: ¶ 26, 30-38, 43-45, 54-58: determination of user intent selects an output voicing and adjusts parameters thereof such as by speech morphing based on determined parameters of the determined user intent or by 

Regarding claim 8
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry causes a length of the voice for each voice output to the user to be further changed on the basis of the result of inference by the processing circuitry (Ste: Abstract; ¶ 12, 26, 30-38, 43-45, 54-58; Fig 6: the available plugins comprise a speed parameter for each/any of the particular parameterized personalities, such as Santa Claus, Abraham Lincoln, etc., and as such, selection of a separate parameterized voice yields a diversity of speed values; further, a particular personality may also comprise a prescribed duration of voice output, such as a particular time of day, day of week, time of year, season, holiday, etc.); (Brown: ¶ 99-101, 202, 203: analysis of speech parameters, such as the speed of a user's voice, adapts the assistant to mimic said parameters). 

Regarding claim 13
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry successively infers the action purpose of the user every time a sensing result of sensing by the one or more sensors is obtained, and in a case where a second action purpose that differs from a first action purpose of the user initially inferred by the processing circuitry is inferred by the processing circuitry, the output control unit causes the output mode of the voice output by the 

Regarding claim 14
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein every time a voice is output by the audio output unit, a change in mode of speaking of the user is identified, and the processing circuitry causes the output mode of the voice output to the user by the audio output unit to be gradually changed further on the basis of a change degree to which the mode of speaking of the user has changed for each voice output to the user (Ste: Abstract; ¶ 12, 26, 30-38, 43-45, 54-58; Fig 6: system iteratively adapts the voice output to a user).

Regarding claim 15
Ste in view of Brown does not explicitly teach processing circuitry that, in a case where reliability of the action purpose of the user inferred by the processing circuitry is lower than a predetermined threshold value, inquires of the user about the action purpose of the user, and wherein the processing circuitry causes the output mode of the voice output to the user by the audio output unit to be gradually changed on the basis of a mode of target speaking corresponding to an answer of the user to the inquiry by the inquiry unit.
Examiner has taken official notice, which Applicant has failed to timely and explicitly traverse; it is thus accepted as Applicant’s Prior Art (APA, please see MPEP 2144.03) that a request for disambiguation by a user in the event of a determination falling below a confidence 

Regarding claim 16
Ste in view of Brown teaches or suggests:
The information processing device according to claim 1, wherein the processing circuitry recognizes an action of the user on the basis of a sensing result of sensing by the one or more sensors (Ste: ¶ 30-40: determination of a user function, such as making toast or driving, as well as determination of a user location, such as proximity to a particular location, associates user action with a particular voice and/or voice output parameters); (Brown: ¶ 24-28, 38, 68, 69, 174, etc.: system iteratively adapts the voice output to a user based on sensed user parameters and/or sensed user context, etc., including location, topic, time of day, etc.), and the processing circuitry infers the action purpose of the user on the basis of a result of recognition by the processing circuitry (Ste: ¶ 22-24, 30-40, 43-46; Fig 5-7: particular audible parameters of the voice output are determined, changed, etc. based on parameters derived from the user utterance); (Brown: ¶ 24-28, 38, 68, 69, 174, etc.: system iteratively adapts audible parameters of voice output to a user based on sensed user parameters and/or sensed user context, etc.).

Regarding claim 17
Ste in view of Brown teaches or suggests:
processing circuitry that infers an action purpose of the user, the use being identified on the basis of the result of sensing by the one or more sensors (Ste: ¶ 22-24, 30-40, 43-46; Fig 5-7: determination of a user function, such as making toast or driving, as well as determination of a user location, such as proximity to a particular location, associates user action with a particular voice and/or voice output parameters); (Brown: ¶ 24-28, 38, 68, 69, 174, etc.: system iteratively adapts the voice output to a user based on sensed user parameters and/or sensed user context, etc., including location, topic, time of day, etc.), but does not teach inferring the action purpose on the basis of a use corresponding to a room in which the user is located. Examiner has taken official notice, which Applicant has failed to timely and explicitly traverse; it is thus accepted as Applicant’s Prior Art (APA, please see MPEP 2144.03) that determination of a room location of a user was well known in the art before the effective filing date of the instant invention and would have been an obvious parameter by which to adjust the particular voice and/or voice output parameters of the Ste in view of Brown system and method. The average skilled practitioner would have been motivated to do so for the purpose of using a doctor voice at a doctor’s office, a chef voice in a kitchen, etc., and would have expected only predictable results therefrom.

Response to Arguments
Applicant’s arguments in concert with the claim amendments (see Remarks and Claims, filed 12/23/21) with respect to the rejections of claims 1-4, 7-8, and 18-20 under 35 U.S.C. 102(a)(2) as being anticipated by Stevans, claims 5-6 under 35 U.S.C. 103 as being unpatentable over Stevans, and claims 9-17 under 35 U.S.C. 103 as being unpatentable over Stevans in view of Jothilingam have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made over Stevans in view of Brown.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 7:30-6:30, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at (571)272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To 

/PAUL C MCCORD/Primary Examiner, Art Unit 2654