Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2/4/2020 are being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1, 10 and 11 recite a speech processing method, a computer program stored on a recording medium, and an apparatus comprising one or more processors configured to implement a method comprising: converting a response text, which is generated in response to a spoken utterance of a user, to a spoken response utterance; obtaining external situation information while outputting the spoken response utterance; generating a dynamic spoken response utterance by converting the spoken response utterance on the basis of the external situation information; and outputting the dynamic spoken response utterance.
The limitations of “converting”, “obtaining”, “generating” and “outputting” as drafted cover a human organizing of activities where a human hears an utterance “set temperature to 70 degrees”, 
This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements of “processor” and “memory”, which are forms of generic computer equipment. The as-filed Specification states: “[0099] In addition, the memory 160 may store therein a command to be executed by the information processor 150, including, for example, a command for converting the response text, which is generated in response to the user's spoken utterance, to the spoken response utterance, a command for obtaining the external situation information while outputting the spoken response utterance, a command for generating the dynamic spoken response utterance by converting the spoken response utterance on the basis of the external situation information, and a command for outputting the dynamic spoken response utterance. In addition, the memory 160 may store therein various information processed by the information processor 150”. The elements “processor” and “memory” are both general purpose computer devices.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

Claims 3 & 13 recite wherein generating the dynamic spoken response utterance comprises generating a first dynamic spoken response utterance by inserting a silent section into the spoken response utterance in response to a determination that the noise is the first noise. This amounts to a human simply pausing their readout of the text message when the ambient noise is high. No additional limitations are present.
Claims 4 & 14 recite generating the first dynamic spoken response utterance until the first noise becomes less than the first reference value; and when the first noise becomes less than the first reference value, stopping inserting the silent section and resuming generating the spoken response utterance. This amounts to a human simply pausing their readout of the text message until the ambient noise has returned to an inaudible level, and then resuming recitation of the text message. No additional limitations are present.
Claims 5 & 15 recite further comprising outputting a prestored utterance after stopping inserting the silent section and prior to resuming outputting the spoken response utterance.  This amounts to the human, while he has paused reading the text, saying “I will resume my message when noise is low”. No additional limitations are present.

Claims 7 and 17 recite wherein generating the dynamic spoken response utterance comprises: generating the second dynamic spoken response utterance until the second noise becomes less than the second reference value; and when the second noise becomes less than the second reference value, stopping generating the second dynamic spoken response utterance and resuming generating the spoken response utterance.  This amounts to a human increasing his volume as he is reading the text message in response to noise and when the noise subsides, reading it at a normal level. No other limitations are present.
Claims 8 and 18 recite wherein obtaining the external situation information comprises obtaining time limit information, based on which output of the spoken response utterance should be stopped within a predetermined time.  This amounts to a second person instructing the first person to only read the text message for a fixed amount of time. No other limitations are present.
Claims 9 and 19 recite wherein generating the dynamic spoken response utterance comprises generating a third dynamic spoken response utterance by changing an output rate of the spoken response utterance on the basis of the time limit information.  This amounts to a human speeding up the reading of the text message to make sure it fits into a time range stipulated by the second person. No other limitations are present.



The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim can be construed as non-statutory ‘signal claims’. Patent case law has held that transitory forms of signal transmission (for example, a propagating electrical or electromagnetic signal per se) fail to meet the requirements of 35 U.S.C. §101. See In re Nuijten, 500 F.3d 1346, 1357, 84 USPQ2d 1495, 1503 (Fed. Cir. 2007) and MPEP §2106. Applicant’s claims are directed to a computer readable recording medium that could be broadly construed as ‘signal claims’. The Specification, [00140], recites examples of computer-readable medium but it says the media are “not limited to” the examples. However, this language of “not limited to” does not expressly require that a “computer-readable recording medium” must be non-transitory, so a non-transitory embodiment could be understood to be ‘optional’. The USPTO takes the position that claims directed to a computer-readable medium be broadly construed in accordance with the Specification, and that claims directed to a computer-readable medium that do not expressly exclude transitory embodiments may fail to meet the requirements of 35 U.S.C. §101. Conceivably, “a computer-readable recording medium” could only provide for some transitory storage of instructions. Applicant can overcome this rejection by amending claim 10 to set forth “a non-transitory computer readable recording medium storing instructions”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Liang (US-10276149 B1) in view of Agrawal (US-20170193982 A1).

With respect to claims 1 and 11 Liang teaches A speech processing method, comprising: converting a response text, which is generated in response to a spoken utterance of a user, to a spoken response utterance (¶Col 22, ll 57-66: "FIGS. 8A through 8B illustrate dynamically selecting TTS ...During runtime operations, the speech-controlled device 110 captures (814) input audio corresponding to an utterance, and sends (816 illustrated in FIG. 8A) audio data corresponding thereto to the server(s) 120. The server(s) 120 determines (818) output content responsive to the spoken utterance. The server(s) 120 may perform ASR on the input audio data to determine input text data..., and  ¶ Col 23 ll 29-50 The server(s) 120 performs TTS processing on the output text data to create (832) output audio data. Creating the output audio data may be based on the deviation between the rate of speech of the spoken command and the average rate of speech of the user. If the deviation is below a threshold (i.e., if the rate of speech in the input audio data is substantially similar to the average rate of speech of the user), the server(s) 120 may create the output audio data using settings (e.g., creating default portions of the output audio data specific to the command and/or creating the output audio data to be output at a default speed). If the deviation is above the threshold (i.e., the rate of speech in the input audio data is 
Liang does not teach obtaining external situation information while outputting the spoken response utterance; generating a dynamic spoken response utterance by converting the spoken response utterance on the basis of the external situation information; outputting the dynamic spoken response utterance.
Agrawal teaches obtaining external situation information while outputting the spoken response utterance (¶ [0081]: "For example, if the readout module 205 has to pause the audio readout (e.g., in response to ambient noise levels exceeding a threshold, in response to an incoming notification of higher priority, or in response to user command), then the summary module 305 may generate a summary of the portion of the audio readout already presented.");
generating a dynamic spoken response utterance by converting the spoken response utterance on the basis of the external situation information (¶ [0081]:" For example, if the readout module 205 has to pause the audio readout (e.g., in response to ambient noise levels exceeding a threshold, in response to an incoming notification of higher priority, or in response to user command), then the summary module 305 may generate a summary of the portion of the audio readout already presented."); and 
outputting the dynamic spoken response utterance (¶ [0082]:" The prompt module 310, in one embodiment, is configured to prompt the user whether to present an audio readout of a notification. The prompt module 310 may further listen for a response from the user. In one embodiment, the user may respond affirmatively or negatively. In another embodiment, the user may respond to the user command, for example command to present the audio readout. Based on the user response, the prompt module 310 may control the readout module 205 to present the audio readout.")
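It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang to include the teachings of Agrawal, the motivation being to incorporate user attention, via sensor data, into the delivery of notifications (Agrawal, Abstract).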

With respect to claims 2 and 12 Liang does not teach measuring noise, as the external situation information, inputted through a microphone after outputting the spoken response utterance; determining a noise that exceeds a first reference value as a first noise, which is direct response information of the user; and determining a noise that exceeds a second reference value and is less than the first reference value as a second noise, which is indirect audio information of surroundings. 
Agrawal teaches measuring noise, as the external situation information, inputted through a microphone after outputting the spoken response utterance (¶ [0081]: "For example, if the readout module 205 has to pause the audio readout (e.g., in response to ambient noise levels exceeding a threshold, in response to an incoming notification of higher priority, or in response to user command), then the summary module 305 may generate a summary of the portion of the audio readout already presented.”, and ¶  [0137]: "In some embodiments, measuring 510 the ambient noise level includes receiving data from a microphone, noise meter, or other device for measuring ambient noise.");
determining a noise that exceeds a first reference value as a first noise, which is direct response information of the user (¶ [0081]: "For example, if the readout module 205 has to pause the audio readout (e.g., in response to ambient noise levels exceeding a threshold, in response to an incoming notification of higher priority, or in response to user command), then the summary module 305 may generate a summary of the portion of the audio readout already presented. "); and
determining a noise that exceeds a second reference value and is less than the first reference value as a second noise, which is indirect audio information of surroundings (¶ [0010]: "In certain embodiments, determining the user attention state includes measuring an ambient noise level at the [second reference value]”, and ¶ [0016] “In certain embodiments, the apparatus includes an ambient noise sensor, wherein the memory further comprises code executable by the processor to: measure an ambient noise level at the apparatus, compare the ambient noise level to an inaudible state threshold, pause the audio readout in response to the ambient noise level being above the inaudible state threshold, re-measure an ambient noise level at the user device in response to pausing the audio readout, compare the re-measured ambient noise level to an audible state threshold, resume the audio readout in response to the re-measured ambient noise level being below the audible state threshold for a threshold amount of time, and present visual notification and the visual cue, in response to the ambient noise level remaining above the audible state threshold for a predetermined time frame ")
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang to include the teachings of Agrawal, the motivation being to incorporate user attention, via sensor data, into the delivery of notifications (Agrawal, Abstract).
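For illustration only, the two-threshold noise determination recited in claims 2 and 12 can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference: the function name, the enum, and the decibel values are hypothetical, and the handling of a level exactly equal to a reference value is an assumption, since the claim language ("exceeds", "is less than") does not address it.

```python
from enum import Enum


class NoiseKind(Enum):
    NONE = "none"      # at or below the second reference value
    FIRST = "first"    # "direct response information of the user"
    SECOND = "second"  # "indirect audio information of surroundings"


def classify_noise(level_db, first_ref_db, second_ref_db):
    """Classify a measured noise level against two reference values.

    Hypothetical helper. Assumes the first reference value is the
    larger of the two, as the claim wording implies.
    """
    if second_ref_db >= first_ref_db:
        raise ValueError("second reference must be below first reference")
    if level_db > first_ref_db:
        return NoiseKind.FIRST
    if second_ref_db < level_db < first_ref_db:
        return NoiseKind.SECOND
    # Boundary cases (exactly equal to a reference) are not specified
    # by the claim; this sketch treats them as no actionable noise.
    return NoiseKind.NONE
```

Under these assumptions, a level of 80 with reference values of 70 and 50 would be treated as the first noise, and a level of 60 as the second noise.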

With respect to claims 3 and 13 Liang does not teach wherein generating the dynamic spoken response utterance comprises generating a first dynamic spoken response utterance by inserting a silent section into the spoken response utterance in response to a determination that the noise is the first noise.
Agrawal teaches wherein generating the dynamic spoken response utterance comprises generating a first dynamic spoken response utterance by inserting a silent section into the spoken response utterance in response to a determination that the noise is the first noise.  (¶ [0081]: "For example, if the readout module 205 has to pause the audio readout (e.g., in response to ambient noise levels exceeding a threshold, in response to an incoming notification of higher priority, or in response to 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang to include the teachings of Agrawal, the motivation being to incorporate user attention, via sensor data, into the delivery of notifications (Agrawal, Abstract).

With respect to claims 4 and 14, Liang does not teach generating the first dynamic spoken response utterance until the first noise becomes less than the first reference value; and when the first noise becomes less than the first reference value, stopping inserting the silent section and resuming generating the spoken response utterance.
Agrawal teaches generating the first dynamic spoken response utterance until the first noise becomes less than the first reference value (¶ [0016]: "In certain embodiments, the apparatus includes an ambient noise sensor, wherein the memory further comprises code executable by the processor to: measure an ambient noise level at the apparatus, compare the ambient noise level to an inaudible state threshold, pause the audio readout in response to the ambient noise level being above the inaudible state threshold, ...") ; and 
when the first noise becomes less than the first reference value, stopping inserting the silent section and resuming generating the spoken response utterance (¶ [0016]: "... re-measure an ambient noise level at the user device in response to pausing the audio readout, compare the re-measured ambient noise level to an audible state threshold, resume the audio readout in response to the re-measured ambient noise level being below the audible state threshold for a threshold amount of time, and present visual notification and the visual cue, in response to the ambient noise level remaining above the audible state threshold for a predetermined time frame.")
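It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang to include the teachings of Agrawal, the motivation being to incorporate user attention, via sensor data, into the delivery of notifications (Agrawal, Abstract).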
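For illustration only, the pause-and-resume behavior quoted from Agrawal ¶ [0016] above can be sketched as a small state machine. This is a minimal sketch under stated assumptions and not Agrawal's implementation: the function and parameter names are hypothetical, noise is modeled as a list of periodic samples, the "threshold amount of time" is modeled as a count of consecutive quiet samples, and the visual-notification fallback is omitted.

```python
def readout_states(noise_levels, inaudible_threshold, audible_threshold,
                   quiet_samples_needed):
    """Return 'playing' or 'paused' for each noise sample.

    Hypothetical helper. Pauses when a sample exceeds the inaudible
    state threshold; resumes once the level has stayed below the
    audible state threshold for quiet_samples_needed samples in a row.
    """
    states = []
    paused = False
    quiet_run = 0
    for level in noise_levels:
        if not paused:
            if level > inaudible_threshold:
                paused = True
                quiet_run = 0
        elif level < audible_threshold:
            quiet_run += 1
            if quiet_run >= quiet_samples_needed:
                paused = False
        else:
            # Noise came back before the quiet period elapsed.
            quiet_run = 0
        states.append("paused" if paused else "playing")
    return states
```

Using two separate thresholds, as the quoted passage does, keeps the readout from toggling rapidly when the noise level hovers near a single cutoff.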

With respect to claim 10 Liang in view of Agrawal teaches A computer-readable recording medium on which a computer program is stored for implementing the method according to claim 1 using a computer.  (¶ Liang: Col 30 ll 28-40: "FIG. 18 is a block diagram conceptually illustrating a user device 110 (e.g., the speech-controlled device 110 described herein) that may be used with the described system 100. FIG. 19 is a block diagram conceptually illustrating example components of a remote device, such as the server 120 that may assist with ASR processing, NLU processing, or command processing. Multiple servers 120 may be included in the system 100, such as one server 120 for performing ASR, one server 120 for performing NLU, etc. In operation, each of these devices (or groups of devices) may include computer-readable and computer-executable instructions that reside on the respective device (110/120), as will be discussed further below.”)  Additionally, Liang in view of Agrawal teaches the method of claim 1, as described in the rejection of claim 1 above.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Liang and Agrawal as applied to claims 4 and 14, and further in view of Walters (US 9172747 B2).
With respect to claims 5 and 15 Walters teaches further comprising outputting a prestored utterance after stopping inserting the silent section and prior to resuming outputting the spoken response utterance.  (¶ Col. 27 ll 57 - Col. 28 ll 59: " In another embodiment, user 710 may, for example, stop the dialogue with virtual assistant 710 (for example, by uttering a command or interaction via an interface on user device 702)…for example, a different strategy may be invoked if a task is resumed after a short interruption as opposed to resuming a task after a day or on a different device or modality. For 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang and Agrawal to include the teachings of Walters, the motivation being to give the user the choice of continuing the dialog after a pause or delaying the dialog further, which gives the user additional flexibility regarding when and where to continue the dialog session (Walters, Col. 26 ll 45 – Col. 27 ll 56).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liang and Agrawal as applied to claims 2 and 12, and further in view of Flury (US 20090016329 A1).

With respect to claims 6 and 16 Flury teaches wherein generating the dynamic spoken response utterance comprises generating a second dynamic spoken response utterance by increasing a volume of the spoken response utterance or by increasing a pitch of the spoken response utterance in response to a determination that the noise is the second noise (¶ [0119]: "In this step, the supervisory entity 19 can furthermore control adaptation of a characteristic of the selected interface device as a function of information relating to its sensory context. For example, if loudspeakers that reproduce messages to the user in synthesized speech form are selected as the interface device and a background noise level above a threshold level is detected in the immediate environment of the loudspeakers, the supervisory entity 19 can command the interface device to increase the volume at which messages are reproduced. The distance between the user and the loudspeakers can also be taken into account for this.")
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang, Agrawal to include the teachings of Flury motivation .

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Liang, Agrawal, and Flury as applied to claims 6 and 16, and further in view of Haparnas (US-20050282590-A1).

With respect to claims 7 and 17 Haparnas teaches generating the second dynamic spoken response utterance until the second noise becomes less than the second reference value; and (¶ [0041]: "For example, if the ambient noise level is measured as being 5 decibels higher than the first threshold T1, then control software 1122 may cause the speaker volume to be increased by 10 decibels. In other embodiments, the speaker volume is increased by a predetermined value regardless of the degree of difference between the measured ambient noise level and the first threshold T1.")
when the second noise becomes less than the second reference value, stopping generating the second dynamic spoken response utterance and resuming generating the spoken response utterance.  (¶ [0046]: "If the measured ambient noise level is not less than the second threshold T2, then the surrounding noise level is approximately between the first and second thresholds T1 and T2 (i.e., T1>=ambient noise>=T2). Under this condition, control software 1122 maintains the speaker volume at the default volume level, and therefore no instructions are issued for increasing or decreasing the volume of speaker 110.”, and ¶ [0043] “Therefore, if it is determined that the ambient noise level is below the second threshold T2, then control software 1122 causes control mechanism 140 to decrease the volume of speaker 110 or the ring tone. The decrease in volume would reduce the volume of audio output generate by speaker 110, preferably, to a degree that the user can still hear the audio output, without disrupting the serenity of the surrounding environment. For example, if the user walks into a 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang, Agrawal, and Flury to include the teachings of Haparnas, the motivation being to adjust the volume in relation to the noise in the environment (Haparnas, Abstract).

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liang and Agrawal as applied to claims 1 and 11, and further in view of Higbie (US-20170103754-A1).

With respect to claims 8 and 18  Higbie teaches wherein obtaining the external situation information comprises obtaining time limit information, based on which output of the spoken response utterance should be stopped within a predetermined time (¶ [0013]: "In some embodiments, the one or more programs include instructions that when executed by the one or more processors, further cause the device to perform: detecting a second event identifying an end time for the speech interactive content; in response to detecting the second event identifying the end time for the speech interactive content, terminating the playback of the speech interactive content at the end time, and turning on a speech recognizer to start listening for a user's voice command for a predetermined period of time.")
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang and Agrawal to include the teachings of Higbie, the motivation being that using predefined events/metadata improves the latency issues inherent in multi-stream switching by delivering precise timing control (Higbie, [0247-0248]).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liang, Agrawal, and Higbie as applied to claims 8 and 18, and further in view of Xie (US 20030212559 A1).

With respect to claims 9 and 19, Liang and Agrawal do not teach, but Higbie teaches, generating a third dynamic spoken response utterance […] on the basis of the time limit information (¶[0013]: "In some embodiments, the one or more programs include instructions that when executed by the one or more processors, further cause the device to perform: detecting a second event identifying an end time for the speech interactive content; in response to detecting the second event identifying the end time for the speech interactive content, terminating the playback of the speech interactive content at the end time, and turning on a speech recognizer to start listening for a user's voice command for a predetermined period of time.")
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang, Agrawal to include the teachings of Higbie motivation being that using predefined events/metadata improves the latency issues inherent in multi-stream switching by delivering precise timing control (Higbie, [0247-0248])
Liang, Agrawal, Higbie do not teach wherein generating the dynamic spoken response utterance comprises generating a third dynamic spoken response utterance by changing an output rate of the spoken response utterance on the basis of the time limit information.
Xie teaches wherein generating the dynamic spoken response utterance comprises generating a third dynamic spoken response utterance by changing an output rate of the spoken response utterance [[on the basis of the time limit information]]  (¶ [0027]: "The commands may include, for example: a command to begin synthesizing speech corresponding to the text included in the file so that the text is reproduced audibly; a command to end the synthesis; a command to preset a start-up time and/or an end time for the speech synthesis; a command to select/change a voice(s) used in the speech synthesis; a command to select/change the speed of the synthesized speech; a command corresponding 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Liang, Agrawal, and Higbie to include the teachings of Xie, the motivation being to give the users of electronic devices an immersive experience that assimilates content without looking at displays (Xie, [0005-0006]).
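For illustration only, the limitation of claims 9 and 19 (changing an output rate on the basis of time limit information) can be sketched as a simple ratio. This is a minimal sketch and is not drawn from the application, Higbie, or Xie: the function name, the parameters, and the upper bound on the rate are hypothetical assumptions.

```python
def output_rate(remaining_speech_s, time_limit_s, max_rate=2.0):
    """Return a playback-rate multiplier for the remaining speech.

    Hypothetical helper. remaining_speech_s is the duration of the
    unspoken remainder at normal rate; time_limit_s is the time left
    before output must stop. max_rate is an assumed intelligibility cap.
    """
    if remaining_speech_s < 0 or time_limit_s <= 0:
        raise ValueError("durations must be non-negative and limit positive")
    needed = remaining_speech_s / time_limit_s
    # Never slow down below the normal rate, never exceed the cap.
    return min(max(1.0, needed), max_rate)
```

Under these assumptions, 12 seconds of remaining speech and an 8 second limit would give a rate of 1.5; if the required rate exceeds the cap, the remainder would not fit and some other handling (not sketched here) would be needed.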

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675.  The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 






/A.N.P./               Examiner, Art Unit 2657   

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657