DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
          35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

          Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  

The claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claims are directed to the abstract idea of dialog management for multiple users, as explained in detail below. Other than reciting “a processor, memory and a device,” nothing in the claim elements precludes the steps from practically being performed by mental processing. For example, the claim language recites: receiving output audio data representing synthesized speech of a list of entries (can be done by a user speaking); beginning playback of audio corresponding to the output audio data (can be done by another user speaking); during the playback of the audio, detecting user speech (can be done by detecting someone speaking); determining input audio data representing the user speech (can be done by someone analyzing the data); determining a first time corresponding to the beginning of the playback (can be done by looking at the time during the speaking); determining a second time corresponding to the detecting the user speech (can be done by a user determining a second time); determining offset time data representing a difference between the first time and the second time (can be done by analyzing the time difference); performing speech processing on the input audio data to determine the user speech refers to an entry that is absent from the user speech (can be done by a user analyzing the context of the speech data); processing the offset time data to determine the entry is a first entry in the list of entries (can be done by a user analyzing the data); and causing an action to be performed based at least in part on the first entry (can be done by a user performing an action based on the speech).
Furthermore, causing playback of output audio (can be done by a user outputting); detecting input audio representing user speech (can be done by a user recognizing another user speaking); determining input audio data representing the user speech (can be done by someone analyzing the data); determining time data representing timing of the user speech relative to the playback of the output audio (can be done by a user analyzing the time); sending the input audio data (can be done by a user outputting audio data); and sending the input audio data and the time data (can be done by a user outputting audio data and time data).  Also, receiving input audio data representing an utterance (can be done by a user receiving speech from another user speaking); receiving time data representing timing of the utterance relative to audio being output by the user device (can be done by a user analyzing the time data); performing processing using the input audio data and the time data to determine a first selection referred to in the utterance (can be done by a user analyzing the context and timing data); and causing an action to be performed based at least in part on the selection (can be done by a user performing an action based on what was spoken).
The present claim language, under its broadest reasonable interpretation, covers performance of mental processing and recites generic computer components, and therefore falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.
Dependent claims 2-4, 6-11 and 13-20 recite similar language such as analyzing time data, processing speech data, performing text-to-speech processing and determining speech speed, all of which can be done by a human and are non-statutory.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-2, 4-12, 14-18 and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Denenberg et al. (USPN 6,724,864), hereinafter referenced as Denenberg.

Regarding claim 1, Denenberg discloses a computer-implemented method comprising: 
receiving output audio data representing synthesized speech of a list of entries (receive voice prompts; column 5, lines 32-55 with column 10, lines 5-20 and column 14, lines 10-33); 
beginning playback of audio corresponding to the output audio data (prompt stream being played; column 10, lines 53-64); 
during the playback of the audio, detecting user speech (user speech barge-in; column 10, lines 5-20); 
determining input audio data representing the user speech (user barges in; column 12, lines 5-19); 
determining a first time corresponding to the beginning of the playback (prompts being played; column 11, lines 14-65); 
determining a second time corresponding to the detecting the user speech (detecting barge-in; column 11, lines 14-65); 
determining offset time data representing a difference between the first time and the second time (determining timing of events; column 11, lines 14-65);
performing speech processing on the input audio data to determine the user speech refers to an entry that is absent from the user speech (barge-in may depend on which prompt was playing or which prompts the user heard when barge-in occurred; column 11, lines 14-65); 
processing the offset time data to determine the entry is a first entry in the list of entries (determine prompts; column 11, lines 14-65); and 
causing an action to be performed based at least in part on the first entry (perform a particular action; column 7, line 41 – column 8, line 24 with column 14, lines 10-33).
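For illustration only, the timing limitations recited in claim 1 (a first time, a second time, offset time data, and resolving the offset to an entry in the list) can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's or Denenberg's implementation: the function names, the per-entry durations, and the rule that a boundary instant belongs to the next entry are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Optional, Sequence


def offset_seconds(playback_start: float, speech_detected: float) -> float:
    """Offset time data: the second time minus the first time, in seconds."""
    return speech_detected - playback_start


def entry_at_offset(entry_durations: Sequence[float], offset: float) -> Optional[int]:
    """Return the index of the entry whose audio was playing at `offset`.

    `entry_durations` (hypothetical) holds the playback length of each
    list entry in order. Returns None if the offset falls before playback
    began or after the last entry finished.
    """
    if offset < 0:
        return None
    # Cumulative end time of each entry, measured from the start of playback.
    ends = list(accumulate(entry_durations))
    # First entry whose end time is strictly greater than the offset.
    idx = bisect_right(ends, offset)
    return idx if idx < len(ends) else None


# Hypothetical example: three entries lasting 2.0 s, 3.0 s and 1.5 s.
durations = [2.0, 3.0, 1.5]
offset = offset_seconds(playback_start=10.0, speech_detected=12.5)
selected = entry_at_offset(durations, offset)
```

In this hypothetical example the user speech is detected 2.5 seconds into playback, which falls within the second entry (index 1).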
Regarding claim 2, Denenberg discloses a method further comprising: 
determining stored data corresponding to the output audio data (stored data; column 13, lines 5-25); 
determining a start point of the output audio data (timing data; column 7, lines 41-65); 
determining, using the offset time data and the start point, a first portion of the output audio data (perform calculations of timing; column 7, line 41 – column 8, line 24); 
determining the first portion of the output audio data corresponds to a portion of the list of entries representing the first entry (voice prompts; column 7, line 41 – column 8, line 24); and 
sending an indication of the first entry to a speech processing component (sending voice prompts; column 7, line 41 – column 8, line 24).
Regarding claim 4, Denenberg discloses a method further comprising: 
using the offset time data to determine the user speech began after output began of audio representing the first entry in the list of entries but prior to beginning output of audio representing a second entry in the list of entries (analyzing the timing data; column 7, line 41 – column 8, line 24 with column 13, lines 5-35); and 
causing dialog data to be stored representing the first entry but not the second entry (voice prompt currently being played, but stops before playing second prompt; column 7, line 41 – column 8, line 24 with column 13, lines 5-35).
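As a hedged illustration of the claim 4 limitation, the sketch below keeps only those entries whose audio had begun playing when the user speech was detected, so that the first entry is retained and the second is not. The entry labels, the start offsets, and the function name are hypothetical and are not drawn from the application or from Denenberg.

```python
from typing import List, Sequence


def entries_heard(
    entries: Sequence[str], start_offsets: Sequence[float], offset: float
) -> List[str]:
    """Return the entries whose audio had begun playing by `offset` seconds.

    `start_offsets` (hypothetical) gives, for each entry, the time at which
    its audio begins, measured from the start of playback.
    """
    return [entry for entry, start in zip(entries, start_offsets) if start <= offset]


# Hypothetical list of entries and the times at which each begins playing.
entries = ["first entry", "second entry", "third entry"]
starts = [0.0, 2.0, 5.0]

# User speech detected 1.2 s into playback: after the first entry began,
# but before the second entry began.
dialog_data = entries_heard(entries, starts, offset=1.2)
```

Here `dialog_data` contains only the first entry, because the second entry had not yet begun when the speech was detected.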
Regarding claim 5, Denenberg discloses a method comprising: 
causing playback of output audio (prompt stream being played; column 10, lines 53-64 with column 14, lines 10-33); 
detecting input audio representing user speech (user issues commands; column 14, lines 10-33); 
determining input audio data representing the user speech (user issues commands; column 14, lines 10-33); 
determining time data representing timing of the user speech relative to the playback of the output audio (timing data; column 7, lines 41-65 with column 13, lines 5-25); 
sending, to at least one remote device, the input audio data (barge-in data; column 7, line 41 – column 8, line 24, column 13, lines 5-25); and 
sending, to the at least one remote device, the input audio data and the time data (barge-in data with timing; column 7, line 41 – column 8, line 24, column 13, lines 5-25).
Regarding claim 6, Denenberg discloses a method further comprising: 
determining a first timestamp corresponding to a beginning of the playback (timing data; column 7, line 41 – column 8, line 24); 
determining a second timestamp corresponding to the detecting the input audio (timing data; column 7, line 41 – column 8, line 24); and 
determining an offset representing a difference between the first timestamp and the second timestamp, wherein the time data represents the offset (perform time calculation; column 7, line 41 – column 8, line 24).
Regarding claim 7, Denenberg discloses a method further comprising: processing the input audio data to determine that the user speech was system directed (users speech command; column 14, lines 10-33).
Regarding claim 8, Denenberg discloses a method wherein the output audio corresponds to synthesized speech representing a list of entries (prompts are output; column 14, lines 10-33).
Regarding claim 9, Denenberg discloses a method further comprising, prior to causing the playback: 
receiving second input audio corresponding to previous user speech (user issuing a command; column 14, lines 10-33); 
determining second input audio data representing the previous user speech (back up to previous sentence; column 14, lines 10-33); 
sending, to the at least one remote device, the second input audio data (sending the data accordingly; column 14, lines 10-33); and 
receiving, from the at least one remote device, output audio data representing the list of entries in response to the second input audio data (voice prompts; column 14, lines 10-33).
Regarding claim 10, Denenberg discloses a method wherein the output audio corresponds to synthesized speech representing a list of entries and wherein the method further comprises, after sending the input audio data to the at least one remote device:
receiving, from the at least one remote device, output audio data representing further information regarding a first entry in the list of entries (voice prompts; column 14, lines 10-33); and 
causing playback of second output audio corresponding to the output audio data (playing data accordingly; column 14, lines 10-33).
Regarding claim 11, Denenberg discloses a method further comprising: 
sending, to the at least one remote device, data indicating the input audio data is associated with the time data (timing data of the prompt; column 7, line 41 – column 8, line 24 with column 14, lines 10-33).
Regarding claim 12, Denenberg discloses a computing system comprising: 
at least one processor (column 18, line 45); and 
at least one memory comprising instructions that, when executed by the at least one processor (column 18, line 47), cause the computing system to: 
receive, from a user device, input audio data representing an utterance (user issues command; column 14, lines 10-33); 
receive, from the user device, time data representing timing of the utterance relative to audio being output by the user device (amount of time that the current voice prompt has been playing; column 7, line 41 – column 8, line 24); 
perform processing using the input audio data and the time data to determine a first selection referred to in the utterance (process the timing of the data to determine when barge-in occurred to select particular prompt; column 7, line 41 – column 8, line 24 with column 14, lines 10-33); and 
cause an action to be performed based at least in part on the selection (perform a particular action; column 7, line 41 – column 8, line 24 with column 14, lines 10-33).
Regarding claim 14, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
determine the user device is associated with stored data corresponding to the audio and corresponding to a list of entries (stores voice prompts; column 13, lines 5-25); and 
use the stored data and the time data to determine a first entry from the list of entries, the first entry corresponding to the first selection (stores voice prompts and timing data; column 13, lines 5-25 with column 7, lines 41-65).
Regarding claim 15, Denenberg discloses a system wherein the time data corresponds to offset time data and the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
use the stored data to determine a start time corresponding to the audio (timing data; column 7, lines 41-65); 
use the start time and the offset time data to determine a first portion of the list of entries (use the data to analyze the barge in to make a determination; column 7, line 41 – column 8, line 24); 
determine the first portion of the list of entries corresponds to the first selection (voice prompts; column 7, line 41 – column 8, line 24); and 
perform speech processing using the input audio data based at least in part on the first selection (process voice command; column 14, lines 10-33).
Regarding claim 16, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
use the time data to determine the utterance began after output began of audio representing a first entry in a list of entries but prior to beginning output of audio representing a second entry in the list of entries (analyzing time data; column 7, line 41 – column 8, line 24); and 
causing dialog data to be stored representing the first entry but not the second entry (discard remainder of prompts; column 7, line 41 – column 8, line 24).
Regarding claim 17, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
determine the action to be performed based at least in part on the dialog data (perform action; column 7, line 41 – column 8, line 24).
Regarding claim 18, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to, prior to receipt of the input audio data: 
receive, from the user device, second input audio data representing a second utterance (user commands; column 14, lines 10-33); 
perform speech processing on the second input audio data to determine output data representing a list of entries (voice prompts; column 14, lines 10-33); 
perform text-to-speech (TTS) processing using the output data to determine output audio data (text-to-speech; column 1, lines 26-34); and 
send the output audio data to the user device to cause the user device to output the audio using the output audio data (output data accordingly; column 14, lines 10-33).
Regarding claim 20, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
receive, from the user device, indicator data associating the input audio data with the time data (timing data; column 7, lines 41-65).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Denenberg in view of Costa (PGPUB 2018/0365898).

Regarding claim 3, Denenberg discloses a method as described above, but does not specifically teach a method further comprising, prior to performing the speech processing: 
receiving input image data corresponding to the input audio data; and 
processing the input audio data and the input image data using a first component to determine that the user speech was system directed.
Costa discloses a method further comprising, prior to performing the speech processing: 
receiving input image data corresponding to the input audio data (user gazing at an object while speaking a command; p. 0025, 0086); and 
processing the input audio data and the input image data using a first component to determine that the user speech was system directed (determining based on the gaze and the context that the user intended to select a specific device; p. 0025, 0086), to determine intent.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system as described above, to determine that the user intends to interact with a particular object.

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Denenberg in view of Orr et al. (PGPUB 2016/0378747), hereinafter referenced as Orr.

Regarding claim 13, Denenberg discloses a system as described above, however, does not specifically teach wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
perform speech processing using the input audio data to determine the utterance refers to a selection that is not named in the utterance.
Orr discloses a system comprising:
performing speech processing using the input audio data to determine the utterance refers to a selection that is not named in the utterance (user utters the command “play that song from Top Gun”; the user can then say “No, I meant the other one”, in which the other song is not actually named; p. 0265-0268), to provide alternate ways to say data.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system as described above, to provide a system that analyzes the context and provides a variety of ways to say data.

Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Denenberg in view of Shevchenko et al. (USPN 10,594,757), hereinafter referenced as Shevchenko.
Regarding claim 19, Denenberg discloses a system wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the computing system to: 
further use the second time data to determine the first selection (use the amount of time to indicate when the barge-in transpired to determine selection; column 7, line 41 – column 8, line 24 with column 13, lines 5-25 and column 14, lines 10-33), but does not specifically teach a system comprising determining user profile data corresponding to the utterance; and determining, using the user profile data, second time data corresponding to a speech speed.
Shevchenko discloses a method comprising determining user profile data corresponding to the utterance (accessing the profile; column 79, lines 23-63); and 
determining, using the user profile data, second time data corresponding to a speech speed (using the profile to teach desired speech speed), to accommodate the user.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system as described above, to increase the effectiveness of communications.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  This information has been detailed in the PTO 892 attached (Notice of References Cited).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON whose telephone number is (571)272-7619. The examiner can normally be reached Mon - Fri 6:30a-2:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571.272.5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAKIEDA R JACKSON/Primary Examiner, Art Unit 2657