Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to correspondence filed 03/09/21 regarding application 15/548,265, in which claims 1, 11, 17, 23, and 27 were amended. In order to expedite allowance, the examiner has further amended claims 1, 23, and 27. Claims 1-3, 6-9, 11-14, 17, 20, 22, 23, 27, and 29-31 are pending and have been considered.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Roger S. Sampson, Reg. No. 44,314 on 03/17/21.

The application has been amended as follows:
In the claims:

1. (Currently Amended) A method of processing audio data, the method comprising:
receiving, by a control system that includes one or more processors, audio data corresponding to a recording of a conference, the audio data including data corresponding to conference participant speech of each of a plurality of conference participants, wherein the audio data are received after the conference has been completed;

analyzing the selected playback audio data to determine conversational dynamics data that includes one or more of: data indicating the frequency and duration of conference participant speech; data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously; or data indicating instances of conference participant conversations;
applying the conversational dynamics data as one or more variables of a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space, wherein the spatial optimization cost function includes a perceptual cost term that tends to place conversational participants who speak frequently in front of a listener;
applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution;
providing, by the control system, the selected playback audio data to a speaker system; and
controlling, by the control system, post-conference playback of the selected playback audio data on the speaker system according to the locally optimal solution.
2. (Original) The method of claim 1, further comprising receiving an indication of a target playback time duration, wherein the selecting involves making a time duration of the playback 
3. (Previously Presented) The method of claim 1, wherein a time duration of the playback audio data is determined, at least in part, by multiplying a time duration of at least one selected portion of the conference participant speech by an acceleration coefficient.
4. (Canceled)
5. (Canceled)
6. (Previously Presented) The method of claim 1, wherein the selecting involves a topic section process and wherein the topic section process involves:
receiving a topic list of conference topics; and
determining a list of selected conference topics, the list of selected conference topics comprising a subset of the conference topics.
7. (Original) The method of claim 6, further comprising receiving topic ranking data indicating estimated relevance of each conference topic on the topic list, wherein determining the list of selected conference topics is based, at least in part, on the topic ranking data.
8. (Previously Presented) The method of claim 1, wherein the selecting involves a talkspurt filtering process and wherein the talkspurt filtering process involves removing an initial portion of an input talkspurt, the initial portion comprising a time interval from an input talkspurt start time to an output talkspurt start time.
9. (Original) The method of claim 8, further comprising calculating an output talkspurt time duration based, at least in part, on an input talkspurt time duration.
10. (Canceled)
11. (Previously Presented) The method of claim 1, wherein the selecting involves the acoustic feature selection process and wherein the acoustic feature selection process involves determining one or more of pitch variance, speech rate or loudness.

13. (Previously Presented) The method of claim 1, further comprising modifying a start time or an end time of at least one instance of conference participant speech.
14. (Previously Presented) The method of claim 13, wherein the modifying involves at least one of: expanding a time interval corresponding to an instance of conference participant speech; or merging two or more instances of conference participant speech, corresponding with a single conference endpoint, that overlap in time.
15. (Canceled)
16. (Canceled)
17. (Previously Presented) The method of claim 1, further comprising scheduling instances of conference participant speech for playback based, at least in part, on a set of perceptually-motivated rules, wherein the set of perceptually-motivated rules includes one or more of: a rule indicating that two talkspurts of a single conference participant should not overlap in time; a rule indicating that two talkspurts should not overlap in time if the two talkspurts correspond to a single endpoint; a rule wherein, given two consecutive input talkspurts A and B, A having occurred before B, the playback of an instance of conference participant speech corresponding to B may begin before the playback of an instance of conference participant speech corresponding to A is complete, but not before the playback of the instance of conference participant speech corresponding to A has started; or a rule allowing the playback of an instance of conference participant speech corresponding to B to begin no sooner than a time T before the playback of an instance of conference participant speech corresponding to A is complete, wherein T is greater than zero.
18. (Canceled)
19. (Canceled)
20. (Previously Presented) The method of claim 1, further comprising: 
providing instructions for controlling a display to provide a graphical user interface;

processing the audio data based, at least in part, on the input.
21. (Canceled)
22. (Previously Presented) The method of claim 20, wherein the input corresponds to an indication of a target playback time duration.

23. (Currently Amended) An apparatus, comprising:
an interface system; and
a control system including one or more processors, the control system being capable of:
receiving, via the interface system, audio data corresponding to a recording of a conference, the audio data including data corresponding to conference participant speech of each of a plurality of conference participants, wherein the audio data are received after the conference has been completed;
selecting only a portion of the conference participant speech as selected playback audio data for post-conference playback, wherein the selecting involves one or more of: (a) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more conference topics; (b) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more topics of a conference segment; (c) determining the selected playback audio data by removing input talkspurts having an input talkspurt time duration that is below a threshold input talkspurt time duration; (d) a talkspurt filtering process of determining the selected playback audio data by removing a portion of input talkspurts having an input talkspurt time duration that is at or above the threshold input talkspurt time duration; or (e) an acoustic feature selection process of determining the selected playback audio data by selecting conference participant speech for playback according to at least one acoustic feature;
analyzing the selected playback audio data to determine conversational dynamics data that includes data indicating the frequency and duration of conference participant speech;
perceptual cost term indicating that conversational participants who speak frequently should be rendered at virtual conference participant positions that are relatively closer to a listener than conversational participants who speak less frequently;
applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution;
providing, by the control system, the selected playback audio data to a speaker system; and
controlling post-conference playback of the processed selected playback audio data on the speaker system.
24.-26. (Canceled)
27. (Currently Amended) A non-transitory medium having software stored thereon, the software including instructions for controlling one or more devices for:
receiving audio data corresponding to a recording of a conference, the audio data including data corresponding to conference participant speech of each of a plurality of conference participants, 
selecting only a portion of the conference participant speech as selected playback audio data for post-conference playback, wherein the selecting involves one or more of: (a) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more conference topics; (b) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more topics of a conference segment; (c) determining the selected playback audio data by removing input talkspurts having an input talkspurt time duration that is below a threshold input 
analyzing the selected playback audio data to determine conversational dynamics data that includes one or more of: data indicating the frequency and duration of conference participant speech; data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously; or data indicating instances of conference participant conversations;
applying the conversational dynamics data as one or more variables of a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space, wherein the spatial optimization cost function includes a perceptual cost term that tends to place conversational participants who speak frequently in front of a listener;
applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution;



providing, via the interface system, the selected playback audio data to a speaker system; and
controlling post-conference playback of the selected playback audio data on the speaker system according to the locally optimal solution.
28. (Canceled) 

30. (Previously Presented) The method of claim 1, further comprising scheduling first and second instances of the selected playback audio data, which were recorded during a single time interval of the conference and which did not overlap in time during the conference, to be played back overlapped in time, or scheduling a third instance of the selected playback audio data, which was previously overlapped in time during the conference with a fourth instance of conference participant speech, to be played back further overlapped in time during the post-conference playback.
31. (Previously Presented) The apparatus of claim 23, wherein the control system is further capable of scheduling first and second instances of the selected playback audio data, which were recorded during a single time interval of the conference and which did not overlap in time during the conference, to be played back overlapped in time, or scheduling a third instance of the selected playback audio data, which was previously overlapped in time during the conference with a fourth instance of conference participant speech, to be played back further overlapped in time during the post-conference playback.



Response to Arguments
Amended independent claims 1, 23, and 27 overcome the 35 U.S.C. 103 rejections of claims 1-3, 6-9, 11-14, 17, 20, 22, 23, 27, and 29-31, and so those rejections are withdrawn. 

Allowable Subject Matter
Claims 1-3, 6-9, 11-14, 17, 20, 22, 23, 27, and 29-31 are allowed.
The following is an examiner’s statement of reasons for allowance: 

The closest prior art to independent claims 1, 23, and 27 is Basu et al. (2008/0300872). Basu discloses receiving audio data corresponding to a recording of a conference, the audio data including data corresponding to conference participant speech of each of a plurality of conference participants (receiving and recording audio from a meeting dialog between a plurality of participants, Paragraphs 0030, 0034, and 0071); selecting only a portion of the conference participant speech as selected playback audio data (variable digest of information is selected for playback, Paragraphs 0045-0047, 0052, and 0059), wherein the selecting involves at least one selection process selected from a group of selection processes consisting of: (a) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more conference topics (relevance estimated for particular playback sections regarding the relevance of a portion to a particular topic discussed in a meeting conference, Paragraphs 0030, 0059, 0066-0068, and 0070-0073); (b) a topic selection process of selecting conference participant speech as selected playback audio data according to estimated relevance of the conference participant speech to one or more topics of a conference segment (relevance estimated for particular playback sections regarding the relevance of a portion to a particular topic, Paragraphs 0059, 0066-0068, and 0070-0073; keywords are additionally grouped according to speakers which may overlap in a segment, Paragraphs 0054, 0057-0058, 0067, and 0072); (c) determining the selected playback audio by removing input talkspurts having an input talkspurt time duration that is below a threshold input talkspurt time duration (removing content segments considered to be too short, Paragraph 0071, which indicates the necessary presence of a threshold to deem what is considered to be "too short"); 
(d) a talkspurt filtering process of determining the selected playback audio by removing a portion of input talkspurts having an input talkspurt time duration that is at or above the threshold input talkspurt time duration; and (e) an acoustic feature selection process of determining the selected playback audio by selecting conference participant speech for playback according to at least one acoustic feature (relevance consideration using acoustic features, Paragraphs 0057 and 0067); and analyzing the audio data to determine conversation dynamics data that includes at least data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously (turn recognition includes determining an overlap of two or more speakers, e.g. two or more speakers speaking concurrently, [0057]); providing the selected playback audio data to a speaker system for playback and controlling playback of the selected playback audio data on the speaker system (playback of a scalable audio digest of a meeting, Paragraphs 0030, 0045-0046, 0059, 0068, and 0073; speaker peripheral, Paragraph 0085). However, Basu does not disclose the limitations of amended independent claims 1, 23, and 27.

Boustead et al. (WO 2013/142731) discloses tending to place conversational participants who speak frequently in front of a listener (a dominant talker may be detected e.g. based on talker activity, page 33 lines 31-32, rotating the conference scene so that the dominant talker is rendered close to the midline 215 in front of the head of the listener 211, page 36 lines 7-20).

Khan et al. (2014/0111603) discloses detecting a dominant speaker by frequency of input audio, see [0057].

A combination or modification of Basu, Boustead, Khan, and the other prior art of record would not have resulted in the limitations of claims 1, 23, and 27, and therefore claims 1, 23, and 27 would not 

Dependent claims 2-3, 6-9, 11-14, 17, 20, 22, and 29-31 are allowable because they further limit allowable parent claims 1, 23, and 27. 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias, whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 9:00 AM - 4:30 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Dan Washburn, can be reached on 571/272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571/270-6135. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Jesse S Pullias/
Primary Examiner, Art Unit 2657    03/18/21