DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 8-11, 13, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee et al. (US 2015/0358583).

Regarding claims 1 and 11, Gorti teaches a system and a computer-implemented method, the computer-implemented method comprising:
a)	receiving call data associated with a conference call (Gorti: incoming signal, [0032]);
b)	analyzing the call data to identify a plurality of participants on the conference call (Gorti: Fig. 1, identify speaker(s) by voice/speaker recognition, PIN, user name and/or phone number, [0032]);
c)	determining whether two or more participants of the plurality of participants are speaking at a same time (Gorti: two speakers to speak at the same time…, ranking priority is violated thus causing interruption, [0029]; … due to simultaneous transmission of two audio signals, … [0041]);

d)	based upon a determination that the two or more participants are speaking at the same time, tracking the two or more participants to identify a first participant that continues speaking and a second participant that voluntarily yields to the first participant (Gorti teaches that the current speaker continues to speak until he is done speaking (block 366 of Fig. 3A, [0052]), and that the unsuccessful interrupter is placed at the top of the queue and will be unblocked first ([0031], Fig. 3B); thus the interrupt handler 206 will block one signal based on priority ranking ([0041]). Gorti does not detail voluntary yielding as required by the amendment. Lee, while keeping track of each channel, teaches “Yields, when a party stops speaking after the other party interrupts and allows the other party to continue speaking after interrupting, indicate high dominance for fewer yields and low dominance for more yields. Yields are measured by speech after an interruption and whether the party continues to speak, e.g. classified speech, or yields to the other party, e.g. classified silence” ([0022]));

e)	displaying a queuing element on a graphical user interface (GUI) associated with the conference call to indicate that the second participant is in a queue to speak after the first participant has finished (Gorti: the interrupt handler 206 of FIG. 2 reviews the queue 212 for blocked participants (block 372). If the queue 212 includes an identifier of a participant, then the participant next in the queue 212 is unblocked and the interrupt handler 206 of FIG. 2 causes the indicator 408 of FIG. 4 to convey that information to the unblocked participant (block 374), [0031, 0053]).
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Lee into the teaching of Gorti to provide, on some occasions, an option for a party to allow another party to speak even when an interruption occurs.
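For illustration only, limitations (c) through (e) can be sketched in a few lines of Python. This sketch is not taken from Gorti, Lee, or the application; the `Segment` type, the function names, and the rule that the party whose speech ends first during an overlap is treated as the yielding party are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of speech by one participant (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the start of the call
    end: float


def overlaps(a, b):
    """True when two different participants speak at the same time."""
    return a.speaker != b.speaker and a.start < b.end and b.start < a.end


def classify_yield(a, b):
    """Return (continuing, yielding) speakers for an overlap, else None.

    Assumption: whoever stops first during the overlap is the yielder.
    """
    if not overlaps(a, b) or a.end == b.end:
        return None
    return (a.speaker, b.speaker) if a.end > b.end else (b.speaker, a.speaker)


def build_queue(segments):
    """Queue each yielding participant once, in order of detection."""
    queue = deque()
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            result = classify_yield(a, b)
            if result is not None and result[1] not in queue:
                queue.append(result[1])
    return queue


segments = [
    Segment("alice", 0.0, 10.0),
    Segment("bob", 4.0, 5.5),    # starts during alice's turn, then stops
    Segment("carol", 12.0, 15.0),
]
print(list(build_queue(segments)))
```

In this example bob begins speaking while alice is speaking and stops first, so he is placed in the queue; carol never overlaps with anyone and is not queued. A GUI queuing element would then be rendered from the contents of the queue.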
 
Claims 3 and 13. (Original) The computer-implemented method of claim 1, wherein analyzing the call data to identify a plurality of participants on the conference call (see claim 1) includes: identifying one or more devices connected to the conference call (see claim 1); extracting an audio portion of the call data (if any, of keywords were detected over a predetermined period of time, [0050]); processing the audio portion to associate utterances to the one or more devices; and mapping the one or more devices to the plurality of participants (comparing an utterance/spoken words/phrases of a current speaker to a keyword associated with the conference call to determine a relevancy of the utterance, Abstract and [0030]).

Application No.: 17/083,241; Attorney Docket No.: 00212-0148-00000

Claims 8 and 18. (Original) The computer-implemented method of claim 1, wherein determining whether the two or more participants are speaking at the same time (see claim 1) includes: determining whether different audio inputs from different devices are input at the same time (Gorti: simultaneous transmission of two audio signals, … [0041]); or determining that there are two or more voices being input at the same time (Gorti: [0026]); and based upon a determination that the different audio inputs are input from the different devices at the same time or that there are two or more voices being input at the same time, determining two or more participants are speaking at the same time (see claim 1).
Claims 9 and 19, wherein determining whether two or more participants are speaking at the same time (see claim 1) further includes, before determining the two or more participants are speaking at the same time: determining whether an audio input of the different audio inputs is a background noise (Gorti: the environment (e.g., an office, car, or crowded place), [0009]; ambient noise, [0032]); and based on a determination that the audio input of the different audio inputs is not background noise (by voice verification/recognition, [0032]), determining two or more participants are speaking at the same time (see claim 1).
Claim 10. (Original) The computer-implemented method of claim 8, wherein determining whether two or more participants are speaking at the same time further includes, before determining two or more participants are speaking at the same time (see claim 1): determining whether a voice of the two or more voices is a non-verbal utterance (Gorti: the environment (e.g., an office, car, or crowded place), [0009]; ambient noise, [0032]); and based on a determination that the voice is not a non-verbal utterance (by voice verification/recognition, [0032]), determining two or more participants are speaking at the same time (see claim 1).



Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee and further in view of Pang.
Regarding claims 4 and 14, Gorti details that the “queuing element includes an indicator that the first participant is currently speaking, and the computer-implemented method further includes: determining when the first participant has stopped speaking” (Gorti: block 366 of Fig. 3A, “done speaking”; Fig. 4 shows a queuing element with the current speaker and the next speaker in waiting (2nd of 5 waiting) until receiving the indication 408; the indication 408 displayed to the unblocked participant(s) may be an image, text (e.g., "You have the floor"), or a combination thereof, [0053]); and “displaying an indicator that it is the second participant's turn to speak” (Gorti: the indication 408 displayed to the unblocked participant(s) may be an image, text (e.g., "You have the floor"), or a combination thereof, [0053]).
Gorti teaches ‘blocking’ the audio signals of certain participants, and blocking here could also mean “muting”. Nevertheless, the Examiner wishes to provide additional prior art explicitly describing “muting audio input of devices of all participants other than the second participant”. Pang teaches “while all meeting participants can be systematically muted when not in possession of the talking stick” ([0030]).
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Pang into the teaching of Gorti, specifically to enhance the system with the capability to mute the non-speaking participants to avoid interruption until they are given the floor.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee and Pang and further in view of Sarris.
Claims 5 and 15. The computer-implemented method of claim 4, wherein the GUI, as displayed on a device associated with the second participant, includes an opt-out element that when selected by a user input removes the second participant from the queue. (Gorti does not, but Sarris teaches: leave queue button 2027, when pressed, will remove the participant (301) from the waiting queue (2023), [0167].)
Therefore it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Sarris into the teaching of Gorti to specifically provide a waiting participant an opportunity to leave the waiting queue if he or she deems waiting is no longer necessary. 

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee and further in view of Finkelstein (hereinafter Fink).


Regarding claims 2 and 12, Gorti teaches “analyzing the call data to identify a plurality of participants on the conference call includes: extracting an audio portion of the call data” (see claim 1). Gorti does not teach “processing the audio portion to form a feature vector; processing the feature vector through a neural network model to map utterances to one or more entities; and mapping the one or more entities to the plurality of participants”.
Fink teaches “Audio input 130 in the form of natural language speech may be captured by microphone 24 and processed by audio processor 134 to create audio data. Audio data from the audio processor 134 may be transformed by feature extractor 136 into data for processing by a speech recognition engine 140 of the speech recognition program 120. In some examples, feature extractor 136 may identify portions of the audio data over a time interval that contain speech for processing” ([0061-0063]) and mapping ([0151]).
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Fink into the teaching of Gorti for the purpose of utilizing various techniques including using mel-frequency cepstral coefficients (MFCCs), linear discriminant analysis, deep neural network techniques, etc. to process the audio data and generate feature vectors for comparing and matching pronunciations of speech components, such as phonemes, to particular words and/or phrases.

Claims 6, 7, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee and Pang and further in view of Finkelstein (hereinafter Fink).
Claims 6 and 16, wherein determining when the first participant has stopped speaking (Gorti: when the first speaker is done speaking, block 366 of Fig. 3A; Pang: Whoever wished to speak after the leader would take the talking stick, [0015]) includes:
extracting an audio portion of the call data (Gorti: [0050]). Please note that Fink also teaches this limitation: “feature extractor 136 may identify portions of the audio data over a time interval that contain speech for processing” ([0061]);
processing the audio portion to determine text by a speech-to-text function (Fink: Using the feature extractor 136 and speech recognition engine 140, the speech recognition program 120 may process feature vectors 142 and other speech recognition data 148 to generate recognized text 66, [0068]);
processing the text to form text feature vectors (Fink: The speech recognition engine 140 also may compare the feature vectors and other audio data with sequences of sounds to identify words and/or phrases that match the spoken sounds of the audio data, [0064, 0068]); and
processing the text feature vectors through text analysis (Fink: the parser 40 analyzes the text and confidence values to determine an intent of the user in speaking the received utterance, [0074]) or topic modeling neural network models to determine an end of the first participant speaking or an end of a topic the first participant was discussing (Fink: such conversation and topic/session tracking may enable the system to assist a team that is working and speaking collaboratively to complete a task, [0189]).
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Fink into the teaching of Gorti for the purpose of utilizing various techniques including using mel-frequency cepstral coefficients (MFCCs), linear discriminant analysis, deep neural network techniques, etc. to process the audio data and generate feature vectors for comparing and matching pronunciations of speech components, such as phonemes, to particular words and/or phrases.

Claims 7 and 17, wherein determining when the first participant has stopped speaking (Gorti: block 366 of Fig. 3A; Pang: Whoever wished to speak after the leader would take the talking stick, [0015]) includes:
extracting an audio portion of the call data (Gorti: [0050]; Fink: Audio data from the audio processor 134 may be transformed by feature extractor 136 into data for processing by a speech recognition engine 140 of the speech recognition program 120. In some examples, feature extractor 136 may identify portions of the audio data over a time interval that contain speech for processing, [0061]);
processing the audio portion to determine text by a speech-to-text function (Fink: Using the feature extractor 136 and speech recognition engine 140, the speech recognition program 120 may process feature vectors 142 and other speech recognition data 148 to generate recognized text 66, [0068]);
determining whether the text includes keywords (Fink: the voice listener 30 may comprise a keyword detection algorithm configured to identify a keyword or keyword phrase in the translated text. The voice listener 30 may assign a confidence value to text that indicates a likelihood that the text is a keyword or keyword phrase, [0344]); and based upon a determination that the text includes the keywords, determining the first participant has stopped speaking (see Gorti: block 366 of Fig. 3A; Pang: Whoever wished to speak after the leader would take the talking stick, [0015]).
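As a minimal sketch of the keyword limitation above, the following Python fragment checks recognized text for end-of-turn phrases. It is illustrative only and is not drawn from Gorti, Pang, Fink, or the application: the phrase list, the `normalize` helper, and the function name `has_stopped_speaking` are hypothetical, and the speech-to-text step is assumed to have already produced the transcript string.

```python
import re

# Hypothetical end-of-turn phrases; a real system would define its own list.
END_OF_TURN_PHRASES = ("that's all", "any questions", "over to you")


def normalize(text):
    """Lowercase the text and strip punctuation other than apostrophes."""
    cleaned = re.sub(r"[^a-z' ]+", " ", text.lower())
    return " ".join(cleaned.split())


def has_stopped_speaking(transcript, phrases=END_OF_TURN_PHRASES):
    """True when the transcript contains any end-of-turn phrase.

    Padding with spaces makes the match whole-word, so "that's all"
    does not match inside "that's allowed".
    """
    padded = f" {normalize(transcript)} "
    return any(f" {phrase} " in padded for phrase in phrases)


print(has_stopped_speaking("So, that's all from me."))
print(has_stopped_speaking("Let me continue with the budget."))
```

A plain phrase match such as this one is the simplest form of keyword detection; Fink's confidence-valued keyword detection ([0344]) would correspond to replacing the boolean result with a score.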

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Lee and Pang and further in view of Fink.
20. (Original) A non-transitory computer-readable medium may store instructions that, when executed by a processor, cause the processor to perform a method, the method comprising: receiving call data associated with a conference call; analyzing the call data to identify a plurality of participants on the conference call; determining whether two or more participants are speaking at a same time; based upon a determination that the two or more participants are speaking at the same time, tracking the two or more participants to identify a first participant that continues speaking and a second participant that voluntarily yields to the first participant; displaying a queuing element on a graphical user interface (GUI) associated with the conference call to indicate that the second participant is in a queue to speak after the first participant has finished; and determining when the first participant has stopped speaking by: extracting an audio portion of the call data; processing the audio portion to determine text by a speech-to-text function; processing the text to form text feature vectors; processing the text feature vectors through text analysis or topic modeling neural network models to determine an end of the first participant speaking or an end of a topic the first participant was discussing; muting audio input of devices of all participants other than the second participant; and displaying an indicator that it is the second participant's turn to speak.
Claim 20 is a combination of claims 1, 4 and 6 as previously understood and demonstrated by the applicant. For the sake of brevity, please see claims 1, 4 and 6.
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Lee, Pang and Fink into the teaching of Gorti for the purpose of providing an orderly queued speaking turn, so that everyone can have a fair chance to present his/her viewpoint by receiving the talking stick while everyone else has to stay muted/silent until he/she is called to speak, one at a time in an ordered fashion, and also for the purpose of utilizing various techniques including using mel-frequency cepstral coefficients (MFCCs), linear discriminant analysis, deep neural network techniques, etc. to process the audio data and generate feature vectors for comparing and matching pronunciations of speech components, such as phonemes, to particular words and/or phrases.
			


Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN whose telephone number is (571)270-1949. The examiner can normally be reached Reg. Sched. 6:00-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached on 570-272-7530. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHUNG-HOANG J NGUYEN/
Primary Examiner, Art Unit 2651