DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/13/2022, 02/10/2022, and 09/07/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Amendment
The amendments filed on September 07, 2022 have been entered.
Claims 1-4 and 8-23 have been amended.

Response to Arguments
Applicant’s arguments filed on September 07, 2022 have been considered but are not persuasive. 

Applicant’s argument:
Sinkov, whether considered singly or in combination with the other cited references, fails to describe, teach, or suggest each limitation recited by independent claims 1, 11, and 17. For example, Sinkov, whether considered singly or in combination with the other cited references, fails to describe, teach, or suggest "receiving, by a digital content management system, audio content corresponding to speech from a plurality of participants of a meeting from a first client device," "receiving, by the digital content management system, a time-based record of volume from a second client device, the time-based record of volume having data points indicating a level of volume for each point in time in the meeting," and "associating, by the digital content management system, the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device," as recited by currently amended independent claim 1 and as similarly recited by currently amended independent claims 11 and 17.
Neither reference teaches nor suggests the above-mentioned limitations. For instance, neither reference teaches nor suggests "receiving ... audio content ... from a first client device," "receiving ... a time-based record of volume from a second client device," and "synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device." Rather, Sinkov discusses using each present device to record an audio stream, including volume level, for each present speaker. Diamant fails to remedy the deficiencies of Sinkov.
Further, neither reference teaches nor suggests "associating, by the digital content management system, the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device." Rather, Sinkov discloses using relative volume levels captured at each device to associate speech with meeting participants. In other words, Sinkov suggests comparing the volumes captured at each device in order to identify which meeting participant was speaking at a given point in time. Again, Diamant fails to remedy the deficiencies of Sinkov.

Examiners’ response to argument:
The examiners respectfully disagree. Per Claim 1, Sinkov discloses receiving, by a digital content management system, audio content corresponding to speech from a plurality of participants of a meeting from a first client device (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. After a double talk episode has ended, the system may attempt clearing each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk. 
i.e., the speech from a plurality of the participants is received from a first client device));
receiving, by the digital content management system, a time-based record of volume from a second client device, the time-based record of volume having data points indicating a level of volume for each point in time in the meeting (Parag. [0009]; (The art teaches that recording audio information from a meeting includes determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input audio devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a second channel audio input at a second one of the personal audio input audio devices corresponding to the second particular speaker. Fig. 2 shows that the signals are represented by an amplitude as a function of time (e.g., J2(t)+αH1(t−β)+A1 (John's channel); Parag. [0019]));
associating, by the digital content management system, the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device (Parag. [0037]; (The art teaches in FIG. 2 a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)).

Same applies to independent claims 11 and 17.

Claims 2-4, 8-10, 12-16, 18-21, and 23 are rejected over Sinkov in view of Diamant based on the amendments.

Claim 22 is rejected over Sinkov in view of Diamant and Gleim based on the amendment.






Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 8-21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Sinkov et al. (Pub. No. US 2019/0200121), hereinafter Sinkov; in view of Diamant (Pub. No. US 2019/0341050), hereinafter Diamant. 

Claim 1. 	Sinkov discloses a computer-implemented method comprising: 
receiving, by a digital content management system, audio content corresponding to speech from a plurality of participants of a meeting from a first client device (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. After a double talk episode has ended, the system may attempt clearing each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk. 
i.e., the speech from a plurality of the participants is received from a first client device));  
receiving, by the digital content management system, a time-based record of volume from a second client device, the time-based record of volume having data points indicating a level of volume for each point in time in the meeting (Parag. [0009]; (The art teaches that recording audio information from a meeting includes determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input audio devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a second channel audio input at a second one of the personal audio input audio devices corresponding to the second particular speaker. Fig. 2 shows that the signals are represented by an amplitude as a function of time (e.g., J2(t)+αH1(t−β)+A1 (John's channel); Parag. [0019]));
associating, by the digital content management system, the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device (Parag. [0037]; (The art teaches in FIG. 2  a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)).
Sinkov doesn’t explicitly disclose generating, by the digital content management system, a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content.
However, Diamant discloses generating, by the digital content management system, a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking (i.e., identifying the user of the client device); the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking. Diarization machine 132 uses this WHO/WHEN/WHERE information to label corresponding segments 604 of the audio signal(s) 606 under analysis with labels 608. The segments 604 and/or corresponding labels may be output from the diarization machine 132 in any suitable format. The output effectively associates speech with a particular speaker (i.e., identifying the user of the client device) during a conversation among N speakers, and allows the audio signal corresponding to each speech utterance (with WHO/WHEN/WHERE labeling/metadata) to be used for myriad downstream operations. One nonlimiting downstream operation is conversation transcription. FIG. 1B teaches a computerized conference assistant 106 may include a speech recognition machine 130. As shown in FIG. 8, the speech recognition machine 130 may be configured to translate an audio signal of recorded speech (e.g., signals 112, beamformed signal 150, signal 606, and/or segments 604) into text 800. 
The art also teaches that in some examples, transcribed speech and/or speaker identity information may be gathered by computerized intelligent assistant 1300 in real time, in order to build the transcript in real time, and/or in order to provide notifications to conference participants about the transcribed speech in real time. In some examples, computerized intelligent assistant 1300 may be configured, for a stream of speech audio captured by a microphone, to identify a current speaker and to analyze the speech audio in order to transcribe speech text, substantially in parallel and/or in real time, so that speaker identity and transcribed speech text may be independently available. Accordingly, computerized intelligent assistant 1300 may be able to provide notifications to the conference participants in real time (e.g., for display at companion devices) indicating that another conference participant is currently speaking and including transcribed speech of the other conference participant, even before the other conference participant has finished speaking. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform))). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 2. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1, 
Sinkov further discloses the computer-implemented method further comprising analyzing the time-based record of volume received from the second client device to determine a primary speaking volume, wherein associating the plurality of participants of the meeting with the corresponding segments of the audio content comprises determining that a first participant is associated with the primary speaking volume (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input (i.e., first set of audio data) at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers)).
Claim 3. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 2, 
Sinkov further discloses wherein determining the primary speaking volume comprises determining the primary speaking volume based on comparing the volumes of the time-based record of volume received from the second client device (Parag. [0009] and Parag. [0018-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. The availability of two symmetric cross-recordings may facilitate assessing the coefficients (after an initial cancellation of ambient noises) and filtering out the weaker components using, for example, echo cancellation technique. 
Even if the double-talk suppression process has not fully succeeded, each channel unambiguously represents a corresponding speaker and any mix of speaker voices may be instantly identified in a full record by referring to the simultaneous recording by other principal phone(s), i.e. by switching channels of simultaneous speakers)).

Claim 4. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1,   
Sinkov further discloses the computer-implemented method further comprising receiving, from the first client device or the second client device, voiceprint data or inflection data corresponding to the speech from the plurality of participants of the meeting, wherein associating the plurality of participants of the meeting with the corresponding segments of the audio content comprises associating the plurality of participants of the meeting with the corresponding segments of the audio content based on the voiceprint data or the inflection data (Parag. [0037]; (The art teaches in FIG. 2  a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)). 
Claim 8. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1,  
Sinkov doesn’t explicitly disclose the computer-implemented method further comprising: generating an identification tag corresponding a first participant from the plurality of participants; and modifying a meeting transcript comprising a text representation of the audio content by associating the identification tag with a corresponding segment of the audio content.
However, Diamant discloses generating an identification tag corresponding a first participant from the plurality of participants; and modifying a meeting transcript comprising a text representation of the audio content by associating the identification tag with a corresponding segment of the audio content (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking; the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking. Diarization machine 132 uses this WHO/WHEN/WHERE information to label corresponding segments 604 of the audio signal(s) 606 under analysis with labels 608. The segments 604 and/or corresponding labels may be output from the diarization machine 132 in any suitable format. The output effectively associates speech with a particular speaker during a conversation among N speakers, and allows the audio signal corresponding to each speech utterance (with WHO/WHEN/WHERE labeling/metadata) to be used for myriad downstream operations. One nonlimiting downstream operation is conversation transcription. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. 
Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform) (i.e., meeting item). Also, in an example, the art teaches that the machine learning classifier may be configured to receive any other suitable transcript data automatically recorded at 211, e.g., transcribed speech audio in the form of text. The transcription machine may be configured to analyze the transcript to detect words having a predefined sentiment (e.g., positive, negative, “happy”, or any other suitable sentiment), in order to present a sentiment analysis summary at a companion device of a conference participant (i.e., first user), indicating a frequency of utterance of words having the predefined sentiment)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 9. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1,  
Sinkov doesn’t explicitly disclose wherein generating the digital meeting item for the one or more participants comprises: generating, for a participant of the meeting, an action item prompt to complete an action item; and providing the action item prompt for display on a client device of the participant.
However, Diamant discloses wherein generating the digital meeting item for the one or more participants comprises: generating, for a participant of the meeting, an action item prompt to complete an action item; and providing the action item prompt for display on a client device of the participant (Parag. [0109-0111]; (The art teaches that reviewable transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform). The art teaches that a reviewable transcript may be provided to other individuals instead of or in addition to providing the reviewable transcript to conference participants (i.e., including the first client device). In an example, a reviewable transcript may be provided to a supervisor, colleague, or employee of a conference participant. In an example, the conference leader or any other suitable member of an organization associated with the conference may restrict sharing of the reviewable transcript (e.g., so that the conference leader's permission is needed for sharing, or so that the reviewable transcript can only be shared within the organization, in accordance with security and/or privacy policies of the organization). 
The reviewable transcript may be shared in an unabridged and/or edited form, e.g., the conference leader may initially review the reviewable transcript in order to redact sensitive information, before sharing the redacted transcript with any suitable individuals. The reviewable transcript may be filtered to focus on content of interest (e.g., name mentions and action items) for any individual receiving the reviewable transcript. i.e., the action item prompt is generated and displayed)). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 10. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1,  
Sinkov doesn’t explicitly disclose the computer-implemented method further comprising generating a transcript of the audio content using associations between the plurality of participants and the corresponding segments of the audio content.
However, Diamant discloses generating a transcript of the audio content using associations between the plurality of participants and the corresponding segments of the audio content (Parag. [0060]; (The art teaches that labeled and/or partially labeled audio segments may be used to not only determine which of a plurality of N speakers is responsible for an utterance, but also translate the utterance into a textual representation for downstream operations, such as transcription)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 11. 	Sinkov discloses a non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor (Parag. [0009-0010]), cause a computing device to: 
receive audio content corresponding to speech from a plurality of participants of a meeting from a first client device (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. After a double talk episode has ended, the system may attempt clearing each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk. i.e., the speech from a plurality of the participants is received from a first client device)); 
receive a time-based record of volume from a second client device, the time-based record of volume having data points indicating a level of volume for each point in time in the meeting (Parag. [0009]; (The art teaches that recording audio information from a meeting includes determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input audio devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, and recording on a second channel audio input at a second one of the personal audio input audio devices corresponding to the second particular speaker. Fig. 2 shows that the signals are represented by an amplitude as a function of time (e.g., J2(t)+αH1(t−β)+A1 (John's channel); Parag. [0019])));
associate the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device (Parag. [0037]; (The art teaches in FIG. 2  a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)).
Sinkov doesn’t explicitly disclose generate a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content.
However, Diamant discloses generate a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking (i.e., identifying the user of the client device); the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking. Diarization machine 132 uses this WHO/WHEN/WHERE information to label corresponding segments 604 of the audio signal(s) 606 under analysis with labels 608. The segments 604 and/or corresponding labels may be output from the diarization machine 132 in any suitable format. The output effectively associates speech with a particular speaker (i.e., identifying the user of the client device) during a conversation among N speakers, and allows the audio signal corresponding to each speech utterance (with WHO/WHEN/WHERE labeling/metadata) to be used for myriad downstream operations. One nonlimiting downstream operation is conversation transcription. FIG. 1B teaches a computerized conference assistant 106 may include a speech recognition machine 130. As shown in FIG. 8, the speech recognition machine 130 may be configured to translate an audio signal of recorded speech (e.g., signals 112, beamformed signal 150, signal 606, and/or segments 604) into text 800. The art also teaches that in some examples, transcribed speech and/or speaker identity information may be gathered by computerized intelligent assistant 1300 in real time, in order to build the transcript in real time, and/or in order to provide notifications to conference participants about the transcribed speech in real time. 
In some examples, computerized intelligent assistant 1300 may be configured, for a stream of speech audio captured by a microphone, to identify a current speaker and to analyze the speech audio in order to transcribe speech text, substantially in parallel and/or in real time, so that speaker identity and transcribed speech text may be independently available. Accordingly, computerized intelligent assistant 1300 may be able to provide notifications to the conference participants in real time (e.g., for display at companion devices) indicating that another conference participant is currently speaking and including transcribed speech of the other conference participant, even before the other conference participant has finished speaking. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform))). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).
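
For illustration only, and not as part of either cited disclosure: the relative-volume identification attributed to Sinkov in the mapping above can be sketched in Python as follows. The device identifiers, the participant names, the per-frame volume samples, the fixed frame length, and both function names are assumptions made for this example. The sketch picks the loudest device per frame and does not attempt the double-talk filtering that Sinkov also describes.

```python
def active_speaker_per_frame(volume_by_device, owner_of_device):
    """For each time frame, pick the participant whose device measured the highest volume."""
    n_frames = len(next(iter(volume_by_device.values())))
    speakers = []
    for t in range(n_frames):
        # Relative volume comparison across all personal devices at frame t.
        loudest = max(volume_by_device, key=lambda d: volume_by_device[d][t])
        speakers.append(owner_of_device[loudest])
    return speakers


def to_segments(speakers, frame_seconds=1.0):
    """Merge consecutive frames with the same speaker into (speaker, start, end) tuples."""
    segments = []
    for t, who in enumerate(speakers):
        if segments and segments[-1][0] == who:
            # Same speaker as the previous frame: extend the open segment.
            segments[-1] = (who, segments[-1][1], (t + 1) * frame_seconds)
        else:
            segments.append((who, t * frame_seconds, (t + 1) * frame_seconds))
    return segments


# Hypothetical time-based volume records, one list of samples per device.
volumes = {"phone_a": [0.9, 0.8, 0.2, 0.1], "phone_b": [0.3, 0.2, 0.7, 0.8]}
owners = {"phone_a": "Participant A", "phone_b": "Participant B"}

speakers = active_speaker_per_frame(volumes, owners)
segments = to_segments(speakers)
```

Here each device's list plays the role of a time-based record of volume, and the resulting tuples associate a participant with a time span of the meeting.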

Claim 12 is taught by Sinkov in view of Diamant as described for claim 2. 
  
Claim 13. 	Sinkov in view of Diamant discloses the non-transitory computer readable storage medium of claim 11, 
Sinkov doesn’t explicitly disclose the non-transitory computer readable storage medium further comprising instructions that, when executed by the at least one processor, cause the computing device to: track participation data corresponding to a first participant from the plurality of participants based on at least one segment of the audio content associated with the first participant; and generate a participation report based on the participation data.
However, Diamant discloses instructions that, when executed by the at least one processor, cause the computing device to: track participation data corresponding to a first participant from the plurality of participants based on at least one segment of the audio content associated with the first participant; and generate a participation report based on the participation data (Parag. [0004], Parag. [0023-0024], Parag. [0082], Parag. [0109], and Parag. [0138]; (The art teaches that the conference transcript can be used by participants for reviewing various multi-modal interactions and other events of interest that happened in the conference. The conference transcript can be analyzed to provide conference participants with feedback regarding their own participation in the conference, other participants, and team/organizational trends. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform) (i.e., meeting item). Also, in an example, the art teaches that the machine learning classifier may be configured to receive any other suitable transcript data automatically recorded at 211, e.g., transcribed speech audio in the form of text. 
The transcription machine may be configured to analyze the transcript to detect words having a predefined sentiment (e.g., positive, negative, “happy”, or any other suitable sentiment), in order to present a sentiment analysis summary at a companion device of a conference participant, indicating a frequency of utterance of words having the predefined sentiment)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 14. 	Sinkov in view of Diamant discloses the non-transitory computer readable storage medium of claim 13,  
Sinkov doesn’t explicitly disclose wherein the participation data includes at least one of a length of time spoken by the first participant or a number of interruptions by the first participant. 
However, Diamant discloses wherein the participation data includes at least one of a length of time spoken by the first participant or a number of interruptions by the first participant (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking; the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking)).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).
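
For illustration only, and not as part of the cited disclosure: a minimal Python sketch of how a length of time spoken and a number of interruptions could be computed from WHO/WHEN-labeled segments of the kind described for Diamant's diarization output. The Segment type, the function name, the sample values other than the 30.01 s-34.87 s span, and the definition of an interruption (a segment that begins while a different speaker's earlier segment is still in progress) are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str   # WHO
    start: float   # WHEN, in seconds
    end: float


def participation_data(segments):
    """Return {speaker: {"seconds_spoken": float, "interruptions": int}}."""
    stats = defaultdict(lambda: {"seconds_spoken": 0.0, "interruptions": 0})
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    for i, seg in enumerate(ordered):
        stats[seg.speaker]["seconds_spoken"] += seg.end - seg.start
        # Assumed definition: seg interrupts if it starts strictly inside
        # an earlier segment spoken by someone else.
        interrupts = any(
            prev.speaker != seg.speaker and prev.start < seg.start < prev.end
            for prev in ordered[:i]
        )
        if interrupts:
            stats[seg.speaker]["interruptions"] += 1
    return dict(stats)


# Hypothetical labeled segments; only the first span mirrors the cited example.
labeled = [
    Segment("Bob", 30.01, 34.87),
    Segment("Alice", 34.00, 36.50),
    Segment("Bob", 40.00, 42.00),
]
report = participation_data(labeled)
```

In this sample, the second speaker starts before the first span ends, so one interruption is counted for that speaker and none for the other.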

Claim 15. 	Sinkov in view of Diamant discloses the non-transitory computer readable storage medium of claim 13,  
Sinkov doesn’t explicitly disclose the non-transitory computer readable storage medium further comprising instructions that, when executed by the at least one processor, cause the computing device to provide the participation report for display on a client device associated with the first participant. 
However, Diamant discloses instructions that, when executed by the at least one processor, cause the computing device to provide the participation report for display on a client device associated with the first participant (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking; the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking. Diarization machine 132 uses this WHO/WHEN/WHERE information to label corresponding segments 604 of the audio signal(s) 606 under analysis with labels 608. The segments 604 and/or corresponding labels may be output from the diarization machine 132 in any suitable format. The output effectively associates speech with a particular speaker during a conversation among N speakers, and allows the audio signal corresponding to each speech utterance (with WHO/WHEN/WHERE labeling/metadata) to be used for myriad downstream operations. One nonlimiting downstream operation is conversation transcription. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. 
Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform) (i.e., meeting item), Also, in an example, the art teaches that the machine learning classifier may be configured to receive any other suitable transcript data automatically recorded at 211, e.g., transcribed speech audio in the form of text. The transcription machine may be configured to analyze the transcript to detect words having a predefined sentiment (e.g., positive, negative, “happy”, or any other suitable sentiment), in order to present a sentiment analysis (i.e., displayed for the user) summary at a companion device of a conference participant (i.e., first user), indicating a frequency of utterance of words having the predefined sentiment)).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 16 is taught by Sinkov in view of Diamant as described for claim 4. 

Claim 17. 	Sinkov discloses a system comprising: at least one processor; and a non-transitory computer readable storage medium comprising instructions that, when executed by the at least one processor (Parag. [0009-0010]), cause the system to:  
receive audio content corresponding to speech from a plurality of participants of a meeting from a first client device (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. After a double talk episode has ended, the system may attempt clearing each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk. i.e., the speech from a plurality of the participants is received from a first client device));   
receive a time-based record of volume from a second client device, the time-based record of volume having data points indicating a level of volume for each point in time in the meeting (Parag. [0009]; (The art teaches that recording audio information from a meeting includes determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input audio devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, and recording on a second channel audio input at a second one of the personal audio input audio devices corresponding to the second particular speaker. Fig. 2 shows that the signals are represented by an amplitude as a function of time (e.g., J2(t)+αH1(t−β)+A1 (John's channel); Parag. [0019])));
associate the plurality of participants of the meeting with corresponding segments of the audio content by associating a given participant of the meeting with one or more corresponding segments of the audio content that include speech of the given participant based on synchronizing volumes of the time-based record of volume received from the second client device with the audio content received from the first client device (Parag. [0037]; (The art teaches in FIG. 2  a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)). 
Sinkov doesn’t explicitly disclose generate a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content. 
However, Diamant discloses generate a digital meeting item for one or more participants from the plurality of participants based on associating the plurality of participants with the corresponding segments of the audio content (Parag. [0052-0053], Parag. [0109], Parag. [0115], and Parag. [0138]; (The art teaches that FIG. 7 is a visual representation of an example output of diarization machine. In FIG. 6, a vertical axis is used to denote WHO (e.g., Bob) is speaking (i.e., identifying the user of the client device); the horizontal axis denotes WHEN (e.g., 30.01 s-34.87 s) that speaker is speaking; and the depth axis denotes from WHERE (e.g., 23°) that speaker is speaking. Diarization machine 132 uses this WHO/WHEN/WHERE information to label corresponding segments 604 of the audio signal(s) 606 under analysis with labels 608. The segments 604 and/or corresponding labels may be output from the diarization machine 132 in any suitable format. The output effectively associates speech with a particular speaker (i.e., identifying the user of the client device) during a conversation among N speakers, and allows the audio signal corresponding to each speech utterance (with WHO/WHEN/WHERE labeling/metadata) to be used for myriad downstream operations. One nonlimiting downstream operation is conversation transcription. FIG. 1B teaches a computerized conference assistant 106 may include a speech recognition machine 130. As shown in FIG. 8, the speech recognition machine 130 may be configured to translate an audio signal of recorded speech (e.g., signals 112, beamformed signal 150, signal 606, and/or segments 604) into text 800. The art also teaches that in some examples, transcribed speech and/or speaker identity information may be gathered by computerized intelligent assistant 1300 in real time, in order to build the transcript in real time, and/or in order to provide notifications to conference participants about the transcribed speech in real time. 
In some examples, computerized intelligent assistant 1300 may be configured, for a stream of speech audio captured by a microphone, to identify a current speaker and to analyze the speech audio in order to transcribe speech text, substantially in parallel and/or in real time, so that speaker identity and transcribed speech text may be independently available. Accordingly, computerized intelligent assistant 1300 may be able to provide notifications to the conference participants in real time (e.g., for display at companion devices) indicating that another conference participant is currently speaking and including transcribed speech of the other conference participant, even before the other conference participant has finished speaking. The art also teaches that the transcript may be analyzed using any suitable machine learning (ML) and/or artificial intelligence (AI) techniques, wherein such analysis may include, for raw audio observed during a conference, recognizing text corresponding to the raw audio, and recognizing one or more salient features of the text and/or raw audio. Non-limiting examples of salient features that may be recognized by ML and/or AI techniques include 1) an intent (e.g., an intended task of a conference participant), 2) a context (e.g., a task currently being performed by a conference participant), 3) a topic and/or 4) an action item or commitment (e.g., a task that a conference participant promises to perform))). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).
 
Claim 18. 	Sinkov in view of Diamant discloses the system of claim 17,  
Sinkov doesn’t explicitly disclose the system further comprising instructions that, when executed by the at least one processor, cause the system to generate a transcript of the audio content using associations between the plurality of participants and the corresponding segments of the audio content. 
However, Diamant discloses instructions that, when executed by the at least one processor, cause the system to generate a transcript of the audio content using associations between the plurality of participants and the corresponding segments of the audio content (Parag. [0060]; (The art teaches that labeled and/or partially labeled audio segments may be used to not only determine which of a plurality of N speakers is responsible for an utterance, but also translate the utterance into a textual representation for downstream operations, such as transcription)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sinkov to incorporate the teaching of Diamant. Doing so would facilitate coordinating the conference by providing a transcript of the conference to conference participants for subsequent review, tracking arrivals and departures of conference participants, providing cues to conference participants during the conference, and/or analyzing the information in order to summarize one or more aspects of the conference for subsequent review (Parag. [0024]).

Claim 19. 	Sinkov in view of Diamant discloses the system of claim 17, 
Sinkov further discloses wherein the instructions, when executed by the at least one processor, causes the system to receive the audio content from the first client device by receiving the audio content from a client device associated with a first participant from the plurality of participants of the meeting (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input (i.e., first set of audio data) at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers. After a double talk episode has ended, the system may attempt clearing each recorded fragment from double-talk by non-owners prior to placing it into the corresponding speaker channel. 
Such clearing may be facilitated by simultaneous processing of recorded fragments from all principal phones engaged in the double-talk)).   

Claim 20. 	Sinkov in view of Diamant discloses the system of claim 17,  
Sinkov further discloses further comprising instructions that, when executed by the at least one processor, cause the system to analyze the time-based record of volume received from the second client device to determine a primary speaking volume (Parag. [0009] and Parag. [0017-0021]; (The art teaches determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on relative volume levels at each of the personal audio input devices. Recording audio information from a meeting may also include simultaneously recording audio input (i.e., first set of audio data) at the first one of the personal audio input audio devices and on the first channel and audio input at the second one of the personal audio input audio devices and on the second channel in response to the first and second meeting participants speaking at the same time. Recording audio information from a meeting may also include filtering the audio input at the first channel and the second channel to separate speech by the first participant from speech by the second participant. Filtering the audio input may be based on a distance related volume weakening coefficient, signal latency between the personal audio input devices, and/or ambient noise. In the event of double-talk when two or more speakers talk simultaneously for a period of time, the system may initially identify each speaker, and record double-talk on all principal smartphones owned by current speakers)).

Claim 21. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 1,  
Sinkov further discloses wherein receiving the time-based record of volume having the data points indicating the level of volume for each point in time in the meeting comprises receiving the time-based record of volume having the data points indicating at least a first level of volume and a second level of volume throughout the meeting (Parag. [0009]; (The art teaches that recording audio information from a meeting includes determining which of a plurality of specific personal audio input audio devices correspond to which specific meeting participants, measuring volume levels at each of the personal audio input devices in response to each of the meeting participants speaking, identifying that a first particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a first channel audio input at a first one of the personal audio input audio devices corresponding to the first particular speaker, identifying that a second particular one of the participants is speaking based on stored voice profiles and/or relative volume levels at each of the personal audio input devices, recording on a second channel audio input at a second one of the personal audio input audio devices corresponding to the second particular speaker. Fig. 2 shows that the signals are represented by an amplitude as a function of time (e.g., J2(t)+αH1(t−β)+A1 (John's channel); Parag. [0019]))).

Claim 23. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 21,  
Sinkov further discloses wherein the data points of the time-based record of volume further indicate a third level of volume corresponding to a third distance from a third participant to the second client device (Parag. [0037]; (The art teaches in FIG. 2  a schematic illustration 200 of storyline compilation, post-meeting annotation and voice-to-text features. Each of the meeting participants 110, 120, 130 has been an active speaker at some time during the meeting; accordingly, channels 210, 220, 230 corresponding to the participants 110, 120, 130 have been created by the system and kept as audio fragments 240 of active speakers. When a fragment of double-talk is identified, the fragment may be recorded on more than one principal device, as illustrated by double-talk fragments 250a, 250b. Even though audio signals recorded by the two principal devices represent the same conversation, the audio signals may not be identical, as explained elsewhere herein and illustrated by audio signal profile functions 260a, 260b. The system may attempt to resolve double talk fragment and retrieve individual fragments assignable to each active speaker channel by applying various filtering techniques 270, such as LMS filtering. If successful, separate fragments 280a, 280b may be added to their respective channels. Otherwise, double-talk recorded on each principal recording device may be added to the corresponding channel, all double-talk fragments may be cross-referenced and switchable between channels)).   

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Sinkov et al. (Pub. No. US 2019/0200121), hereinafter Sinkov; in view of Diamant (Pub. No. US 2019/0341050), hereinafter Diamant, and in view of Gleim (Pub. No. US 2015/0063553).

Claim 22. 	Sinkov in view of Diamant discloses the computer-implemented method of claim 21,   
The combination does not explicitly disclose wherein: the first level of volume corresponds to a first distance between a first participant and the second client device; and the second level of volume corresponds to a second distance between a second participant and the second client device. 
However, Gleim discloses wherein: the first level of volume corresponds to a first distance between a first participant and the second client device; and the second level of volume corresponds to a second distance between a second participant and the second client device (Parag. [0051]; (The art teaches that the volume level can tell us how loud someone is talking, but it also tells us how far a speaker is from their physical microphone. For 3D sound conferencing, we intentionally level the sound to remove the information about how far the speaker is from their physical microphone so that we can then use an attenuator to intentionally add negative or positive volume information that communicates the distance between the speaker (speaking participant) and the listener (listening participant) in the mapped room; i.e., the volume corresponds to the distance between the meeting participant and the user device, which is equivalent to the applicant’s definition)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to incorporate the teaching of Gleim. Doing so would enhance a virtual learning system in which a participant can feel that he or she is experiencing an actual classroom environment, with each user or participant having the ability to distinguish between multiple voices (Parag. [0002]).

Conclusion
		The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Nord (US 2015/0117626) – Related art on using audio signals to identify when client devices are co-located (Abstract: a technique manages an online meeting. The technique includes providing an audio output signal to a first client device currently participating in the online meeting. The audio output signal directs the first client device to play a particular sound (e.g., a unique tone or a unique series of tones). The technique further involves receiving an audio input signal from a second client device. The audio input signal includes the particular sound. The technique further involves identifying the second client device as being co-located with the first client device in response to the audio input signal which includes the particular sound. Such operation enables the electronic circuitry (e.g., a processing circuit of an online meeting server) to learn whether any client devices are co-located and accordingly associate multiple devices to a single user connected to the online meeting).
		Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELBASST TALIOUA whose telephone number is (571)272-4061.  The examiner can normally be reached on Monday-Thursday 7:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Trost, can be reached at 571-272-7872.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.T./Patent Examiner, Art Unit 2442
/WILLIAM G TROST IV/Supervisory Patent Examiner, Art Unit 2442