DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 9 is objected to because of the following informality: “the system of claim 88” should read “the system of claim 8.” Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tablan et al. (US 2021/0383913 A1) in view of Garnavi et al. (US 10,002,311 B1).
As to claim 1, Tablan discloses a method of electronically documenting a conversation [Paragraph 0001], the method comprising: 
capturing audio of a conversation between a first speaker and a second speaker [“Speaker diarization is the attribution of streams of input audio data (speech data) to the particular individuals taking part in a conversation.” Paragraph 0109];
generating conversation audio data from the captured audio [“Speaker diarization uses a combination of speaker segmentation, where speaker change points in the audio data are identified, and speaker clustering, where streams of speech (audio streams) are grouped together on the basis of characteristics of the speech.” Paragraph 0109];
segmenting the conversation audio data into a plurality of utterances according to a speaker segmentation technique [“The text data 16 is divided 116 into utterances 118. The utterances represent short passages/phrases/sentences of speech. Where diarization is used to attribute a particular stream of input audio data to a particular speaker (e.g. therapist or one or more of the patient(s)), this may therefore be used to attribute each utterance in the exchange to either the therapist or to the one or more patient.” Paragraph 0115]; 
for each utterance: passing the utterance to a neural network model [Deep learning model], the neural network model configured to receive the utterance as an input and generate a feature representation of the utterance as an output [“Assigning a semantic representation to the utterances involves using the first part or portion of a deep learning model. The first part of the deep learning model may include a single layer or multiple stacked layers.” Paragraph 0141];
assigning the utterance feature representation to a first speaker cluster [Patient] or a second speaker cluster [Therapist] according to a clustering technique [“Each utterance may be automatically identified as deriving from either the patient or the therapist by the use of e.g. diarization.” Paragraph 0116]; 
assigning a speaker identifier to the utterance [Patient utterance] based on the cluster assignment of the utterance [“Preferably, both sets of information are transferred when the transcript is divided into individual utterances, in order to produce e.g. patient utterances 118, and therapist utterances 118.” Paragraph 0116]; 
generating a text representation [Convert to text] of the utterance [Paragraph 0117]; 
generating a digital transcript [Transcript text data] of the conversation by chronologically ordering the text representations of the utterances according to the time data for the utterances [Paragraph 0119]; and 
importing the digital transcript into [Assignment of semantic representations] an electronic conversation artifact [“The display also includes confidence scores for the predictions. The display includes text and/or graphical representations of the predictions and confidence scores.” Paragraphs 0146 and 0168].  
Tablan discloses time on task [Paragraph 0146], but fails to disclose storing time data indicating the chronological position of the utterance in the conversation.
However, Garnavi teaches storing time data indicating the chronological position [Eye tracking instance] of the utterance in the conversation [“These keywords are recorded in the database in synchronization with the time stamp of the eye tracking data in order to identify the eye tracking instance to the corresponding audio input.” Column 8, lines 30-37].
Tablan and Garnavi are analogous art because both are directed to audio processing systems. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious to use the recording of keywords with a time stamp, as taught by Garnavi, in an utterance assignment model performance system such as that of Tablan, as suggested by Garnavi, for the purpose of displaying the predicted sequence of utterances on the user interface, by combining prior art elements according to known methods to yield predictable results.
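
For illustration only, and not as a characterization of what either reference or the application actually implements, the combination of stored time data with per-utterance speaker labels can be sketched as follows. The names Utterance, SPEAKER_BY_CLUSTER, and build_transcript are hypothetical, and the cluster index is assumed to come from a separate speaker-clustering step.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start_s: float  # stored time data: start offset of the utterance, in seconds
    cluster: int    # cluster index assumed to come from a speaker-clustering step
    text: str       # text representation of the utterance


# Hypothetical mapping from cluster index to speaker identifier.
SPEAKER_BY_CLUSTER = {0: "Speaker 1", 1: "Speaker 2"}


def build_transcript(utterances):
    """Order utterances by their stored start time and label each with a speaker."""
    ordered = sorted(utterances, key=lambda u: u.start_s)
    return [
        f"[{u.start_s:06.1f}] {SPEAKER_BY_CLUSTER[u.cluster]}: {u.text}"
        for u in ordered
    ]
```

Under these assumptions, the transcript order depends only on the stored time stamps, so utterances may be processed in any order before the transcript is assembled.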

As to claim 2, Tablan discloses the method of claim 1, wherein the first speaker is a healthcare professional and the second speaker is a patient, and wherein the conversation is a medical consultation [Paragraph 0001].  

As to claim 3, Tablan discloses the method of claim 1, wherein the electronic conversation artifact is an electronic medical record [Paragraph 0122]. 

As to claim 4, Tablan discloses the method of claim 1, wherein the neural network model includes a convolutional neural network [Paragraph 0142].  

As to claim 5, Tablan discloses a method of generating and delivering an electronic prompt to a speaker during a conversation [Paragraph 0121], the method comprising the limitations addressed in the rejection of claim 1 above, and further comprising:
analyzing the utterance text data to determine an utterance subject matter [Paragraph 0120]; 
generating a prompt using the utterance subject matter [Paragraph 0121].
Tablan discloses presenting a prompt question to the patient, but fails to disclose displaying the prompt at an information gatherer device.
However, Garnavi teaches displaying the prompt at an information gatherer device [Column 4, lines 53-59]. 
Tablan and Garnavi are analogous art because both are directed to audio processing systems. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious to use the visual display that informs the user, as taught by Garnavi, in an utterance assignment model performance system such as that of Tablan, as suggested by Garnavi, for the purpose of allowing the eye-gaze tracker to complete the session recording, by combining prior art elements according to known methods to yield predictable results.

As to claim 6, Tablan discloses the method of claim 5, wherein the utterance text data is analyzed using a text classifier [Paragraph 0111]. 

As to claim 7, Tablan discloses a system for processing audio via speaker clustering [Paragraph 0109], the system comprising: 
an audio capture device [Microphone] configured to capture audio of a conversation between a first speaker and a second speaker and generate conversation audio data from the captured audio [Paragraph 0085; see also the rejection of claim 1]; and
an audio processing server [Server 3 on FIG. 1] communicatively connected to the audio capture device, the audio processing server configured to: receive the conversation audio data from the audio capture device [Paragraph 0088]. See the rejection of claim 1 for the remaining limitations.

As to claim 8, Tablan discloses the system of claim 7, wherein the audio processing server is further configured to generate a text representation of at least one utterance [Paragraph 0111]. 

As to claim 9, Tablan discloses the system of claim 8, wherein the audio processing server is further configured to generate a digital transcript of the conversation including text representations of two or more utterances [Paragraph 0111].  

As to claim 10, Tablan discloses the system of claim 7, further comprising an information gatherer device communicatively connected to the audio processing server via a network, the information gatherer device configured to display the digital transcript for review by a user [Paragraph 0087].  

As to claim 11, Tablan discloses the system of claim 7, wherein upon approval of the digital transcript, the digital transcript is automatically imported into an electronic conversation artifact [Paragraph 0146].  

As to claim 12, Tablan discloses the system of claim 11, further comprising an electronic conversation artifact server communicatively connected to the audio processing server via the network, wherein the electronic artifact server is configured to generate and store the electronic conversation artifact [Paragraphs 0111 and 0146].

As to claim 13, Tablan discloses the system of claim 7, wherein the neural network model is a VGGish model [Paragraph 0142].  

As to claim 14, Tablan discloses the system of claim 7, wherein the speaker clustering pipeline further includes extracting time series data of the utterance audio data [Paragraph 0146].  

As to claim 15, Tablan discloses the system of claim 7, wherein the speaker clustering pipeline further includes extracting mel features using an FFT window function [Paragraph 0109].  

As to claim 16, Tablan discloses the system of claim 14, wherein the utterance audio data is provided to the neural network model as a batch of spectrogram frames [Paragraph 0025].

As to claim 17, Tablan discloses the system of claim 16, wherein the number of frames in the batch is determined by the time length of the utterance audio data [Paragraph 0146].  
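
For illustration only, the dependence of the frame count on utterance length recited in claim 17 can be sketched with the usual sliding-window arithmetic. The function name num_frames and the 960 ms window and hop defaults are assumptions chosen for the example; they are not taken from Tablan, Garnavi, or the application.

```python
def num_frames(duration_ms: int, window_ms: int = 960, hop_ms: int = 960) -> int:
    """Number of full analysis frames that fit in an utterance of the given length.

    Integer milliseconds are used to avoid floating-point rounding at the
    frame boundaries. The default window and hop are assumed values.
    """
    if window_ms <= 0 or hop_ms <= 0:
        raise ValueError("window_ms and hop_ms must be positive")
    if duration_ms < window_ms:
        return 0  # utterance too short for even one full frame
    return (duration_ms - window_ms) // hop_ms + 1
```

With these assumed defaults, a 3000 ms utterance yields three frames, and any trailing audio shorter than one window is dropped rather than padded.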

As to claim 18, Tablan discloses the system of claim 7, wherein the cluster model implements a similarity function for determining the cluster assignment [Paragraph 0109].
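
For illustration only, one common form of similarity function for cluster assignment is cosine similarity against per-cluster centroids. The sketch below is a generic example under that assumption; it is not asserted to be the function used in Tablan or in the application, and the names cosine_similarity and assign_cluster are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def assign_cluster(embedding, centroids):
    """Return the index of the centroid most similar to the embedding."""
    return max(
        range(len(centroids)),
        key=lambda i: cosine_similarity(embedding, centroids[i]),
    )
```

In a two-speaker setting, centroids would hold one vector per speaker cluster, and each utterance embedding is assigned to whichever centroid scores higher.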

As to claim 19, Tablan discloses the system of claim 7, wherein the audio processing server comprises a speaker identification subsystem comprising a convolutional neural network and an attention-based LSTM layer [Paragraph 0025].  

As to claim 20, Tablan discloses the system of claim 19, wherein the convolutional neural network is SincNet [Paragraph 0142].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GERALD GAUTHIER whose telephone number is (571) 272-7539. The examiner can normally be reached 8:00 AM to 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, FAN TSANG, can be reached at (571) 272-7547. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GERALD GAUTHIER/
Primary Examiner, Art Unit 2653
August 10, 2022