DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 07/12/2022.
Claims 1-5, 7, and 9-16 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 15, and 16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Please see the new mapping citing Okabe with respect to the amended limitations for further detail. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7, and 9-16 are rejected under 35 U.S.C. 103 as being unpatentable over Tsiartas et al. (U.S. PG Pub No. 2017/0084295), hereinafter Tsiartas, in view of Doerflinger (U.S. PG Pub No. 2019/0180871), hereinafter Doerflinger, and further in view of Okabe et al. (U.S. PG Pub No. 2015/0287402), hereinafter Okabe.

Regarding claims 1, 15, and 16, Tsiartas teaches
(claim 1) A computer-enabled method for obtaining a diagnosis of a mental health disorder or condition, the method comprising ([0018] a method for detecting a speaker’s state):
(claim 15) An electronic device, comprising ([0039] a computing device):
(claim 15) a display ([0039] an output device, such as a display);
(claim 15) one or more processors ([0039] a processor);
(claim 15) a memory ([0039] a memory); and
(claim 15) one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for ([0056] the speech analytics platforms, i.e. one or more programs, may be implemented as a set of instructions embodied in one or more computer readable media, i.e. stored in the memory, that are executable by one or more processors, i.e. configured to be executed):
(claim 16) A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device having a display, cause the electronic device to ([0039], [0056] the speech analytics platforms, i.e. one or more programs, may be implemented as a set of instructions embodied in one or more computer readable media, that are executable by one or more processors of a computing device, i.e. one or more processors of an electronic device, where the device has a display):

receiving an audio input ([0034:9-24] a person provides speech input, i.e. audio input, captured, i.e. receiving, by a microphone);
sampling the received audio input by one or more microphones to generate an electrical audio signal ([0034:9-24] speech input captured, i.e. received audio input, by a microphone, i.e. one or more microphones, is converted, i.e. sampling…to generate, into a digital audio stream, i.e. electrical audio signal, to be provided to the speech analytics system);
converting the audio signal into a text string ([0021], [0031:1-15], [0043:8-12] an ASR module can extract features from the audio signal, i.e. converting the audio signal, which can include outputting a transcription, i.e. a text string);
identifying a speaker associated with the text string ([0089] the audio is segmented, and speaker models are used to determine the identity of the speaker who spoke the segment, i.e. identifying a speaker, where the segments are further tagged to indicate which speaker spoke particular words or phrases, i.e. associated with the text string, and the tags are further used during feature extraction);
detecting an indicator of the mental health condition based on a portion of the text string ([0021], [0025], [0031:1-15] a speaker’s state, i.e. mental health condition, can be detected by analyzing the extracted features, i.e. indicator, including words extracted from the audio signal, i.e. based on a portion of the text string), wherein detecting the indicator of the mental disorder or condition comprises:
applying a machine learning classifier to the portion of the text string and generating from the classifier an indicator of the mental health condition ([0025], [0027:1-8], [0031:1-25], [0037:1-13], [0086:1-17], [0090], [0115:1-3] the ASR may feed transcriptions, i.e. portion of the text string, to a machine learning model, such as a classifier, i.e. applying a machine learning classifier, that is used to interpret the features extracted from the speech signal, where the speaker state prediction is provided by the model, i.e. generating from the classifier an indicator, and the speaker state may be a mental health state, such as PTSD, i.e. indicator of the mental health condition), wherein the machine learning classifier --is-- generated using training data, the training data comprising a plurality of audio inputs previously associated with a known mental health condition ([0025], [0027:1-8], [0037:1-14], [0076], [0080:1-6] the machine learning model, such as a classifier, may be generated and trained by applying machine learning algorithms to a set of training data, i.e. machine learning classifier --is-- generated using training data, where training data can use any audio input data, i.e. training data comprising a plurality of audio inputs, that are appropriately labelled, and labelling allows for identification of a speaker state, where a speaker state may be a mental health state, such as depression, stress, PTSD, or brain injury, i.e. previously associated with a known mental health condition);
determining, based on at least a portion of the audio signal, a predefined audio characteristic of a plurality of predefined audio characteristics, wherein determining the predefined audio characteristic comprises determining one or more electrical properties of the electrical audio signal ([0022:14-27], [0031:1-15], [0034:9-24] features extracted from the audio signal, i.e. determining, based on at least a portion of the audio signal, can include prosodic features, such as intonation, timing, pausing, rate, loudness, quality, and variability, and speech/non-speech voicing patterns, i.e. a predefined audio characteristic of a plurality of predefined audio characteristics, where the speech input is converted into a digital audio stream, i.e. electrical audio signal, and the prosodic features, such as intonation, timing, pausing, rate, loudness, quality, and variability, and the speech/non-speech voicing patterns are extracted from the audio signal, i.e. determining one or more electrical properties);
identifying, based on the determined audio characteristic of the plurality of predefined audio characteristics corresponding to the portion of the audio input, an emotion corresponding to the portion of the audio input ([0031:29-36] extracted features, i.e. based on the determined audio characteristic…, may be used to detect the speaker’s emotional state, such as happy or nervous, i.e. identifying…an emotion corresponding to the portion of the audio input);
generating a set of structured data based on the text string, the speaker, the predefined audio characteristic, and the identified emotion... ([0031:1-15], [0032-3], [0043:8-12], [0047], [0049] data summaries are provided, i.e. generating a set of structured data, that can include feature outputs, such as the transcription by the ASR, i.e. based on the text string, speaker state information, i.e. based on…the speaker…the identified emotion, and visualizations of raw feature output such as prosodic characteristics, i.e. predefined audio characteristic); and
providing an output for obtaining the diagnosis of the mental disorder or condition, wherein the output is indicative of at least a portion of the set of structured data ([0025], [0031], [0038] the speaker state may be output, i.e. providing an output, where the speaker state is based on analysis of the various features extracted from the audio signal, i.e. indicative of at least a portion of the set of structured data, where the speaker state can include a mental health state, such as depressed, stressed, PTSD, brain injury, or others, i.e. obtaining the diagnosis of the mental disorder or condition).  
While Tsiartas provides the use of a machine learning model, such as a classifier, to interpret features extracted from speech, and the output of data summaries and feature outputs, Tsiartas does not specifically teach that the machine learning model is a neural network or that the feature outputs are searchable, and thus does not teach
wherein the machine learning classifier includes a neural network generated using training data...; and
wherein the generated set of structured data is configured to enable cross-modality search and retrieval of the text string, the speaker, the predefined audio characteristic, and the identified emotion.
Doerflinger, however, teaches wherein the machine learning classifier includes a neural network generated using training data ([0005], [0044] textual data is evaluated by a neural net, i.e. machine learning classifier includes a neural network, to generate confidence scores identifying signs of neurological symptoms, where the neural net is pre-trained, i.e. generated…training).
Tsiartas, moreover, specifically teaches that the training of a machine learning model utilizes a set of training data ([0037:1-14]).
Tsiartas and Doerflinger are analogous art because they are from a similar field of endeavor in evaluating human input to determine signs of mental health issues. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Tsiartas’s teaching of a trained machine learning model with the specific use of a neural network as taught by Doerflinger. It would have been obvious to combine the references to enable using a neural network to interpret a patient’s speech input and generate a confidence score that the patient is exhibiting symptoms of a neurological problem (Doerflinger [0044]).
While Tsiartas in view of Doerflinger provides the output of data summaries and feature outputs, Tsiartas in view of Doerflinger does not specifically teach that the feature outputs are searchable, and thus does not teach
wherein the generated set of structured data is configured to enable cross-modality search and retrieval of the text string, the speaker, the predefined audio characteristic, and the identified emotion.
Okabe, however, teaches wherein the generated set of structured data is configured to enable cross-modality search and retrieval of the text string, the speaker, the predefined audio characteristic, and the identified emotion (Figs. 8 and 9, [0043-5], [0048], [0088-9], [0093] the call search criteria are acquired, i.e. search, where the search criteria can include a keyword, i.e. text string, an expression or sound, i.e. predefined audio characteristic, a specific emotion, i.e. identified emotion, and the data is associated with a particular speaker using voice recognition, i.e. the speaker, where the results of the search are displayed for review, i.e. generated set of structured data is configured to enable cross-modality search and retrieval).
Tsiartas, Doerflinger, and Okabe are analogous art because they are from a similar field of endeavor in processing speech to determine text and emotion. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Tsiartas’s teaching of outputting data summaries and feature outputs, as modified by Doerflinger, with the use of search criteria to find details in speech including text, speaker identification, expressions and sounds, and emotions as taught by Okabe. It would have been obvious to combine the references to enable the extraction and display of utterance sections that include specific keywords, sounds, or emotions (Okabe [0088-9]).

Regarding claim 2, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
wherein the mental condition is PTSD ([0025] the mental health state, i.e. mental condition, may include PTSD).  

Regarding claim 3, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches 
audio input includes one or more audio files ([0004:11-13], [0116], [0119], [0123] data storage may store a digital form of speech data, i.e. audio files, which can then be retrieved for segmentation, speaker diarization, and feature extraction by the analytics module, i.e. audio input).  

Regarding claim 4, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
converting the audio input into a predetermined format different from the first format ([0034:9-24] a person provides speech input captured by a microphone, i.e. audio input…the first format, which is converted into a digital audio stream, i.e. a predetermined format different from the first, to be provided to the speech analytics system).  

Regarding claim 7, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Doerflinger further teaches 
detecting the indicator of the mental disorder or condition comprises detecting one or more predefined words in the portion of the text string ([0025], [0030], [0041-2], [0058] the synthesis function analyzes supplied data, such as text, i.e. portion of the text string, to determine the presence of inappropriate language, i.e. detecting one or more predefined words, and where the synthesis function assists in determining if a patient is having a mental episode based on the cues identified in the analysis, i.e. indicator of the mental disorder).  
The motivation to combine is the same as previously presented.

Regarding claim 9, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
the indicator includes a speech pattern ([0031:1-21] the speaker state is determined by analyzing extracted features, i.e. indicator, including voice quality patterns and voicing patterns, i.e. speech pattern).  

Regarding claim 10, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
the predefined acoustic characteristic comprises speech rate, pitch, intonation, energy level, or a combination thereof ([0031:1-15], [0041:1-9] the extracted features, i.e. predefined acoustic characteristic, include prosodic features such as rate, i.e. speech rate, intonation, loudness, i.e. energy level, and pitch).

Regarding claim 11, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
associating a portion of the text string with the speaker, the predefined audio characteristic, the identified emotion, or a combination thereof (Figs. 3A and 3B, [0049:9-15], [0050-1] the feedback display, i.e. set of structured data, includes a visual representation of the detected words, i.e. portion of the text string, a display showing the speaker, i.e. associating…with the speaker, a visual representation of speech-related characteristics, i.e. predefined audio characteristic, and emotions, i.e. identified emotion).

Regarding claim 12, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Tsiartas further teaches
associating the portion of the audio input with the speaker, the predefined audio characteristic, the identified emotion, or a combination thereof (Figs. 3A and 3B, [0049:9-15], [0050-1] the feedback display, i.e. set of structured data, includes a visual representation of the speech signal, i.e. portion of the audio input, a display showing the speaker, i.e. associating…with the speaker, a visual representation of speech-related characteristics, i.e. predefined audio characteristic, and emotions, i.e. identified emotion).

Regarding claim 13, Tsiartas in view of Doerflinger and Okabe teaches claim 1, and Okabe further teaches 
receiving a user input indicative of a query for a keyword ([0039-40], [0067:14-17], [0096] a call analysis server accepts inputs from a user, i.e. receiving a user input, such as search criteria, i.e. indicative of a query, where the call search criteria may be a keyword, i.e. query for a keyword).
The motivation to combine is the same as previously presented.

Regarding claim 14, Tsiartas in view of Doerflinger and Okabe teaches claim 13, and Okabe further teaches 
in response to receiving the user input, providing an output indicative of a segment of the audio input ([0093], [0096] the call analysis server acquires the search criteria keyword, i.e. receiving the user input, and extracts the call data containing the keyword, i.e. in response, and displays the calls and utterance sections in each of the calls, i.e. providing an output indicative of a segment of the audio input, that include the keyword).  
The motivation to combine is the same as previously presented.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Tsiartas, in view of Doerflinger, in view of Okabe, and further in view of Bastide et al. (U.S. PG Pub No. 2018/0331990), hereinafter Bastide.

Regarding claim 5, Tsiartas in view of Doerflinger and Okabe teaches claim 1. 
While Tsiartas in view of Doerflinger and Okabe provides the use of ASR to extract transcripts from audio signals, Tsiartas in view of Doerflinger and Okabe does not specifically teach that the ASR algorithm is commercial off-the-shelf or open source, and thus does not teach
the conversion of the audio input into the text string is based on a commercial off-the-shelf algorithm, an open-source ASR software development kit ("SDK"), or a combination thereof.
Bastide, however, teaches the conversion of the audio input into the text string is based on a commercial off-the-shelf algorithm, an open-source ASR software development kit ("SDK"), or a combination thereof ([0036:22-26] speech to plain text via speech recognition, i.e. conversion of the audio input into the text string, may be performed via the commercially available IBM Watson Speech to Text application, i.e. commercial off-the-shelf algorithm).
Tsiartas, Doerflinger, Okabe, and Bastide are analogous art because they are from a similar field of endeavor in processing human speech to recognize emotion. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Tsiartas’s teaching of the use of ASR, as modified by Doerflinger and Okabe, with the ASR algorithm being specifically a commercial off-the-shelf algorithm as taught by Bastide. It would have been obvious to combine the references to enable linguistic analysis on the plain text of audible speech using any of a number of commercially available algorithms for that purpose (Bastide [0036:22-35]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571) 270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICOLE A K SCHMIEDER/           Examiner, Art Unit 2659  

/PIERRE LOUIS DESIR/           Supervisory Patent Examiner, Art Unit 2659