DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-16 are pending in this application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/13/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because of the following informality: The device 600 referenced in the specification at [00104] is not shown in Fig. 6.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet”.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.

Regarding claims 1, 15, and 16, the limitation(s) of “receiving an audio input…”, “converting…”, “identifying…”, “generating…”, and “providing…”, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the limitations read on the mental process of a human hearing someone speaking and writing down the words, recognizing the identity of the speaker, hearing specific qualities of the speaker’s speech, recognizing an emotion associated with how the speaker sounds, writing down information related to the words spoken, the speaker, the quality of the speech, and the emotion, and writing down a diagnosis of a mental condition based on the information. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the --Mental Processes-- grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to receive, convert, identify, determine, identify, generate, and provide amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

With respect to claim 2, the claim recites “the mental condition is PTSD”, which reads on a human recognizing the words, speaker, manner of speaking, and emotion expressed in combination as representing a manifestation of PTSD. No additional limitations are present.
	


With respect to claim 4, the claim recites “converting the audio input into a predetermined format different from the first format”, which reads on a human hearing speech by different speakers and transcribing by hand the speech heard by order of speaker. No additional limitations are present.

With respect to claim 5, the claim recites “the conversion…is based on a commercial off-the-shelf algorithm, an open-source ASR software…”, which is considered generalized computer implementation of insignificant pre-solution activity, namely data gathering (see MPEP 2106.05(g)), as the conversion of audio information to text using a commercially available algorithm is gathering data using a conventional system. No additional limitations are present.

With respect to claim 6, the claim recites “detecting an indicator…based on a portion of the text string”, which reads on a human reading text and recognizing the text as having indicators of a specific mental condition. No additional limitations are present.

With respect to claim 7, the claim recites “detecting an indicator…one or more predefined words in a portion of the text string”, which reads on a human reading text 

With respect to claim 8, the claim recites “providing the portion…” and “receiving…the indicator…”, which reads on a human reading a written transcript and writing down a phrase indicative of a mental condition based on the reading of the transcript. No additional limitations are present.

With respect to claim 9, the claim recites “indicator includes a speech pattern”, which reads on a human recognizing a manner in which a person is speaking as being indicative of a mental condition. No additional limitations are present.

With respect to claim 10, the claim recites “predefined acoustic characteristic…”, which reads on a human recognizing how fast a person is speaking, the pitch of their voice, the emphasis used, or the volume. No additional limitations are present.

	With respect to claim 11, the claim recites “associating a portion of the text string…”, which reads on a human recognizing that part of the written text corresponds to the speaker, how the speaker said the words, or the emotion the speaker seemed to have at the time the words were spoken. No additional limitations are present.

	With respect to claim 12, the claim recites “associating a portion of the audio input…”, which reads on a human recognizing that part of the heard speech 

	With respect to claim 13, the claim recites “receiving a user input…”, which reads on a human being asked by another human if a specific word has been spoken. No additional limitations are present.

With respect to claim 14, the claim recites “providing an output…”, which reads on a human saying aloud the part of the speech that includes the requested keyword. No additional limitations are present.

These dependent claims do not integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.
		
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-4, 6, 8-12, 15, and 16 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tsiartas et al. (US PG Pub No. 2017/0084295), hereinafter Tsiartas.

Regarding claims 1, 15, and 16, Tsiartas teaches
(claim 1) A computer-enabled method for obtaining a diagnosis of a mental disorder or condition, the method comprising ([0018] a method for detecting a speaker’s state):
(claim 15) An electronic device, comprising ([0039] a computing device):
a display ([0039] an output device, such as a display);
one or more processors ([0039] a processor);
a memory ([0039] a memory); and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for ([0056] the speech analytics platforms, i.e. one or more programs, may be implemented as a set of instructions):
(claim 16) A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device having a display, cause the electronic device to ([0039], [0056] the speech analytics platforms, i.e. one or more programs, may be implemented as a set of instructions embodied in one or more computer readable media, that are executable by one or more processors of a computing device, i.e. one or more processors of an electronic device, where the device has a display):

receiving an audio input ([0034:9-24] a person provides speech input captured by a microphone, i.e. audio input, which is converted into a digital audio stream, to be provided to the speech analytics system, i.e. receiving);
converting the audio input into a text string ([0021], [0031:1-15], [0043:8-12] an ASR module can extract features from the audio signal, i.e. converting the audio input, which can include outputting a transcription, i.e. a text string);
identifying a speaker associated with the text string ([0089] the audio is segmented, and speaker models are used to determine the identity of the speaker who spoke the segment, i.e. identifying a speaker, where the segments are further tagged to indicate which speaker spoke particular words or phrases, i.e. associated with the text string, and the tags are further used during feature extraction);
determining, based on at least a portion of the audio input, a predefined audio characteristic of a plurality of predefined audio characteristics ([0031:1-15] features extracted from the audio signal, i.e. determining, based on at least a portion of the audio input, can include prosodic features, such as intonation, timing, pausing, rate, loudness, quality, and variability, and speech/non-speech voicing patterns, i.e. a predefined audio characteristic of a plurality of predefined audio characteristics);
identifying, based on the determined audio characteristic of the plurality of predefined audio characteristics corresponding to the portion of the audio input, an emotion corresponding to the portion of the audio input ([0031:29-36] extracted features, i.e. based on the determined audio characteristic…, may be used to detect the speaker’s emotional state, such as happy or nervous, i.e. identifying…an emotion corresponding to the portion of the audio input);
generating a set of structured data based on the text string, the speaker, the predefined audio characteristic, and the identified emotion ([0031:1-15], [0032-3], [0043:8-12], [0047], [0049] data summaries are provided, i.e. generating a set of structured data, that can include feature outputs, such as the transcription by the ASR, i.e. based on the text string, speaker state information, i.e. based on…the speaker…the identified emotion, and visualizations of raw feature output such as prosodic characteristics, i.e. predefined audio characteristic);
providing an output for obtaining the diagnosis of the mental disorder or condition, wherein the output is indicative of at least a portion of the set of structured data ([0025], [0031], [0038] the speaker state may be output, i.e. providing an output, where the speaker state is based on analysis of the various features).  

	Regarding claim 2, Tsiartas teaches claim 1, and further teaches
wherein the mental condition is PTSD ([0025] the mental health state, i.e. mental condition, may include PTSD).  	

Regarding claim 3, Tsiartas teaches claim 1, and further teaches 
audio input includes one or more audio files ([0004:11-13], [0116], [0119], [0123] data storage may store a digital form of speech data, i.e. audio files, which can then be retrieved for segmentation, speaker diarization, and feature extraction by the analytics module, i.e. audio input).  

Regarding claim 4, Tsiartas teaches claim 1, and further teaches
converting the audio input into a predetermined format different from the first format ([0034:9-24] a person provides speech input captured by a microphone, i.e. audio input…the first format, which is converted into a digital audio stream, i.e. a predetermined format different from the first, to be provided to the speech analytics system).  

Regarding claim 6, Tsiartas teaches claim 1, and further teaches
detecting an indicator of the mental health condition based on a portion of the text string ([0021], [0025], [0031:1-15] a speaker’s state, i.e. mental health condition, can be detected by analyzing the extracted features, i.e. indicator, including words extracted from the audio signal, i.e. based on a portion of the text string).  

Regarding claim 8, Tsiartas teaches claim 6, and further teaches
providing the portion of the text string to a classifier ([0031:1-25], [0037:1-13], [0090], [0115:1-3] the ASR may feed transcriptions, i.e. providing the portion of the text string, to a machine learning model, such as a classifier, that is used to interpret the features extracted from the speech signal); and
receiving, from the classifier, the indicator of the mental disorder or condition ([0025], [0027:1-8], [0037:1-13], [0086:1-14] the machine learning model, such as a classifier, is used to interpret the features extracted from the speech signal, where the speaker state prediction is provided by the model, i.e. receiving…the indicator, and the speaker state may be a mental health state, such as PTSD, i.e. mental disorder or condition).  

Regarding claim 9, Tsiartas teaches claim 6, and further teaches
the indicator includes a speech pattern ([0031:1-21] the speaker state is determined by analyzing extracted features, i.e. indicator, including voice quality patterns and voicing patterns, i.e. speech pattern).  

Regarding claim 10, Tsiartas teaches claim 1, and further teaches
the predefined acoustic characteristic comprises speech rate, pitch, intonation, energy level, or a combination thereof ([0031:1-15], [0041:1-9] the extracted features, i.e. predefined acoustic characteristic, include prosodic features such as rate, i.e. speech rate, intonation, loudness, i.e. energy level, and pitch).  

Regarding claim 11, Tsiartas teaches claim 1, and further teaches
associating a portion of the text string with the speaker, the predefined audio characteristic, the identified emotion, or a combination thereof (Figs. 3A and 3B, [0049:9-15], [0050-1] the feedback, i.e. set of structured data, display includes a visual representation of the detected words, i.e. portion of the text string, a display showing the speaker, i.e. associating…with the speaker, a visual representation of speech-related characteristics, i.e. predefined audio characteristic, and emotions, i.e. identified emotion).  

Regarding claim 12, Tsiartas teaches claim 1, and further teaches
associating the portion of the audio input with the speaker, the predefined audio characteristic, the identified emotion, or a combination thereof (Figs. 3A and 3B, [0049:9-15], [0050-1] the feedback, i.e. set of structured data, display includes a visual representation of the speech signal, i.e. portion of the audio input, a display showing the speaker, i.e. associating…with the speaker, a visual representation of speech-related characteristics, i.e. predefined audio characteristic, and emotions, i.e. identified emotion).  
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tsiartas, in view of Bastide et al. (U.S. PG Pub No. 2018/0331990), hereinafter Bastide.

Regarding claim 5, Tsiartas teaches claim 1. 
While Tsiartas provides the use of ASR to extract transcripts from audio signals, Tsiartas does not specifically teach that the ASR algorithm is commercial off-the-shelf or open source, and thus does not teach
the conversion of the audio input into the text string is based on a commercial off-the-shelf algorithm, an open-source ASR software development kit ("SDK"), or a combination thereof  
Bastide, however, teaches the conversion of the audio input into the text string is based on a commercial off-the-shelf algorithm, an open-source ASR software development kit ("SDK"), or a combination thereof ([0036:22-26] speech to plain text via speech recognition, i.e. conversion of the audio input into the text string, may be performed via the commercially available IBM Watson Speech to Text application, i.e. commercial off-the-shelf algorithm).
Tsiartas and Bastide are analogous art because they are from a similar field of endeavor in processing human speech to recognize emotion. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the ASR of Tsiartas so that the ASR algorithm is specifically a commercial off-the-shelf algorithm, as taught by Bastide. The motivation to do so would have been to substitute similar elements to achieve a predictable result of enabling linguistic analysis on the plain text of audible speech (Bastide [0036:22-29]).

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tsiartas, in view of Doerflinger (U.S. PG Pub No. 2019/0180871), hereinafter Doerflinger.

Regarding claim 7, Tsiartas teaches claim 6.
While Tsiartas provides determining speaker states by analyzing the content of what was said, Tsiartas does not specifically teach the recognition of specific words, and thus does not teach
detecting the indicator of the mental disorder or condition comprises detecting one or more predefined words in the portion of the text string.  
Doerflinger, however, teaches detecting the indicator of the mental disorder or condition comprises detecting one or more predefined words in the portion of the text string ([0025], [0030], [0041-2], [0058] the synthesis function analyzes supplied data, such as text, i.e. portion of the text string, to determine the presence of inappropriate language, i.e. detecting one or more predefined words, and where the synthesis function assists in determining if a patient is having a mental episode based on the cues identified in the analysis, i.e. indicator of the mental disorder).  
Tsiartas and Doerflinger are analogous art because they are from a similar field of endeavor in evaluating human input to determine signs of mental health issues. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Tsiartas's determination of speaker states by analyzing the content of what was said to include the specific identification of inappropriate language, as taught by Doerflinger. The motivation to do so would have been to achieve a predictable result of using all available data, including discovered words, to classify the emotions of speakers and plot the relationships between words (Doerflinger [0055-6]).

Claim(s) 13 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tsiartas, in view of Okabe et al. (U.S. PG Pub No. 2015/0287402), hereinafter Okabe.




While Tsiartas provides the ability to search for and retrieve audio data, Tsiartas does not specifically teach the use of a query for a keyword, and thus does not teach
receiving a user input indicative of a query for a keyword.  
Okabe, however, teaches receiving a user input indicative of a query for a keyword ([0039-40], [0067:14-17], [0096] a call analysis server accepts inputs from a user, i.e. receiving a user input, such as a search criteria, i.e. indicative of a query, where the call search criteria may be a keyword, i.e. query for a keyword).  
Tsiartas and Okabe are analogous art because they are from a similar field of endeavor in processing speech to determine text and emotion. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Tsiartas's ability to search for and retrieve audio data to include the ability to search call data for specific keywords, as taught by Okabe. The motivation to do so would have been to substitute similar elements to achieve a predictable result of enabling the extraction and display of utterance sections that include a specific keyword (Okabe [0096]).

Regarding claim 14, Tsiartas in view of Okabe teaches claim 13, and Okabe further teaches 
in response to receiving the user input, providing an output indicative of a segment of the audio input ([0093], [0096] the call analysis server acquires the search criteria keyword, i.e. receiving the user input, and extracts the call data containing the keyword, i.e. in response, and displays the calls and utterance sections).  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Shrivastav et al. (U.S. PG Pub No. 2012/0116186): Using physiological and acoustic data in human speech to screen for emotional status or PTSD.
Degani et al. (U.S. PG Pub No. 2004/0249634): Analyzing speech to identify emotional arousal.	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474.  The examiner can normally be reached on 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 






/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659
/Paras D Shah/Primary Examiner, Art Unit 2659
03/15/2021