Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant's amendment filed on July 22, 2022 has been entered.
In view of the amendment to the specification, the amendment(s) to paragraph(s) [0044] have been entered.
In view of the amendment(s) to paragraph(s) [0044] as well as the accompanying explanations of record, the objections to the specification have been withdrawn.
In view of the amendment to the claim(s), the amendment of claim(s) 1, 2, 4, 5, 9, 10, 13, 17, 20 and 21 and the cancellation of claim(s) 14 have been acknowledged and entered.
In view of the amendment to claim(s) 5, 13, and 20, and the cancellation of claim 14, the rejection of claim(s) 5-8, 13-16 and 20 under 35 U.S.C. 112 is withdrawn.
In view of the amendment to claim(s) 21, the objection of claim(s) 21 is withdrawn.
In view of the amendment to claim(s) 1, 2, 4, 5, 9, 10, 13, 17, 20 and 21, and the cancellation of claim 14, the rejection of claim(s) 1-21 under 35 U.S.C. 103 is withdrawn.
In light of the amendments to the claims, new grounds of rejection for claims 1-13 and 15-22 under 35 U.S.C. 103 are provided in the response below.
Regarding the objection to claims 3, 11 and 18, the issue has not been corrected and the objection is maintained. 
Regarding the rejection of claims 1, 2, 9, 10 and 17 under 35 U.S.C. 101, the issue has not been corrected and the rejection is maintained.

Response to Argument
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. 103, received on July 22, 2022, have been fully considered.
Applicant’s arguments with respect to claim(s) 1-13 and 15-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claims objected to because of the following informalities:  
two labels are used for the single claim element regarding vocal actions – 
“pre-defined vocal actions” = "vocal actions".
A single label should be used for each throughout the claims to ensure clear and proper antecedent basis.
Claims objected to because of the following informalities:  
two labels are used for the single claim element regarding de-noised inhale segments:
“de-noised respiratory event inhale segments” = “de-noised inhale segments”;
and two labels are used for the single claim element regarding de-noised exhale segments:
“de-noised respiratory event exhale segments” = “de-noised exhale segments”.
A single label should be used for each throughout the claims to ensure clear and proper antecedent basis.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 9, 10 and 17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to techniques of analysis and assessment, and as presented, the analysis and assessment can be performed as a mental process in a human mind. The claim recites, a computer-implemented method, comprising: receiving input audio data that includes a representation of a human voice, the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker; extracting one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data; and predicting a physical state of the speaker of the human voice based on respective attributes of the extracted segments.  

The limitation of receiving input audio data that includes a representation of a human voice, under its broadest reasonable interpretation, covers an action performed by a person. That is, nothing in the claim element precludes the action from practically being performed as a mental process by a person. For example, “receiving input audio data” in the context of this claim encompasses a physician listening to a patient’s voice and breathing.

Similarly, the limitation of extracting one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “extracting one or more segments” in the context of this claim encompasses a physician specifically and separately listening to the patient inhale, exhale, wheeze and/or cough. 

Similarly, the limitation of predicting a physical state of the speaker of the human voice based on respective attributes of the extracted segments, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “predicting a physical state of the speaker” in the context of this claim encompasses a physician determining a patient has a certain diagnosis such as a cold, or flu, or pneumonia.

The claims do not recite any limitation that precludes the claimed technique from being performed in the human mind. The claims therefore recite a mental process as a concept performed in the human mind (including an observation, evaluation, judgement and opinion).

This judicial exception is not integrated into a practical application. In particular, the claim recites only one additional element: a computer-implemented method. The computer-implemented method is recited at a high level of generality such that it amounts to no more than general purpose hardware that is being used as a tool to implement the mental process.

Accordingly, this additional element does not integrate the mental process into a practical application because it does not impose any meaningful limits on practicing the mental process. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the mental process into a practical application, the additional element of a computer-implemented method amounts to no more than mere instructions to apply the exception using general purpose hardware as a tool to implement the mental process. Mere instructions to apply an exception using general purpose hardware cannot provide an inventive concept. Thus, claim 1 is not patent eligible.

Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to techniques of analysis and assessment, and as presented, the analysis and assessment can be performed as a mental process in a human mind. The claim recites, the computer-implemented method of claim 1, wherein extracting the respective segments comprises: identifying respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker in response to instructions provided by one or more prompts; wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

The additional limitation of identifying respective audio portions in the input audio data that correspond to vocal actions performed by the speaker in response to instructions provided by one or more prompts, under its broadest reasonable interpretation, covers an action performed by a person. That is, nothing in the claim element precludes the action from practically being performed as a mental process by a person. For example, “identifying respective audio portions” in the context of this claim encompasses a physician listening to the patient and detecting when the patient inhales, exhales, speaks, coughs and/or wheezes.

The claims do not recite any limitation that precludes the claimed technique from being performed in the human mind. The claims therefore recite a mental process as a concept performed in the human mind (including an observation, evaluation, judgement and opinion).

The wherein clauses “wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker” further limit the identifying process and extracting process, respectively, but the limitations of identifying and extracting can still be performed mentally by a person with these further limitations. The wherein clauses further describe the prompts and the respiratory events, but they do not change the determination that the identifying and extracting processes can be done mentally.

This judicial exception is not integrated into a practical application. In particular, the claim recites one additional element: the computer-implemented method. The computer-implemented method is recited at a high level of generality such that it amounts to no more than general purpose hardware that is being used as a tool to implement the mental process.

Accordingly, this additional element does not integrate the mental process into a practical application because it does not impose any meaningful limits on practicing the mental process. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the mental process into a practical application, the additional element of being a computer-implemented method amounts to no more than mere instructions to apply the exception using general purpose hardware as a tool to implement the mental process. Mere instructions to apply an exception using general purpose hardware cannot provide an inventive concept. Thus, claim 2 is not patent eligible.

Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to techniques of analysis and assessment, and as presented, the analysis and assessment can be performed as a mental process in a human mind. The claim recites, a system comprising: one or more processors; and a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to: receive input audio data that includes a representation of a human voice, the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker; extract one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data; and predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments.  

The limitation of receive input audio data that includes a representation of a human voice, under its broadest reasonable interpretation, covers an action performed by a person. That is, nothing in the claim element precludes the action from practically being performed as a mental process by a person. For example, “receive input audio data” in the context of this claim encompasses a physician listening to a patient’s voice and breathing.

Similarly, the limitation of extract one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “extract one or more segments” in the context of this claim encompasses a physician specifically and separately listening to the patient inhale, exhale, speak, wheeze and/or cough. 

Similarly, the limitation of predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “predict a physical state of the speaker” in the context of this claim encompasses a physician determining a patient has a certain diagnosis such as a cold, or flu, or pneumonia.

The claims do not recite any limitation that precludes the claimed technique from being performed in the human mind. The claims therefore recite a mental process as a concept performed in the human mind (including an observation, evaluation, judgement and opinion).

This judicial exception is not integrated into a practical application. In particular, the claim recites only two additional elements: one or more processors, and a non-transitory computer readable medium storing a plurality of instructions. The processors and computer readable medium are recited at a high level of generality such that they amount to no more than general purpose hardware that is being used as a tool to implement the mental process.

Accordingly, these additional elements do not integrate the mental process into a practical application because they do not impose any meaningful limits on practicing the mental process. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the mental process into a practical application, the additional elements of processors and computer readable medium amount to no more than mere instructions to apply the exception using general purpose hardware as a tool to implement the mental process. Mere instructions to apply an exception using general purpose hardware cannot provide an inventive concept. Thus, claim 9 is not patent eligible.

Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to techniques of analysis and assessment, and as presented, the analysis and assessment can be performed as a mental process in a human mind. The claim recites, the system of claim 9, wherein extract the respective segments comprises: identify respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker in response to instructions provided by one or more prompts; wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

The additional limitation of identify respective audio portions in the input audio data that correspond to vocal actions performed by the speaker in response to instructions provided by one or more prompts, under its broadest reasonable interpretation, covers an action performed by a person. That is, nothing in the claim element precludes the action from practically being performed as a mental process by a person. For example, “identify respective audio portions” in the context of this claim encompasses a physician listening to the patient and detecting when the patient inhales, exhales, speaks, coughs and/or wheezes.

The claims do not recite any limitation that precludes the claimed technique from being performed in the human mind. The claims therefore recite a mental process as a concept performed in the human mind (including an observation, evaluation, judgement and opinion).

The wherein clauses “wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker” further limit the identifying process and the extracting process, respectively, but the limitations of identifying and extracting can still be performed mentally by a person with these further limitations. The wherein clauses further describe the prompts and the respiratory events, but they do not change the determination that the identifying and extracting processes can be done mentally.

This judicial exception is not integrated into a practical application. In particular, the claim recites no additional elements.

As described above, the claim does not include any additional elements.  Thus, the claim does not describe additional elements that are sufficient to amount to significantly more than the judicial exception. Claim 10 is not patent eligible.

Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to techniques of analysis and assessment, and as presented, the analysis and assessment can be performed as a mental process in a human mind. The claim recites, a computer program product comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein to be executed by one or more processors, the program code including instructions to: receive input audio data that includes a representation of a human voice, the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker; extract one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data; and predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments.  

The limitation of receive input audio data that includes a representation of a human voice, under its broadest reasonable interpretation, covers an action performed by a person. That is, nothing in the claim element precludes the action from practically being performed as a mental process by a person. For example, “receive input audio data” in the context of this claim encompasses a physician listening to a patient’s voice and breathing.

Similarly, the limitation of extract one or more voiced segments, one or more non-voiced segments and one or more respiratory event segments from the input audio data, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “extract one or more segments” in the context of this claim encompasses a physician specifically and separately listening to the patient inhale, exhale, speak, wheeze and/or cough. 

Similarly, the limitation of predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments, under its broadest reasonable interpretation, covers an action performed as a mental process by a person. For example, “predict a physical state of the speaker” in the context of this claim encompasses a physician determining a patient has a certain diagnosis such as a cold, or flu, or pneumonia.

The claims do not recite any limitation that precludes the claimed technique from being performed in the human mind. The claims therefore recite a mental process as a concept performed in the human mind (including an observation, evaluation, judgement and opinion).

This judicial exception is not integrated into a practical application. In particular, the claim recites only one additional element: a computer program product comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein to be executed by one or more processors. The computer program product is recited at a high level of generality such that it amounts to no more than general purpose hardware that is being used as a tool to implement the mental process.

Accordingly, this additional element does not integrate the mental process into a practical application because it does not impose any meaningful limits on practicing the mental process. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the mental process into a practical application, the additional element of a computer program product amounts to no more than mere instructions to apply the exception using general purpose hardware as a tool to implement the mental process. Mere instructions to apply an exception using general purpose hardware cannot provide an inventive concept. Thus, claim 17 is not patent eligible.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Karankevich et al. (US Patent Pub. No. 2021/0145306), hereinafter Karankevich, in view of Stamatopoulos et al. (US Patent Pub. No. 2019/0088367), hereinafter Stamatopoulos.

Regarding claim 21, Karankevich teaches a system comprising: one or more processors (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors); 

and a non-transitory computer readable medium storing a plurality of instructions (Karankevich [0032] To a great extent, the acquisition and presentation of respiratory information is effected through user interfaces of applications running on smart phones, tablets, other mobile devices, computers, and other user devices that can be considered part of the respiratory condition technology), 

which when executed, cause the one or more processors to: 

convert input audio to a spectrogram representation of the input data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram); 

and predict a physical state of the speaker of a human voice represented in the input audio based on the extracted voiced segment (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).

Karankevich does not teach

analyze one or more regions in the spectrogram representation according to respective differences in frequency signal intensities; 

detect at least one region of the spectrogram representation that exceeds an intensity threshold; 

extract a voiced segment from the input audio that maps to the detected region of the input spectrogram.

Stamatopoulos teaches

analyze one or more regions in the spectrogram representation according to respective differences in frequency signal intensities (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram);  

detect at least one region of the spectrogram representation that exceeds an intensity threshold (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram);

extract a voiced segment from the input audio that maps to the detected region of the input spectrogram (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.
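
For illustration only, the limitations mapped above (detecting a spectrogram region that exceeds an intensity threshold, then extracting the audio that maps to that region) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Stamatopoulos, Karankevich, or the claimed invention: the function names `detect_regions` and `spans_to_samples`, the list-of-lists spectrogram layout, and the `hop_length` parameter are all hypothetical, and only the −12 dB default echoes the example threshold quoted from Stamatopoulos [0360].

```python
from typing import List, Tuple


def detect_regions(spec_db: List[List[float]],
                   threshold_db: float = -12.0) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans, end exclusive.

    spec_db is assumed to be a spectrogram in dB laid out as
    spec_db[frame][frequency_bin]. A frame is "active" when some interior
    bin is a local peak (greater than its lower neighbour, at least as
    large as its upper neighbour) and exceeds threshold_db.
    """
    active = []
    for frame in spec_db:
        hit = False
        for k in range(1, len(frame) - 1):
            is_peak = frame[k] > frame[k - 1] and frame[k] >= frame[k + 1]
            if is_peak and frame[k] > threshold_db:
                hit = True
                break
        active.append(hit)

    # Merge consecutive active frames into contiguous spans.
    spans = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(active)))
    return spans


def spans_to_samples(spans: List[Tuple[int, int]],
                     hop_length: int) -> List[Tuple[int, int]]:
    """Map frame spans back to sample offsets in the input audio,
    assuming each frame advances by hop_length samples (hypothetical)."""
    return [(s * hop_length, e * hop_length) for s, e in spans]


if __name__ == "__main__":
    toy = [
        [-40.0, -40.0, -40.0],
        [-40.0, -5.0, -40.0],
        [-40.0, -3.0, -40.0],
        [-40.0, -40.0, -40.0],
        [-40.0, -10.0, -40.0],
    ]
    print(spans_to_samples(detect_regions(toy), hop_length=256))
```

The sketch separates detection (frame spans) from extraction (sample offsets) only to mirror the order of the claim limitations; it makes no assertion about how either reference implements these steps.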


Claims 1, 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Karankevich in view of Stamatopoulos in further view of Odinak et al. (US Patent Pub. No. 2018/0130473), hereinafter Odinak.

Regarding claim 1, Karankevich teaches a computer-implemented method (Karankevich [0029] In addition to being received from and provided to such interested parties, respiratory information generated by and used by the respiratory condition technology can be electronically received from or delivered to and used by mobile devices, tablets, computers, databases, or other electronic equipment for the purpose of analysis, aggregation, storage, and distribution, among other things), 

comprising: receiving input audio data that includes a representation of a human voice (Karankevich [0020] Here we describe technology that we refer to as “respiratory condition technology”. This respiratory condition technology is useful for, among other things, sensing, analyzing, characterizing, reporting, and managing conditions of the human respiratory system based on respiratory sounds; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm); 

and one or more respiratory event segments (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful)

from the input audio data (Karankevich [0020] Here we describe technology that we refer to as “respiratory condition technology”. This respiratory condition technology is useful for, among other things, sensing, analyzing, characterizing, reporting, and managing conditions of the human respiratory system based on respiratory sounds);

and predicting a physical state of the speaker of the human voice based on respective attributes of the extracted segments (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).  

Karankevich teaches extracting one or more respiratory event segments. However, Karankevich does not teach
the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker;

extracting one or more voiced segments, one or more non-voiced segments.

Stamatopoulos teaches

extracting one or more voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced), 

one or more non-voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.
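
For illustration only, classifying audio blocks as voiced or unvoiced can be sketched as follows. The cited passage of Stamatopoulos [0149] does not state which features the WDC module uses, so this sketch substitutes a common textbook heuristic (short-time energy combined with zero-crossing rate); the function name `classify_blocks` and the thresholds `energy_min` and `zcr_max` are assumptions chosen for the toy signal, not values drawn from any reference of record.

```python
import math
from typing import List


def classify_blocks(samples: List[float],
                    block_size: int,
                    energy_min: float = 0.01,
                    zcr_max: float = 0.25) -> List[str]:
    """Label each non-overlapping block "voiced" or "unvoiced".

    Heuristic (an assumption, not the reference's method): a block is
    treated as voiced when its mean energy is at least energy_min and
    its zero-crossing rate is at most zcr_max.
    """
    labels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        energy = sum(x * x for x in block) / block_size
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (block_size - 1)
        is_voiced = energy >= energy_min and zcr <= zcr_max
        labels.append("voiced" if is_voiced else "unvoiced")
    return labels


if __name__ == "__main__":
    # Toy signal: a 100 Hz tone at 8 kHz, then a rapidly alternating
    # noise-like stretch, then silence.
    tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
    hiss = [0.5 if n % 2 == 0 else -0.5 for n in range(400)]
    silence = [0.0] * 400
    print(classify_blocks(tone + hiss + silence, block_size=400))
```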

Stamatopoulos teaches voiced and non-voiced segments; however, Karankevich in view of Stamatopoulos does not teach

the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker.

Odinak teaches

the representation of the human voice (Odinak [0008] In one embodiment, a system and method for performing caller identity verification using multi-step voice analysis with the aid of a digital computer are provided)

including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.

Regarding claim 9, Karankevich teaches a system comprising: one or more processors (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors); 

and a non-transitory computer readable medium storing a plurality of instructions (Karankevich [0032] To a great extent, the acquisition and presentation of respiratory information is effected through user interfaces of applications running on smart phones, tablets, other mobile devices, computers, and other user devices that can be considered part of the respiratory condition technology), 

which when executed, cause the one or more processors to: 

receive input audio data that includes a representation of a human voice (Karankevich [0020] Here we describe technology that we refer to as “respiratory condition technology”. This respiratory condition technology is useful for, among other things, sensing, analyzing, characterizing, reporting, and managing conditions of the human respiratory system based on respiratory sounds; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

and one or more respiratory event segments from the input audio data (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful); 

and predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).  

Karankevich teaches extract one or more respiratory event segments. However, Karankevich does not teach
the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker;

extract one or more voiced segments, one or more non-voiced segments.

Stamatopoulos teaches

extract one or more voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced), 

one or more non-voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Stamatopoulos teaches voiced and non-voiced segments; however, Karankevich in view of Stamatopoulos does not teach

the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker.

Odinak teaches

the representation of the human voice (Odinak [0008] In one embodiment, a system and method for performing caller identity verification using multi-step voice analysis with the aid of a digital computer are provided)

including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.


Regarding claim 17, Karankevich teaches a computer program product comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein (Karankevich [0032] To a great extent, the acquisition and presentation of respiratory information is effected through user interfaces of applications running on smart phones, tablets, other mobile devices, computers, and other user devices that can be considered part of the respiratory condition technology)

to be executed by one or more processors (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors),

the program code including instructions to: 

receive input audio data that includes a representation of a human voice (Karankevich [0020] Here we describe technology that we refer to as “respiratory condition technology”. This respiratory condition technology is useful for, among other things, sensing, analyzing, characterizing, reporting, and managing conditions of the human respiratory system based on respiratory sounds; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

and one or more respiratory event segments from the input audio data (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful);

and predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).

Karankevich teaches extract one or more respiratory event segments. However, Karankevich does not teach
the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker;

extract one or more voiced segments, one or more non-voiced segments.

Stamatopoulos teaches

extract one or more voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced), 

one or more non-voiced segments (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Stamatopoulos teaches voiced and non-voiced segments; however, Karankevich in view of Stamatopoulos does not teach

the representation of the human voice including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker.

Odinak teaches

the representation of the human voice (Odinak [0008] In one embodiment, a system and method for performing caller identity verification using multi-step voice analysis with the aid of a digital computer are provided)

including one or more instances of pre-defined vocal actions ("vocal actions") performed by a speaker (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.


Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Karankevich in view of Stamatopoulos, in view of Odinak, in further view of deCharms et al. (US Patent Pub. No. 2016/0005320), hereinafter deCharms.

Regarding claim 2, Karankevich in view of Stamatopoulos in view of Odinak teaches the computer-implemented method of claim 1.

Karankevich further teaches wherein extracting the respective segments comprises: 

and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the vocal actions performed by the speaker (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).  

Karankevich does not teach

identifying respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker; wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; 
and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

Stamatopoulos teaches 

identifying respective audio portions in the input audio data that correspond to vocal actions performed by the speaker (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Stamatopoulos teaches identifying respective audio portions; however, Karankevich in view of Stamatopoulos does not teach

identifying respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker; 

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; 

and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

Odinak teaches

the one or more instances of pre-defined vocal actions ("vocal actions") performed by the speaker (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set);

requesting performance of the pre-defined vocal actions (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set);

the instances of the pre-defined vocal actions ("vocal actions") (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.

Odinak teaches instances of the pre-defined vocal actions; however, Karankevich in view of Stamatopoulos in view of Odinak does not teach

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts.

deCharms teaches

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts and one or more types of non-speech prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.


Regarding claim 10, Karankevich in view of Stamatopoulos in view of Odinak teaches the system of claim 9.

Karankevich further teaches wherein extract the respective segments comprises: 

and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the vocal actions performed by the speaker (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).  

Karankevich does not teach

identify respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker; wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; 
and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

Stamatopoulos teaches 

identify respective audio portions in the input audio data that correspond to vocal actions performed by the speaker (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Stamatopoulos teaches identifying respective audio portions; however, Karankevich in view of Stamatopoulos does not teach

identify respective audio portions in the input audio data that correspond to the one or more instances of the pre-defined vocal actions ("vocal actions") performed by the speaker; 

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts; 

and wherein respective types of respiratory events include respective inhale occurrences and respective exhale occurrences during the instances of the pre-defined vocal actions ("vocal actions") performed by the speaker.

Odinak teaches

the one or more instances of pre-defined vocal actions ("vocal actions") performed by the speaker (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set);

requesting performance of the pre-defined vocal actions (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set);

the instances of the pre-defined vocal actions ("vocal actions") (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.

Odinak teaches instances of the pre-defined vocal actions; however, Karankevich in view of Stamatopoulos in view of Odinak does not teach

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts requesting performance of the pre-defined vocal actions and one or more types of non-speech prompts.

deCharms teaches

wherein each prompt is one of: a prompt to remain silent, one or more types of speech prompts and one or more types of non-speech prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.


Claims 3-7, 11-13, 15, 18-20 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Karankevich in view of Stamatopoulos, in view of Odinak, in view of deCharms in further view of Cho et al. (US Patent Pub. No. 2020/0365168), hereinafter Cho.

Regarding claim 3, Karankevich in view of Stamatopoulos in view of Odinak teaches the computer-implemented method of claim 1.

Karankevich further teaches wherein extracting the respective segments comprises: 

determining a background noise calibration in the input audio data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram), 

applying one or more machine learning segmentation models to the input audio data with respect to the background noise calibration (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

and receiving, from the one or more machine learning segmentation models, the respective extracted segments isolated from the background noise calibration (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 [which executes a machine learning model, as described in [0060]] to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later), 

the respective extracted segments further comprising: 

one or more de-noised respiratory event inhale segments ("de-noised inhale segments"), one or more de-noised respiratory event exhale segments ("de-noised exhale segments") (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).

Karankevich teaches input audio data representing a recording of a session. However, Karankevich in view of Stamatopoulos does not teach that the speaker responded to one or more prompts during the session, or one or more de-noised voiced segments.

“the input audio data representing a recording of a session during which the speaker responded to one or more prompts”

“one or more de-noised voiced segments.”

deCharms teaches

the input audio data representing a recording of a session during which the speaker responded to one or more prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of deCharms does not teach

one or more de-noised voiced segments.

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.
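For illustration only, the noise removal described in the cited passage of Cho [0050] (removing components of a noise-only first signal from a second signal containing voice and noise) can be sketched in its simplest form as a sample-wise subtraction. The sketch below is not Cho's implementation, which uses a plurality of filters; the function name `remove_noise` and the sample values are hypothetical, and the two signals are assumed to be time-aligned and of equal length.

```python
def remove_noise(second_signal, first_signal):
    """Subtract a noise-only reference (first_signal) from a
    voice-plus-noise signal (second_signal), sample by sample.

    Assumption: both signals are time-aligned and equally long.
    """
    if len(first_signal) != len(second_signal):
        raise ValueError("signals must have the same length")
    return [s - n for s, n in zip(second_signal, first_signal)]


# Hypothetical sample values chosen to be exactly representable as floats.
voice = [0.0, 0.5, 1.0, 0.5, 0.0]
noise = [0.25, -0.25, 0.25, -0.25, 0.25]
mixed = [v + n for v, n in zip(voice, noise)]

# Removing the noise reference from the mixture yields the voice samples.
recovered = remove_noise(mixed, noise)
```

In practice the noise reference rarely matches the noise in the mixture exactly, which is why an adaptive filtering stage would normally precede the subtraction.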

Regarding claim 4, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 3.

Karankevich further teaches wherein applying one or more machine learning segmentation models to the input audio data with respect to the background noise calibration comprises: 

converting the voiced input audio to a spectrogram representation of the voiced input data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram).

Karankevich does not teach

identifying voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts;

analyzing one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation;

detecting at least one region of the voiced input spectrogram representation that exceeds an intensity threshold;

extracting a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram;

and labeling the extracted portion as a respective de-noised voiced segment.

Stamatopoulos teaches

identifying voiced input audio that corresponds to vocal actions (Stamatopoulos [0149] By using this information, at step 1006, the WDC module 750 is able to classify the blocks as voiced and unvoiced); 

analyzing one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram);  

detecting at least one region of the voiced input spectrogram representation that exceeds an intensity threshold (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram); 

extracting a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409); 

and labeling the extracted portion as a respective de-noised voiced segment (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.
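For illustration only, the limitations of detecting a spectrogram region that exceeds an intensity threshold and extracting the audio that maps to that region can be sketched as follows. This is a simplified sketch, not the method of Stamatopoulos [0360] or of the claimed invention: it takes a spectrogram as a list of frames of magnitudes in dB, uses the example threshold of -12 dB from the cited passage, and groups consecutive frames whose peak exceeds the threshold. The names `detect_regions` and `frames_to_samples`, the hop length, and the sample data are hypothetical.

```python
def detect_regions(spectrogram_db, threshold_db=-12.0):
    """Return (start_frame, end_frame) pairs, end exclusive, for each run
    of consecutive frames whose peak magnitude exceeds threshold_db."""
    regions = []
    start = None
    for i, frame in enumerate(spectrogram_db):
        active = max(frame) > threshold_db
        if active and start is None:
            start = i                    # a new region begins here
        elif not active and start is not None:
            regions.append((start, i))   # the region ended on the previous frame
            start = None
    if start is not None:                # region runs to the last frame
        regions.append((start, len(spectrogram_db)))
    return regions


def frames_to_samples(region, hop_length):
    """Map a frame region to the matching audio sample range,
    assuming a fixed hop of hop_length samples between frames."""
    start, end = region
    return (start * hop_length, end * hop_length)


# Hypothetical spectrogram: 5 frames, 2 frequency bins each, values in dB.
spectrogram = [[-40, -30], [-10, -20], [-5, -8], [-30, -25], [-11, -50]]
regions = detect_regions(spectrogram)
sample_ranges = [frames_to_samples(r, hop_length=256) for r in regions]
```

The sample ranges returned here are the portions of the input audio that would be sliced out and labeled as segments in a fuller pipeline.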

Karankevich in view of Stamatopoulos teaches identifying voiced input audio that corresponds to vocal actions. However, Karankevich in view of Stamatopoulos does not teach that the vocal actions are performed in response to one or more types of speech prompts.

“identifying voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts.”

deCharms teaches

identifying voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Regarding claim 5, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 3.

Karankevich further teaches wherein receiving, from the one or more machine learning segmentation models, the respective segments, further comprises: 

identifying forced exhale input audio that corresponds to one or more forced exhales (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful); 

feeding the forced exhale input audio into one or more rule-based segmentation machine learning models (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

receiving, from the one or more rule-based segmentation machine learning models, one or more de-noised forced exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

and one or more inhale-background segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

wherein each inhale-background segment includes background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later); 

receiving the one or more pause segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

wherein each pause segment is based on audio of respective inhale and exhale occurrences with the background noise (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

and wherein each inhale-background segment is based on audio of one or more inhale occurrences with the background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later).

Karankevich teaches identifying exhale input audio. However, Karankevich in view of Stamatopoulos in view of Odinak does not teach

identifying forced exhale input audio that corresponds to one or more forced exhales performed in response to one or more forced exhale prompts;

wherein the one or more inhale-background segments are based on respective portions of input audio that corresponds to inhalations performed for the one or more forced exhales;

receiving the one or more de-noised voiced segments and one or more pause segments.

deCharms teaches

performed in response to one or more forced exhale prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase));

performed for the one or more forced exhales (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).


deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms does not teach

receiving the one or more de-noised voiced segments and one or more pause segments.

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.


Regarding claim 6, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 5.

Karankevich further teaches wherein predicting a physical state of the speaker of the human voice based on respective attributes of the extracted segments comprises: 

one or more de-noised forced exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

one or more pause segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded); 

one or more inhale-background segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

and predicting the physical state of the speaker based at least on the first plurality of features (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).

Karankevich teaches the one or more de-noised forced exhale segments, the one or more pause segments and the one or more inhale-background segments. However, Karankevich does not teach extracting a first plurality of features from the one or more de-noised voiced segments.

“extracting a first plurality of features from the one or more de-noised voiced segments, the one or more de-noised forced exhale segments, the one or more pause segments and one or more inhale-background segments.”

Stamatopoulos teaches

extracting a first plurality of features from the one or more de-noised voiced segments, the one or more de-noised forced exhale segments, the one or more pause segments and one or more inhale-background segments (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Stamatopoulos teaches extracting a plurality of features. However, Karankevich in view of Stamatopoulos does not teach one or more de-noised voiced segments.

“extracting a first plurality of features from the one or more de-noised voiced segments, the one or more de-noised forced exhale segments, the one or more pause segments and one or more inhale-background segments.”

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.

Regarding claim 7, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 5.

Karankevich further teaches wherein receiving, from the one or more machine learning segmentation models, the respective segments, further comprises: 

sending the one or more pause segments, the one or more inhale-background segment and the background noise audio to the one or more machine learning segmentation models (Karankevich [0164] In some implementations, the classification model can use a SqueezeNET-based architecture (https://en.wikipedia.org/wiki/SqueezeNet) to reduce the size of the neural network and make the operation of the model faster. In such implementations, the input can be Mel spectrograms using less information); 

and receiving, from the one or more machine learning segmentation models, the one or more de-noised inhale segments and the one or more de-noised exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).

Regarding claim 11, Karankevich in view of Stamatopoulos in view of Odinak teaches the system of claim 9.

Karankevich teaches

wherein extract the respective segments comprises: 

determine a background noise calibration in the input audio data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram), 

apply one or more machine learning segmentation models to the input audio data with respect to the background noise calibration (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

and receive, from the one or more machine learning segmentation models, the respective extracted segments isolated from the background noise calibration (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later), 

the respective extracted segments further comprising: 

one or more de-noised respiratory event inhale segments ("de-noised inhale segments"), one or more de-noised respiratory event exhale segments ("de-noised exhale segments") (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).  

Karankevich in view of Stamatopoulos does not teach

the input audio data representing a recording of a session during which the speaker responded to one or more prompts; 

one or more de-noised voiced segments.

deCharms teaches

the input audio data representing a recording of a session during which the speaker responded to one or more prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of deCharms does not teach

one or more de-noised voiced segments.

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.
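
For illustration only, the operation described in Cho [0050] (removing a noise-only first signal from a second signal containing voice and noise) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Cho's implementation: the function name cancel_noise is hypothetical, the two signals are assumed to be time-aligned and of equal length, and the "plurality of filters" of the canceler module 144 is reduced to a single least-squares gain.

```python
from typing import List, Sequence


def cancel_noise(noise_ref: Sequence[float], noisy_voice: Sequence[float]) -> List[float]:
    """Remove a noise-only reference signal from a voice-plus-noise signal.

    Assumption (not from Cho): both signals are time-aligned, and the noise in
    `noisy_voice` is a scaled copy of `noise_ref`. The scale is estimated by
    least squares and the scaled reference is then subtracted.
    """
    if len(noise_ref) != len(noisy_voice):
        raise ValueError("signals must have the same length")

    # Energy of the reference; if it is silent there is nothing to remove.
    energy = sum(n * n for n in noise_ref)
    if energy == 0.0:
        return list(noisy_voice)

    # Least-squares estimate of how strongly the reference appears in the mix.
    gain = sum(v * n for v, n in zip(noisy_voice, noise_ref)) / energy

    # Subtract the estimated noise component sample by sample.
    return [v - gain * n for v, n in zip(noisy_voice, noise_ref)]


# Hypothetical toy data: the voice pattern is uncorrelated with the noise pattern.
first_signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # noise only
voice = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
second_signal = [v + 0.5 * n for v, n in zip(voice, first_signal)]  # voice + noise

cleaned = cancel_noise(first_signal, second_signal)
```

In this toy case the voice and noise patterns are orthogonal, so the subtraction recovers the voice samples; with real recordings the residual would depend on how well the reference matches the noise in the second signal.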

Regarding claim 12, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the system of claim 11.

Karankevich teaches

wherein apply one or more machine learning segmentation models to the input audio data with respect to the background noise calibration comprises: 

convert the voiced input audio to a spectrogram representation of the voiced input data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram).

Karankevich does not teach

identify voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts; 

analyze one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation; 

detect at least one region of the voiced input spectrogram representation that exceeds an intensity threshold; 

extract a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram; 

and label the extracted portion as respective de-noised voiced segment.  

Stamatopoulos teaches

analyze one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram); 

detect at least one region of the voiced input spectrogram representation that exceeds an intensity threshold (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram); 

extract a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409); 

and label the extracted portion as respective de-noised voiced segment (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.
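
For illustration only, the threshold-based peak detection quoted from Stamatopoulos [0360] can be sketched as follows. This is a minimal sketch under stated assumptions and is not the reference's implementation: the function name peaks_above_threshold is hypothetical, the input is assumed to be a single spectrum frame already expressed in dB, and the Blackman-Harris interpolation and residual subtraction described in [0360] are omitted. The default of -12 dB is the example value given in the quoted paragraph.

```python
from typing import List, Sequence, Tuple


def peaks_above_threshold(
    magnitudes_db: Sequence[float], threshold_db: float = -12.0
) -> List[Tuple[int, float]]:
    """Return (bin index, magnitude) for local spectral peaks above a threshold.

    Assumption (not from the reference): a peak is a bin that is strictly
    greater than its left neighbor and not smaller than its right neighbor.
    The first and last bins are never reported as peaks.
    """
    peaks = []
    for i in range(1, len(magnitudes_db) - 1):
        m = magnitudes_db[i]
        is_local_max = m > magnitudes_db[i - 1] and m >= magnitudes_db[i + 1]
        if is_local_max and m > threshold_db:
            peaks.append((i, m))
    return peaks


# Hypothetical magnitude spectrum in dB for one frame.
frame_db = [-40.0, -10.0, -30.0, -20.0, -15.0, -25.0, -5.0, -35.0]
detected = peaks_above_threshold(frame_db)
```

Here bins 1 and 6 are reported, while the local maximum at bin 4 (-15 dB) is discarded because it falls below the -12 dB threshold.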

Karankevich in view of Stamatopoulos does not teach

identify voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts.  

deCharms teaches

the input audio data representing a recording of a session during which the speaker responded to one or more prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.


Regarding claim 13, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the system of claim 11.

Karankevich teaches

wherein receive, from the one or more machine learning segmentation models, the respective segments, further comprises: 

identifying forced exhale input audio that corresponds to one or more forced exhales (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful); 

feeding the forced exhale input audio into one or more rule-based segmentation machine learning models (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

receiving, from the one or more rule-based segmentation machine learning models, one or more de-noised forced exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm));

wherein the one or more inhale-background segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

wherein each inhale-background segment includes background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later); 

receiving the one or more pause segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm)); 

wherein each pause segment is based on audio of respective inhale and exhale occurrences with the background noise (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm)); 

and wherein each inhale-background segment is based on audio of one or more inhale occurrences with the background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later);

wherein predict a physical state of the speaker of the human voice (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases)

based on respective attributes of the extracted segments (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases)

comprises: 

the features comprising one or more de-noised forced exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm)), 

the one or more pause segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded)

and one or more inhale-background segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

and predict the physical state of the speaker based at least on the first plurality of features (Karankevich [0062] An expert system (whose inputs are connected to the outputs of the subsystems) determines a respiratory condition based on the respiratory sound classes, the answers to the diagnostic questionnaire and the periodic questionnaire, and the durations of the inspiration and expiration phases).

Karankevich does teach the one or more de-noised forced exhale segments, the one or more pause segments and one or more inhale-background segments. However, Karankevich does not teach 

identifying forced exhale input audio that corresponds to one or more forced exhales performed in response to one or more forced exhale prompts;

wherein the one or more inhale-background segments are based on respective portions of input audio that corresponds to inhalations performed for the one or more forced exhales, 

receiving the one or more de-noised voiced segments and one or more pause segments;

extract a first plurality of features from the one or more de-noised voiced segments.

Stamatopoulos teaches

extract a first plurality of features (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Karankevich teaches identifying exhale input audio, and Stamatopoulos teaches extracting a plurality of features; however, Karankevich in view of Stamatopoulos in view of Odinak does not teach

identifying forced exhale input audio that corresponds to one or more forced exhales performed in response to one or more forced exhale prompts;

 wherein the one or more inhale-background segments are based on respective portions of input audio that corresponds to inhalations performed for the one or more forced exhales, 

receiving the one or more de-noised voiced segments and one or more pause segments;

extract a first plurality of features from the one or more de-noised voiced segments.

deCharms teaches

performed in response to one or more forced exhale prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase));

performed for the one or more forced exhales (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).


deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms does not teach

receiving the one or more de-noised voiced segments and one or more pause segments;

extract a first plurality of features from the one or more de-noised voiced segments.

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.



Regarding claim 15, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the system of claim 13, 

Karankevich teaches

wherein receive, from the one or more machine learning segmentation models, the respective segments further comprises: 

send the one or more pause segments, the one or more inhale-background segment and the background noise audio to the one or more machine learning segmentation models (Karankevich [0164] In some implementations, the classification model can use a SqueezeNET-based architecture (https://en.wikipedia.org/wiki/SqueezeNet) to reduce the size of the neural network and make the operation of the model faster. In such implementations, the input can be Mel spectrograms using less information);

and receive, from the one or more machine learning segmentation models, the one or more de-noised inhale segments and the one or more de-noised exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm).  

Regarding claim 18, Karankevich in view of Stamatopoulos in view of Odinak teaches the computer program product of claim 17, 

Karankevich teaches

wherein extract the respective segments comprises: 

determine a background noise calibration in the input audio data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram), 

apply one or more machine learning segmentation models to the input audio data with respect to the background noise calibration (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

and receive, from the one or more machine learning segmentation models, the respective extracted segments isolated from the background noise calibration (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later), 

the respective extracted segments further comprising: 

one or more de-noised respiratory event inhale segments ("de-noised inhale segments"), one or more de-noised respiratory event exhale segments ("de-noised exhale segments") (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded).  

Karankevich in view of Stamatopoulos does not teach

the input audio data representing a recording of a session during which the speaker responded to one or more prompts; 

one or more de-noised voiced segments.

deCharms teaches

the input audio data representing a recording of a session during which the speaker responded to one or more prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of deCharms does not teach

one or more de-noised voiced segments.

Cho teaches 
one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.

Regarding claim 19, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer program product of claim 18, 

Karankevich teaches

wherein apply one or more machine learning segmentation models to the input audio data with respect to the background noise calibration comprises: 

convert the voiced input audio to a spectrogram representation of the voiced input data (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram).

Karankevich does not teach

identify voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts; 

analyze one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation; 

detect at least one region of the voiced input spectrogram representation that exceeds an intensity threshold; 

extract a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram; 

and label the extracted portion as respective de-noised voiced segment.  

Stamatopoulos teaches

analyze one or more regions in the voiced input spectrogram representation according to respective differences in frequency signal intensities indicated in the voiced input spectrogram representation (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram);  

detect at least one region of the voiced input spectrogram representation that exceeds an intensity threshold (Stamatopoulos [0360] The HRM is a modeling of the spectrum and, by extension, a modeling of the spectrogram. The modeling process receives a spectrum or spectrogram as an input. The HRM block 2725 may receive either the magnified spectrogram 2723 or the original spectrogram 2724 as an input. A peak detection algorithm is employed to detect the locations and the values of the magnitude spectrum peaks. The peaks that are above a threshold (e.g., the threshold can be set at −12 dB) are interpolated with a Blackman-Harris window. The interpolated spectrogram is the harmonic part of the model. In other words, the interpolated spectrogram comprising the harmonic part of the spectrum is the wheeze-only spectrogram. The residual part is obtained by subtracting the interpolated spectrum from the original one. The residual part comprises the residual airflow energies—subtracting out the residual part from the original spectrogram yields the wheeze-only or interpolated spectrogram);

extract a portion of the voiced input audio that maps to the detected region of the voiced input spectrogram (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409); 

and label the extracted portion as respective de-noised voiced segment (Stamatopoulos [0432] The next step is to store all the extracted spectrograms and descriptor, wherein the values for each of the respiratory recordings are stored separately in the extracted features database at block 3409).

Stamatopoulos is considered to be analogous to the claimed invention because it is in the same field of using neural networks to determine lung pathology. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich further in view of Stamatopoulos to allow for analyzing voiced and unvoiced input signals. Doing so would allow for performing respiratory acoustic analysis that uses inexpensive and readily available means for capturing and reporting breathing activity.

Karankevich in view of Stamatopoulos does not teach

identify voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts.

deCharms teaches

identify voiced input audio that corresponds to vocal actions performed in response to one or more types of speech prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).

deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.


Regarding claim 20, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer program product of claim 18, 

Karankevich teaches

wherein receive, from the one or more machine learning segmentation models, the respective segments further comprises: 

identifying forced exhale input audio that corresponds to one or more forced exhales (Karankevich [0061] In some implementations, a processor (e.g., a respiration phase determination subsystem) segments the respiratory signal into inspiration and expiration phases and time periods when the respiratory sounds of particular respiratory classes (such as wheezing) were recorded, although in some implementations such a step may not be necessary or useful); 

feeding the forced exhale input audio into one or more rule-based segmentation machine learning models (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference); 

receiving, from the one or more rule-based segmentation machine learning models, one or more de-noised forced exhale segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm)

and one or more inhale-background segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

wherein each inhale-background segment includes background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later); 

receiving the one or more pause segments (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm); 

wherein each pause segment is based on audio of respective inhale and exhale occurrences with the background noise (Karankevich [0076] Using an audio pre-processing subsystem, noise suppression, frequency filtering, and frequency-time representation visualization is performed by calculating Mel spectrum and Mel frequency cepstral coefficients; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, [pause segment maps to the inspiration and expiration phases with background noise] as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded; [0081] In some implementations, the system for screening, monitoring, and diagnosing respiratory organs diseases (containing series-connected signal sampling unit for recording the respiratory sound audio signal with an auscultation tool, an audio signal pre-processing subsystem, a respiratory sounds automatic classification subsystem, a respiration phase determination subsystem based on the voice activity detection algorithm); 

and wherein each inhale-background segment is based on audio of one or more inhale occurrences with the background noise (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later).

Karankevich teaches identifying exhale input audio; however, Karankevich in view of Stamatopoulos in view of Odinak does not teach

identifying forced exhale input audio that corresponds to one or more forced exhales performed in response to one or more forced exhale prompts;

 wherein the one or more inhale-background segments are based on respective portions of input audio that corresponds to inhalations performed for the one or more forced exhales, 

receiving the one or more de-noised voiced segments and one or more pause segments. 

deCharms teaches

performed in response to one or more forced exhale prompts (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase));

performed for the one or more forced exhales (deCharms [0046] Once the user selects a level (or the software selects one for them based on their progress to that point), the user may be provided with a programmed sequence of instructions, or stimuli intended to convey something that the user should do. In this simple example, the user may be provided with the instruction to engage in a sequence of two alternating mental exercises, and/or the user may be instructed to concurrently breathe in on one phase of the sequence and out on the next (or to breathe in and out on each phase)).


deCharms is considered to be analogous to the claimed invention because it is in the same field of breathing analysis software. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak further in view of deCharms to allow for providing prompts to users regarding breathing instructions. Doing so would allow for using a software platform which may also have the ability to perform optional continuous, automated treatment plan monitoring.

Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms does not teach

receiving the one or more de-noised voiced segments and one or more pause segments.

Cho teaches

one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.

Regarding claim 22, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 3.
Karankevich teaches
comprising: 
and wherein extracting the respective segments comprises: 
performing concurrent 
audio segmentation (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference)
and background noise cancellation (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram), 
according to the background noise calibration (Karankevich [0059] A processor (e.g., an audio pre-processing subsystem) performs noise suppression, frequency filtering, and frequency-time representation charting by calculating Mel spectrogram), 
by applying one or more machine learning segmentation models to the input audio data (Karankevich [0060] A processor (e.g., a respiratory sound automatic classification subsystem) classifies the respiratory sound using a machine learning model that includes neural network or other machine learning algorithms. The classification model generates values representing probabilities for correctly classifying the respiratory sound in each of the five respiratory sound classes: normal breathing, wheezes, rhonchi, fine crackles, and coarse crackles and in classes of artifacts including: heartbeat, membrane skin rubbing, and interference).
Karankevich does not teach

wherein a respective instance of a pre-defined vocal action performed by a speaker comprises one of a pre-defined sustained syllable sound or a reading of a pre-defined passage of words.
Odinak teaches
wherein a respective instance of a pre-defined vocal action performed by a speaker comprises one of a pre-defined sustained syllable sound or a reading of a pre-defined passage of words (Odinak [0008] The purported identity of the caller as the customer is verified by the processor, including: prompting the caller to speak a text of the at least one phrase; receiving the text-prompted speech from the caller; determining one or more of the features of the text-prompted speech; comparing the text-prompted speech features to one of the stored speech features sets associated with the customer; prompting the caller to speak any additional phrase when the purported identity is not verified as the matching customer based on the comparison; receiving an additional speech comprising the additional phrase from the caller; determining one or more of features of the additional phrase; combining the additional speech features with the predefined speech features; comparing the combined speech features to the one stored speech feature set; and verifying the purported identity of the caller as the customer based on the comparison of the combined features to the one stored speech features set).

Odinak is considered to be analogous to the claimed invention because it is in the same field of analyzing the voice characteristics of a user. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos further in view of Odinak to allow for prompting the user to speak a specific text or phrase. Doing so would allow for comparison of the current state of a user to a known set of states for the purpose of diagnostics.



Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Karankevich in view of Stamatopoulos, in view of Odinak, in view of deCharms, in view of Cho, and in further view of Kumar et al. (US Patent Pub. No. 2018/0247715), hereinafter Kumar.

Regarding claim 8, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the computer-implemented method of claim 7.

Karankevich further teaches wherein predicting a physical state of the speaker of the human voice based on respective attributes of the extracted segments comprises: 

applying a machine learning classifier model to the one or more denoised inhale segments, the one or more de-noised exhale segments (Karankevich [0004] Implementations may include one or a combination of two or more of the following features. The time-frequency domain graphical representation includes a Mel spectrogram. The time-frequency domain graphical representation includes a color Mel spectrogram. The classifier model includes a neural network model. An expert system is used for inferring the respiratory condition of the subject based at least on the sound class determined by the classifier model; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded);

receiving, from the machine learning classifier model, classified segments comprising: 

one or more speech segments, one or more cough segments and one or more wheezing segments (Karankevich [0164] In some implementations, the classification model can use a SqueezeNET-based architecture (https://en.wikipedia.org/wiki/SqueezeNet) to reduce the size of the neural network and make the operation of the model faster. In such implementations, the input can be Mel spectrograms using less information. For example, for pre-processing and classification of a segment of a sound record, the pre-processing and classification can be achieved within the timeframe of about 80 ms and the size of the file can be in the order of 2.85 Mb; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference).

Karankevich in view of Stamatopoulos in view of deCharms does not teach

the one or more de-noised voiced segments,

extracting a second plurality of features from the respective classified segments

and predicting the physical state of the speaker based at least on the second plurality of features.

Cho teaches 

the one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.

Karankevich in view of Stamatopoulos in view of deCharms in view of Cho does not teach

extracting a second plurality of features from the respective classified segments

and predicting the physical state of the speaker based at least on the second plurality of features.

Kumar teaches

extracting a second plurality of features from the respective classified segments (Kumar [0011] 6) using the measurement values of the second plurality of event features for the second plurality of events of interest to define locations for the second plurality of events of interest in the second feature coordinate space to form a second distribution in the second feature coordinate space indicative of a second event population of interest); 

and predicting the physical state of the speaker based at least on the second plurality of features (Kumar [0011] (c) applying, by the computer, an artificial neural network detection structure to the distribution in the second feature coordinate space indicative of the second event population of interest, the detection structure employing an artificial neural network to correlate the distribution in the second feature coordinate space indicative of the second event population of interest with a distribution of a second reference event population).

Kumar is considered to be analogous to the claimed invention because it is in the same field of neural network analysis, classification, and diagnosis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms in view of Cho further in view of Kumar to allow for a second plurality of features. Doing so would allow for performing an algorithmic calculation to improve a medical diagnostic result.


Regarding claim 16, Karankevich in view of Stamatopoulos in view of Odinak in view of deCharms in view of Cho teaches the system of claim 15.

Karankevich teaches

wherein predict a physical state of the speaker of the human voice based on respective attributes of the extracted segments comprises: 

apply a machine learning classifier model to the one or more denoised inhale segments, the one or more de-noised exhale segments (Karankevich [0004] Implementations may include one or a combination of two or more of the following features. The time-frequency domain graphical representation includes a Mel spectrogram. The time-frequency domain graphical representation includes a color Mel spectrogram. The classifier model includes a neural network model. An expert system is used for inferring the respiratory condition of the subject based at least on the sound class determined by the classifier model; [0078] Using a respiration phase determination subsystem, in some implementations, the source signal is split into inspiration and expiration phases, as well as into the time periods when the characteristic respiratory sound, such as wheezing, was recorded); 

receive, from the machine learning classifier model, classified segments (Karankevich [0037] As shown in FIG. 2, the respiratory condition technology also includes a processor 210 that has one or more respiratory sound processors 212 to process the respiratory signals from the sound sensors. The sound processor 212 can perform a variety of signal processing functions including noise reduction using, for example, forward and inverse wavelet transforms, frequency filtering using, for example, a finite impulse response filter (FIR filter), signal normalization, elimination of constant amplitude offset (DC offset), time-frequency representation charting, such as Mel spectrogram calculation, and other techniques, and combinations of them. In general, the sound processor 212 can perform any processing that is necessary or desirable to produce high quality low-noise respiratory sound data 214 in a form useful for later processing steps, including those performed by a machine learning model 216, an expert system 218, and a database manager 220, among others. An example of a sound processor is the audio pre-processing subsystem mentioned later)

comprising: 

one or more speech segments, one or more cough segments and one or more wheezing segments (Karankevich [0164] In some implementations, the classification model can use a SqueezeNET-based architecture (https://en.wikipedia.org/wiki/SqueezeNet) to reduce the size of the neural network and make the operation of the model faster. In such implementations, the input can be Mel spectrograms using less information. For example, for pre-processing and classification of a segment of a sound record, the pre-processing and classification can be achieved within the timeframe of about 80 ms and the size of the file can be in the order of 2.85 Mb; [0077] Using a respiratory sound automatic classification subsystem, classification evaluation of the respiratory sound is carried out using neural network machine learning algorithms, which results in the probability of assigning the respiratory sound to one of the five classes: normal breathing, wheezes, rhonchi, coarse crackles, and fine crackles, for example, and also to classes of artifacts including: heartbeat, membrane skin rubbing, and interference). 

Karankevich in view of Stamatopoulos in view of deCharms does not teach

the one or more de-noised voiced segments, 

extract a second plurality of features from the respective classified segments; 

and predict the physical state of the speaker based at least on the second plurality of features.  

Cho teaches 

the one or more de-noised voiced segments (Cho [0050] According to an embodiment, the canceler module 144 may include a plurality of filters. The canceler module 144 may use the plurality of filters and obtain a noise-removed voice signal among audio signals based on the first and second signals. For example, the canceler module 144 may remove components of the first signal from the second signal. Because the second signal includes a voice signal and noise, and the first signal includes only noise, the canceler module 144 may obtain a noise-removed voice signal by removing the first signal from the second signal).

Cho is considered to be analogous to the claimed invention because it is in the same field of obtaining audio signals corresponding to a human voice. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms further in view of Cho to allow for removing noise from a voice signal. Doing so would provide for obtaining a usable voice signal that was otherwise obscured by environmental noise.

Karankevich in view of Stamatopoulos in view of deCharms in view of Cho does not teach

extract a second plurality of features from the respective classified segments; 

and predict the physical state of the speaker based at least on the second plurality of features.  

Kumar teaches

extract a second plurality of features from the respective classified segments (Kumar [0011] 6) using the measurement values of the second plurality of event features for the second plurality of events of interest to define locations for the second plurality of events of interest in the second feature coordinate space to form a second distribution in the second feature coordinate space indicative of a second event population of interest); 

and predict the physical state of the speaker based at least on the second plurality of features (Kumar [0011] (c) applying, by the computer, an artificial neural network detection structure to the distribution in the second feature coordinate space indicative of the second event population of interest, the detection structure employing an artificial neural network to correlate the distribution in the second feature coordinate space indicative of the second event population of interest with a distribution of a second reference event population).

Kumar is considered to be analogous to the claimed invention because it is in the same field of neural network analysis, classification, and diagnosis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Karankevich in view of Stamatopoulos in view of deCharms in view of Cho further in view of Kumar to allow for a second plurality of features. Doing so would allow for performing an algorithmic calculation to improve a medical diagnostic result.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 7:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J MUELLER/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657