Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 9 and 16 are independent.  The independent Claims and dependent Claims 2, 10, and 17 have been amended.  
This Application was published as U.S. 20200098384.  
Priority date: 20 September 2018.
Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection.
The instant Application establishes a contrast between collecting Command words directed at a PDA as the words to be used for detecting the pulmonary condition of a particular user of the PDA and passive collection of voice of the same user.  [0032] – [0034].  According to the instant Application, Commands are not private, are few, and are usually uttered close to the PDA and therefore may have less noise associated with the audio.  Thus, the use of a PDA is key in this Application because the “phrase bank/dictionary,” that is used for Word Spotting, is made up from the Commands used for functions of the PDA and because the use of the PDA makes the training possible for variations of these same commands as spoken by the particular patient who owns and uses the PDA.  See [0054], which says that because the number of COMMANDS is limited, the system is TRAINED to store the variations of the COMMAND as pronounced by the patient who owns and uses the PDA.  
The current amendments establish a vague correlation to commands of a PDA whereas the central role of the PDA, as set forth in the Specification, does not come across in the Claims.  There is one person with his personal device which is getting 
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/25/2021 has been entered.  This is a second RCE.
Claim Objections
Claims 1, 9, and 16 are objected to because of the following informalities.  

1.  A method for pulmonary condition monitoring, comprising: 
receiving initial phrase criteria comprising jitter or shimmer for selecting uttered phrases meeting the initial phrase criteria; 
selecting, a phrase from an utterance of a user of an electronic device during interaction with a voice assistant, the selection being based on using the received initial phrase criteria;
generating a phrase bank and storing the selected phrase from the utterance in the phrase bank;
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command;
capturing, by the baseline model, correlation and dependencies between one or more speech features and the stored selected phrase for [[a condition]] one or more pulmonary conditions;
(training of the model /\)
(use of the trained model \/)
spotting the stored selected phrase from the phrase bank [[for]] in an additional utterance during another interaction with the voice assistant;
identifying at least one speech feature that is associated with the one or more pulmonary conditions within the spotted [[stored selected]] phrase;
determining a pulmonary condition based on analysis of the at least one speech feature; and

(model training: update of the trained model)
updating the baseline model based on capturing the correlation and dependencies between the at least one feature and the additional utterance for the determined pulmonary condition.

Appropriate correction is required.
Response to Arguments
Applicant’s arguments are directed to the material added by amendment, which is mapped to a new reference, Hauptman.

Note that while 4 references are currently combined, it is only to cover every detail of the Claim.  Otherwise, most of the features of the independent Claims are taught by Kim.  Additionally, a combination is warranted when the Claim broadly and loosely states the various limitations instead of establishing a concrete tie among the limitations.  For example, as is, the Claim is not directed to a method of detecting a pulmonary condition by the use of a PDA that belongs to and is used by a single patient, where all the various steps of the Claims rely on the collection of speech by the PDA from the same single user and on subsequent update of the model based on further speech from the same single user, and where the model is developed on only the Commands issued by the user to the PDA and updated by the detection of further Commands by the same user to the same PDA. 
Kim (U.S. 20180322961) teaches using shimmer or jitter as acoustic features that are used for selection of prompts to be used for diagnosis of a medical condition.  The “prompts” in Kim are defined as: 
“[0164] In certain embodiments, the interface module 1108 (e.g., in cooperation with the query module 1102), may prompt a user to recite a passage (e.g., a passage including sentences, sets of words, letters, numbers, monosyllables, or the like)….”
In other words, generating a phrase bank or corpus of phrases that can be used for diagnosis of a medical condition based on the fact that these phrases have particular shimmer or jitter features is taught by Kim.
Kim teaches in Figures 5 and 7 that Acoustic Features (210, 520) and Language Features (230, 530) are determined to yield Features (“feature selection 540” “select a plurality of features … 760”) that are best for determining a physical condition of the speaker who provided the speech.  These selected features (540, 760) have the most diagnostic value (Figures 6A and 6B).  The “prompts” selected at 860 in Figure 8 are selected/generated in order to elicit speech from the user that includes the selected acoustic features (namely particular shimmer or jitter).  In some situations, as shown in Figure 4, the “prompts” are questions that cause the user to answer the prompt by providing speech.  In other situations, the prompt tells the user to read from a passage. “[0115] … Prompts that ask a user to read a presented passage will generally result in speech of the words in the passage ….”  In this situation, “selecting a prompt 860” and using it (870) is selecting “prompts” / “phrases” that are most revealing in terms of their acoustic and language features (shimmer or jitter or both) for diagnosing a medical condition.  
Further, presenting the “prompts” / “phrases” to the patient so that he utters the “prompt” / “phrase” and then his particular condition can be diagnosed is also taught by Kim.  See Figure 12 of Kim.  In Figure 12, when the model trained with such prompts/ phrases is used to diagnose the patient, “Question the user 1202” can be asking the user to say a particular phrase and then “receive response 1204” analyzes the received phrase for particular acoustic and language features including shimmer and jitter.  “[0172] FIG. 12 depicts one embodiment of a method 1200 for medical assessment based on voice. The method 1200 begins, and a query module 1102 questions 1202 a user (e.g., audibly from a speaker of a computing device 102, textually on an electronic screen of a computing device 102, or the like).”  “[0173] A response module 1104 receives 1204 a response of the user (e.g., a verbal response from a microphone of a computing device 102 ….”
Kim does not teach the “phrase spotting” of the instant Application which occurs sua sponte by the device and upon the user issuing a command to the device.  Kim, instead of looking for the particularly appropriate phrases in the speech of the user, presents the proper prompts that include the appropriate phrase to the user/patient for the user/patient to repeat it. 
Kim, Figure 8, 820, obtains speech/utterance from the training corpus. [0120].  Kim teaches the use of a PDA (Figure 1A, one of the devices 102 is a PDA) but the Training utterance in Kim is not from the user of the PDA.  Kim trains its model using a corpus of labeled speech data and does not collect the speech of the user for continuous training. (Kim, “[0055] … Training corpus 510 may include any appropriate information, such as speech data of multiple people with and without the medical condition, a label indicating whether or not person has the medical condition, and any other information described herein.”)
Kim does not teach that the model is trained on the commands directed to the assistant by the user.
Also, while Kim is directed to detection of medical conditions, which would arguably include pulmonary conditions, it does not specify detection of pulmonary conditions.
Lotan has significant overlap with Kim and is specifically directed to the detection of pulmonary conditions from the voice of the patient and further strongly suggests continuous training based on the collection of the speech of the same single user (as opposed to a corpus of different users).  Lotan teaches the use of smartphones as the user device (Figure 1, [0051]) but does not specify that the commands directed to the PDAs are exclusively used for the training.  
Hauptman teaches that “commands” by the user to a speakerphone or a car can be used for training of a model for detection of cardiac issues.
Hoffmeister is directed to “Local speech recognition of frequent utterances” where the “frequent utterances” include “commands” to the user’s PDA (see Abstract) and expressly teaches “updating the speech model.”  See, e.g., claim 10 and Figure 10, 1014.

Arguments regarding other independent Claims 9 and 16 and the various dependent Claims rely on the arguments with respect to Claim 1 and are addressed by the modified mapping.

Support:
Claim 1 as amended provides:
1.  A method for pulmonary condition monitoring, comprising: 
receiving initial phrase criteria comprising jitter or shimmer for selecting uttered phrases meeting the initial phrase criteria; 
selecting, a phrase from an utterance of a user of an electronic device during interaction with a voice assistant, the selection being based on using the received initial phrase criteria;
generating a phrase bank and storing the selected phrase from the utterance in the phrase bank;
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command;
capturing, by the baseline model, correlation and dependencies between one or more speech features and the stored selected phrase for a condition;
(training of the model /\)
(use of the trained model \/)
spotting the stored selected phrase from the phrase bank for an additional utterance during another interaction with the voice assistant;
identifying at least one speech feature that is associated with one or more pulmonary conditions within the spotted stored selected phrase;
determining a pulmonary condition based on analysis of the at least one speech feature; and

(model training: update of the trained model)
updating the baseline model based on capturing the correlation and dependencies between the at least one feature and the additional utterance for the determined pulmonary condition.

Other independent Claims have similar scope.
Support for the Claim may be found in the following parts of the instant Application:
 [0052] FIG. 3 shows an architecture 300 for voice assistant based health (e.g., pulmonary, etc.) monitoring (e.g., the voice assistant health monitoring processing 131, FIG. 2) and analysis, according to some embodiments. The architecture 300 includes phrase spotting processing 310, a baseline (longitudinal) model 320, phrase bank/dictionary (knowledge base) 330, feature generator 340 and longitudinal analysis processing 350. In some embodiments, consistent non-sensitive voice data from a user 305 is received by the voice assistant included in device 120. The voice assistant converts the speech to voice data that is input to the phrase spotting processing 310, and determines context 360 (e.g., device usage, command content, environment, weather, etc.), which is input to the longitudinal analysis processing 350. The phrase spotting processing 310 determines a selected (spotted) phrase 315 that is stored in the phrase bank/dictionary 330. An array of phrases and features 317 (such as speech, sound event, vowel placement, etc.; provided from the feature generator 340) are input to the baseline model 320. The feature generator 340 includes vowel detection processing 341, shimmer processing 342, pause detection processing 343, jitter processing 344 and other pulmonary (or other health issue) speech feature(s) detection processing 345. The baseline model 320 provides phrase data 316 input to the feature generator 340. The features 317 output from the feature generator are input to the longitudinal analysis processing 350. The longitudinal analysis processing outputs a longitudinal outcome variation that is input to the phrase spotting processing 310, and outputs patient outcomes 370. The processing for the architecture 300 is described below in further detail.
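For illustration only, the phrase spotting step described in [0052] can be sketched as a word-level match of a transcribed utterance against the stored phrase bank. This is a minimal sketch under stated assumptions and is not taken from the instant Application or from any cited reference: the use of a text transcript, the function names `normalize` and `spot_phrases`, and the example commands are all hypothetical.

```python
import re


def normalize(text):
    """Lowercase the text, drop punctuation, and split it into words."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def spot_phrases(transcript, phrase_bank):
    """Return the stored phrases that occur as whole-word sequences in the transcript."""
    words = normalize(transcript)
    spotted = []
    for phrase in phrase_bank:
        target = normalize(phrase)
        n = len(target)
        # Slide a window of n words over the transcript and compare word by word.
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            spotted.append(phrase)
    return spotted


# Hypothetical phrase bank of voice assistant commands.
bank = ["turn on the lights", "set a timer"]
result = spot_phrases("Hey, turn on the lights please!", bank)
```

Matching whole words rather than substrings keeps a stored phrase such as "turn on the light" from being spotted inside "turn on the lights".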

 
	[0054] Returning to FIG. 3, as previously discussed above, the commands or set of commands communicated with a voice assistant is limited. Therefore, some embodiments store the common important phrases (combination of words) in the phrase bank/dictionary 330 as a reference for the growing baseline model 320. In other words, some embodiments enable capture of not only the regular patterns, but also the deviation from the pattern baselines for speech and sound events of the patient with respect to each recorded phrase in the phrase bank/dictionary 330. 
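For illustration only, the idea in [0054] of keeping, for each stored phrase, both the regular pattern and the deviation from that pattern can be sketched with a running mean and standard deviation per phrase. This is a sketch under stated assumptions, not the Applicant's implementation: the classes `RunningStat` and `PhraseBank`, the use of a single scalar feature per utterance, and the z-score as the deviation measure are all hypothetical choices. The running statistics use Welford's online algorithm.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class RunningStat:
    """Running mean and variance of one feature (Welford's online algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared differences from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def deviation(self, x):
        """Z-score of x against the baseline; 0.0 while no spread is known yet."""
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


class PhraseBank:
    """Hypothetical phrase bank holding one running baseline per stored phrase."""

    def __init__(self):
        self._stats = {}

    def observe(self, phrase, feature_value):
        stat = self._stats.setdefault(phrase, RunningStat())
        z = stat.deviation(feature_value)  # compare against the baseline so far
        stat.update(feature_value)         # then fold the new utterance into the baseline
        return z


bank = PhraseBank()
deviations = [bank.observe("turn on the lights", v) for v in (1.0, 3.0, 2.0, 5.0)]
```

Scoring each utterance before it is folded into the baseline means that a deviating utterance is measured against the earlier pattern rather than against itself.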


[0055] FIG. 5 shows a process flow for phrase spotting criteria (for the phrase spotting processing 310, FIG. 3) for health assessment, according to some embodiments. In some embodiments, the focus is for pulmonary patients. Therefore, specific phrase 316 needs to be spotted (by the phrase spotting processing 310, FIG. 3) that are beneficial for feature extraction (by the feature generator 340, FIG. 3) and identifying the user's pulmonary condition. In one or more embodiments, specific initial phrase criteria 520 (such as pause time, pronunciation, breathing time, volume, speech features (jitter and shimmer) is used for selecting the phrases spotted (by the phrase spotting processing 310) using the voice assistant. It should be noted that shimmer refers to the consistency of localized vocal amplitude for voiced sounds, and jitter refers to consistency of localized periodicity of the same. These initial phrase criteria 520 are to be designed in a way that selection of phrases (from the phrase bank/dictionary 330) are most informative for pulmonary assessment. Examples could be phrases with high percentage of vowels or pause time. In some embodiments, the phrase criteria 520 are updated over time as the longitudinal model 530 grows over time.
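For illustration only, the jitter and shimmer criteria mentioned in [0055] can be sketched with the commonly used "local" measures: the mean absolute difference between consecutive cycles, divided by the mean value. That formula is an assumption made for this sketch. The instant Application describes jitter and shimmer only as consistency of localized periodicity and amplitude. The function names, the sample measurements, and the `(low, high)` ranges used as initial phrase criteria are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter(periods_s):
    """Cycle-to-cycle variation of the pitch period (hypothetical helper)."""
    return local_perturbation(periods_s)


def shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (hypothetical helper)."""
    return local_perturbation(peak_amplitudes)


def meets_criteria(features, criteria):
    """True if every criterion's feature is present and lies within its (low, high) range."""
    return all(
        name in features and low <= features[name] <= high
        for name, (low, high) in criteria.items()
    )


# Made-up measurements for one voiced segment of a phrase.
periods_s = [0.010, 0.011, 0.010, 0.011]
peak_amplitudes = [1.0, 0.9, 1.0, 0.9]
features = {"jitter": jitter(periods_s), "shimmer": shimmer(peak_amplitudes)}

# Hypothetical initial phrase criteria expressed as allowed ranges per feature.
criteria = {"jitter": (0.0, 0.2), "shimmer": (0.0, 0.2)}
selected = meets_criteria(features, criteria)
```

Expressing the criteria as ranges keyed by feature name would let other criteria listed in [0055], such as pause time or breathing time, be added without changing the selection logic.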

See also:
[0032] Some embodiments take advantage of a voice assistant (e.g., personal assistant (PA), virtual assistant, etc.), employing features, such as recorded speech, usage information and contextual data to generate a sparse but private and consistent data source for semi-continuous monitoring of patients (such as pulmonary patients, etc.). Data mining and machine learning processing are employed to analyze theses sources of data and track the changes for health symptoms and alert the patient and caregiver when needed. In one or more embodiments, audio commands that are used to communicate and interact with voice assistants are used as a resource for assessment of health status of the patient, and prediction of, for example, pulmonary-related deadly events such as asthma attack and COPD exacerbation. Voice assistants provide a unique source of data. In some embodiments, the processing for "phrase spotting" selects specific phrases based on an initial criterion and the feedback that is retrieved from longitudinal analysis of patient status. These selected phrases are used as the sources for speech feature generation and data analysis. The characteristics of vowels, pause patterns and syllable segments in these selected phrases are stored (e.g., in a phrase bank, knowledge base, etc.) and are used to generate a model for location of interesting parts of the phrase. The model continuously updates itself when receiving the same phrase over and over. Having a consistent continuous stream of these phrases ensures that this model converges to a consistent state for each subject (i.e., a user of an electronic device, a pulmonary patient, etc.), thus realizing a personalized model. 
[0033] In some embodiments, a model for location continuously updates itself using an online learning scheme. The continuously updating model is also utilized for detecting vowels, pause patterns and syllable segments of the audio, which then is fed to speech feature generation processing. These features are then fed to longitudinal analysis processing where patient status deviation is estimated. If these variations in the patient's status appear to be alarming, a notification will be sent to the clinician, and if necessary, clinicians will intervene any probable dangerous health-related events. In one or more embodiments, some of the issues of passive audio collection are resolved: voice assistant audio data is not privacy-sensitive in comparison to passive audio data that is continuously recorded. It is an audio data that the subject is willing to release and the contents are usually multitude of voice commands, which are not private and sensitive material. The passively recorded audio data comes with a variety of speech phrases and words. However, the proportion of consistent, repeated phrases is much lower when compared to the set of voice assistant audio commands. Therefore, tracking speech feature deviation is easier when it comes to voice assistant audio data. Running a passive continuous audio collection can also be relatively high demand in terms of battery lifetime, processing and data storage. On the other hand, if voice assistant data is utilized, only a limited set of smartly selected phrases are needed. Moreover, collection of information about some specific parts of the audio, the entire audio stream is not required to be collected. It is noted that voice assistant audio data is not privacy-sensitive in comparison to passive audio recorded data. Voice assistant audio includes audio data that a subject is willing to release and the contents are usually some voice command, which is not private and sensitive material. 
Unlike conventional systems, one or more embodiments take advantage of the fact that users are already giving consent for the recording of the voice for voice assistants. Therefore, no additional permission for recording is needed (e.g., reliance on the recording of existing voice commands instead of recording patients all the time). 

Claims 2, 10, and 17 are amended similarly:

2. The method of claim 1, wherein:
the at least one speech feature includes one of: pause within the phrase, vowel within the phrase, or a combination thereof;
the analysis of the at least one speech feature includes comparing the speech feature with a knowledge base;
the initial phrase criteria further comprising one or more of pronunciation or breathing time,
the phrase selected from the plurality of phrases is obtained from interaction with the voice assistant;
the selected phrase from the utterance includes at least a portion of the initial phrase criteria; and
the phrase bank captures voice assistant command patterns and deviations from the voice assistant command patterns for speech and sound events of the user with respect to each captured phrase in the phrase bank.

See for support:
[0054] Returning to FIG. 3, as previously discussed above, the commands or set of commands communicated with a voice assistant is limited. Therefore, some embodiments store the common important phrases (combination of words) in the phrase bank/dictionary 330 as a reference for the growing baseline model 320. In other words, some embodiments enable capture of not only the regular patterns, but also the deviation from the pattern baselines for speech and sound events of the patient with respect to each recorded phrase in the phrase bank/dictionary 330.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 6-7, 9-10, 13-14, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (U.S. 2018/0322961) in view of Lotan (U.S. 2019/0080803) and further in view of Hauptman (U.S. 20190362740) and Hoffmeister (U.S. 9070367).
Regarding Claim 1, Kim teaches:
1.  A method for pulmonary condition monitoring, [Kim does not mention lungs or breathing, but it does refer to diagnosis of “medical conditions” in a generic fashion.  “[0002] This invention relates to voice analysis and more particularly relates to the automated assessment and diagnosis of one or more medical conditions based on collected voice samples.”] comprising: 
receiving initial phrase criteria comprising jitter or shimmer for selecting uttered phrases meeting the initial phrase criteria;  [Kim expressly teaches the use of shimmer or jitter as pertinent acoustic features that are used for selection of verbal prompts / phrases that the user must speak in order for the system to analyze his medical condition.  Figure 5 shows “feature selection 540.”   “[0063] … For example, acoustic features may include mel-frequency cepstral coefficients, perceptual linear prediction features, jitter, or shimmer.”  “[0075] Now presented are examples of features that may be used to diagnose a medical condition.”  “[0077] The following are non-limiting examples of short-time segment features: … voice quality features (such as jitter, jitter of jitter, shimmer, or harmonics-to-noise ratio) ….”]
selecting, a phrase from an utterance of a user of an electronic device during interaction with a voice assistant, [Kim, Figure 8, 820, obtains speech/utterance from the training corpus.  [0120].  Kim teaches that its device 102 (Figure 1A) can be a PDA:  “[0050] …The hardware devices 102 … may include one or more of … a personal digital assistant ….”  The Training utterance is not from the user of the PDA.] the selection being based on using the received initial phrase criteria; [Kim in Figure 8, 860 selects “prompts” which are selected based on their “acoustic features,” (Figure 5, “feature selection 540,” Figure 7, “select a plurality of features using feature selection scores 760”) being suitable for diagnosing a particular medical condition (Figure 8, 830: “compute medical diagnosis scores for the speech data items of the prompt.”)  Some of the prompts are questions (Figure 4) which elicit an answer from the patient.  Other prompts may be words or numbers for the patient to repeat.  “[0144] … For example, the query module 1102 may audibly and/or textually question the user "what venue are we at today?" … may audibly list words and/or numbers for the user and ask the user to repeat them back, may display a series of pictures to the user and ask the user to repeat back a description of the series of pictures, or the like….”  The prompts are selected based on criteria including “acoustic features 730,” which include Jitter or Shimmer, as provided in [0063] above.]
generating a phrase bank and storing the selected phrase from the utterance in the phrase bank; [Kim, Figure 4 shows a list of prompts and teaches the “phrase bank” of the Claim.  “[0086] . . . The prompts of FIG. 4 may be stored using any appropriate storage technology, such as a database.”  Figure 8, “860: select a plurality of prompts using the prompt selection score.”  As provided in [0144] and [0164], the prompts may be words or numbers or sets of words (i.e. phrases) to be spoken by the patient.  “[0164] In certain embodiments, the interface module 1108 (e.g., in cooperation with the query module 1102), may prompt a user to recite a passage (e.g., a passage including sentences, sets of words, letters, numbers, monosyllables, or the like). ….”]
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command; [Kim teaches training a model but does not teach that the model is trained on the commands of the user directed to the assistant.  Figure 5, “model training 550.”]
capturing, by the baseline model, correlation and dependencies between one or more speech features and the stored selected phrase for a condition; [Kim, Figures 6A and 6B show the correlation between the “feature value” of speech and “diagnosis value.”  Figure 8, “830: compute medical diagnosis score for the speech data item of the prompt.”  A trained model establishes a correlation between an observation and a conclusion.  The observation is the speech (acoustic features of the speech) and the conclusion is the existence of a medical condition.  This is expressly shown in Figures 6A and 6B.]
(training of the model /\)
(use of the trained model \/)
spotting the stored selected phrase from the phrase bank for an additional utterance during another interaction with the voice assistant; [Kim, Figure 12, “Receive response 1204.” This response is speech of the patient and the system is looking for the “prompt” which may be a “word or number” / “stored selected phrase” in the speech.  “[0173] A response module 1104 receives 1204 a response of the user (e.g., a verbal response from a microphone of a computing device 102, …”  Kim, Figure 1A, “hardware device 102” is shown as a “personal digital assistant” and described as such in [0050] and [0191].  The “hardware device 102” has a “voice module 104a” installed.  “[0164] In certain embodiments, the interface module 1108 (e.g., in cooperation with the query module 1102), may prompt a user to recite a passage (e.g., a passage including sentences, sets of words, letters, numbers, monosyllables, or the like). ….”  Kim does not do spontaneous word spotting and rather prompts the patient for input.  However, the scope of this limitation does not require spontaneous word spotting either.]
identifying at least one speech feature that is associated with one or more pulmonary conditions within the spotted stored selected phrase; [Kim, Figure 11, “Detection Module 1106” identifies the condition of the patient by analyzing his speech and identifying “acoustic features” that are associated with a particular condition.  “[0161] . . .  detection module 1106 is configured to base an assessment at least partially on one or more biomarkers of received response data (e.g., verbal response data, sensor data, or the like) indicating a quality of life for the user. ….  a measurable indicator from a user of some biological state and/or condition of the user (e.g., presence of a disease and/or injury, presence of one or more symptoms, a current quality of life of a user, or the like). A biomarker, in certain embodiments, may comprise a characteristic objectively identifiable by the detection module 1106 in response data from a user, such as an acoustic feature, a language feature, a characteristic identifiable in sensor data, or the like.”]
determining a pulmonary condition based on analysis of the at least one speech feature; and [Kim, Figure 12, “Assess medical condition 1206.”]

model training (update of trained model)
updating the baseline model based on capturing the correlation and dependencies between the at least one feature and the additional utterance for the determined pulmonary condition. [Kim does not teach continuous training with the inputs of the user and its training is done with a corpus of labeled speech data.  The rest, i.e. capturing the correlation and dependencies is the definition of model development and training and is taught by Kim in Figures 6A and 6B.]

The following features are not taught by Kim:
Kim, Figure 8, 820, obtains speech/utterance from the training corpus. [0120].  Kim teaches the use of a PDA (Figure 1A, one of the devices 102 is a PDA) but the Training utterance is not from the user of the PDA.  Kim trains its model using a corpus of labeled speech data and does not collect the speech of the user for continuous training. (Kim, “[0055] … Training corpus 510 may include any appropriate information, such as speech data of multiple people with and without the medical condition, a label indicating whether or not person has the medical condition, and any other information described herein.”)
Kim does not teach that the model is trained on the commands of the user directed to the assistant.
Kim also does not teach the update of the model.
Also, while Kim is directed to detection of medical conditions, which would arguably include pulmonary conditions, it does not specify detection of pulmonary conditions.

Lotan is expressly directed to the detection of pulmonary conditions from speech and teaches:
1.  A method for pulmonary condition monitoring, [Lotan, Title:  Assessment of Pulmonary Condition by Speech Analysis.]
comprising: 
receiving initial phrase criteria comprising jitter or shimmer for selecting uttered phrases meeting the initial phrase criteria; [Lotan, in Figure 2, shows the 3 types of parameters/criteria that are used to evaluate speech for a pulmonary condition.  Formant is related to shimmer and jitter.  See [0021] to [0029].]
selecting, a phrase from an utterance of a user of an electronic device during interaction with a voice assistant, the selection being based on using the received initial phrase criteria; [Lotan, Figure 1, “smartphone 24.”  [0049] and [0051].  Lotan collects natural speech of the user with a smartphone for evaluation of the pulmonary condition.  “[0049] It is emphasized that embodiments of the present invention do not necessarily require the identification of cough sounds, wheezing sounds, breathing sounds (e.g., cackles), or any other non-speech-related parameters, in order to assess the status of the pulmonary condition. Rather, the assessment may be based only on speech-related parameters, such as formant frequencies of the subject's speech, and/or any of the other speech-related parameters described herein. Moreover, the subject need not necessarily be prompted, at any particular time, to utter any particular sounds, to say any particular words, or even to speak naturally. Rather, the system may analyze spoken sentences obtained from the subject's normal, daily conversations on his smartphone. (Notwithstanding the above, in some embodiments, the subject may be prompted to utter particular sounds, and/or say particular words, particularly if the analysis of the subject's "natural" speech is inconclusive.)”  “[0060] Typically, as noted above, processor 30 analyzes natural, "free" speech of the subject. In some cases, however, the subject may be prompted, e.g., by processor 34, to recite particular words or sentences that have been found to facilitate the analysis described herein, and these words or sentences may then be communicated to the server for analysis. (The same particular words or sentences may be used each time the analysis is performed, in order to facilitate a more effective assessment of the subject's condition.)”]
generating a phrase bank and storing the selected phrase from the utterance in the phrase bank; [Not taught but suggested by the parenthetical teachings in [0049].]
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command; [Lotan in [0084] - [0086] teaches training of models on a “training set” of N recordings from the same subject over a period of time:  “[0086] Typically, a separate model is learned for each subject, by acquiring training data from the subject over a suitable period of time, and then applying a suitable machine-learning algorithm to these data.”  Lotan teaches the use of smartphones as the user device (Figure 1, [0051]) but does not specify that the commands directed to the PDAs are exclusively used for the training.  “[0061] … For example, the subject may be prompted, e.g., by the smartphone, to provide the speech by orally responding to a particular question. The subject's oral response to the question is then be analyzed both for the identification of the speech-related parameters described herein--which describe the vocal properties, such as spectral and/or cepstral properties, of the response--and for the meaning of the response. (The question may be posed visually, on the display of the smartphone, and/or auditorily.) Examples of such questions include "Did you take your medicine today?," "How do you feel?," and "How active were you today?" By identifying both the vocal properties of, and meaning of, the subject's answer, the status of the subject's condition may be better assessed…”]
capturing, by the baseline model, correlation and dependencies between one or more speech features and the stored selected phrase for a condition; [Lotan, “[0048] Alternatively or additionally, the system may assess the status of the condition, by applying a machine-learned model to particular parameters, or "features," extracted from the speech signal. Such a model may include a regressor, which outputs a number indicative of the status. For example, the regressor may output a number that indicates, relative to a baseline, an estimated amount of fluid inside, or near, the lungs. …”]
model training
use of the trained model
spotting the stored selected phrase from the phrase bank for an additional utterance during another interaction with the voice assistant; [Lotan selects “portions of speech” that include the indicator parameters.  “[0029] selecting at least one portion of the speech for analysis, based on one or more metrics …”  [0066] to [0069] describe the criteria used for selecting portions of speech suitable for diagnosis.  Lotan does not rely on a word bank but teaches that it may use it:  “[0049] … (Notwithstanding the above, in some embodiments, the subject may be prompted to utter particular sounds, and/or say particular words, particularly if the analysis of the subject's "natural" speech is inconclusive.)”   Prompting the user to “say particular words” suggests “spotting the stored selected phrase from the phrase bank” of the Claim because these are words stored somewhere that are deemed to be more suitable for detecting the pulmonary condition.]
identifying at least one speech feature that is associated with one or more pulmonary conditions within the spotted stored selected phrase; [Lotan, Figure 2, identified speech parameters.  “1 … to identify, by analyzing the speech, one or more speech-related parameters of the speech,…”  The “spotted stored phrase” is not taught but suggested.]
determining a pulmonary condition based on analysis of the at least one speech feature; and [Lotan, “1 … to assess, in response to the speech-related parameters, a status of the pulmonary condition, and to generate, in response thereto, an output indicative of the status of the pulmonary condition.”]

model training
updating the baseline model based on capturing the correlation and dependencies between the at least one feature and the additional utterance for the determined pulmonary condition. [Lotan teaches that the voice of each subject himself can be used for the training of the diagnostic model and that this voice may be collected over a period of time which at the least suggests that the model is updated during use:  “[0086] Typically, a separate model is learned for each subject, by acquiring training data from the subject over a suitable period of time, and then applying a suitable machine-learning algorithm to these data.”]
Kim and Lotan both pertain to determination of health conditions from speech.  It would have been obvious to modify the system of Kim, which does not mention “pulmonary condition” as one of the conditions that it monitors, to include the pulmonary-condition assessment of Lotan and, further, Lotan's use of the speech of the particular subject for training of the detector model, in order to have a model that is tailored to a particular subject.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Kim and Lotan do not teach training on a command word.
Hauptman teaches:
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command; [Hauptman teaches that user-specific models may be trained on the voice of a single speaker, outputting voice commands and in order to detect a cardiac condition of the speaker/user.  “[0346] In some embodiments, vehicles are used for embedding with sampling and/or analyzing circuitry. For example, a voice input and/or analyzer can be embedded in a car multimedia system and/or speakerphone system; for example, by sampling when a subject is talking on the phone/speaker in the car, and/or outputting voice commands, and/or spontaneously talking inside the car.”  “[0341] (2) Calculating periodicity of speech features (autocorrelation and/or spectral domain) and comparing it to a periodicity known to be pathological/healthy.”  “[0342] (3) Using machine learning strategy to train statistical models from voice features from healthy speakers and/or from speakers having pathology.”  “[0344] In some embodiments, for unknown speakers--the pathology will be detected by comparing, for example by probability calculations, the acoustic features to each of the pathological and/or healthy models (such as the models that were calculated in the training phase) and taking the highest probability model as the result.”  “[0617] FIG. 18B shows speaker-dependent results, that is, results of a system trained by one or more voice recording(s) of a single speaker, taken in specific known heart conditions, healthy or AF, used to detect cardiac condition on the same speaker. The results shown are considered as successful in estimating cardiac conditions based on human voice.”]

Kim/Lotan and Hauptman pertain to speech analysis by the use of devices such as PDAs or smart phones.  It would have been obvious to combine the models of Hauptman, which are trained on speech of a specific user and on words including Commands for detecting a cardiac condition, with the system of the combination in order to arrive at a system that uses the Command words to train a personal model for detecting a medical condition.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Kim/Lotan/Hauptman does not expressly teach the “updating” limitation.
Hoffmeister teaches:
training a baseline model using the stored selected phrase from the phrase bank, wherein the selected phrase is a voice assistant command; [Hoffmeister’s devices include PDAs and Hoffmeister teaches training the local speech recognizer for commands: “The teachings of the present disclosure may be applied within a number of different devices and computer systems, including, for example, general-purpose computing systems, server-client computing systems, mainframe computing systems, telephone computing systems, laptop computers, cellular phones, personal digital assistants (PDAs), tablet computers, other mobile devices, etc.”  Col. 3, lines 15-22.  Hoffmeister is directed to “local speech recognition of frequent utterances” and one type of frequently spoken utterance is the command: “6. The method of claim 5, wherein the frequently spoken utterance comprises a command, and wherein the method further comprises executing the command.”]
…
updating the baseline model based on capturing the correlation and dependencies between the at least one feature and the additional utterance for the determined pulmonary condition. [Hoffmeister teaches updating/adjusting a speech recognition model, including an acoustic component, which may be updated to recognize specific words frequently uttered in a particular user’s individualized input to a Personal Digital Assistant.  Col. 3, 15-27.  Figure 9, “local device 900.”  Figure 10, “update or replace the speech models 1014” as new data is collected.  PDA: Col. 3, line 21.  Storage of words for word-spotting: “Other information may also be stored in the speech storage 220 for use in speech recognition. The contents of the speech storage 220 may be prepared for general ASR use or may be customized to include sounds, words, and phrases that are likely to be used in a particular application. For example, for ASR processing at an ATM, the speech storage 220 may include customized data specific to banking transactions. In certain instances, the speech storage 220 may be customized for an individual user based on his/her individualized speech input. To improve performance, the ASR module 214 may revise /update the contents of the speech storage 220 based on feedback of the results of ASR processing.” Col. 6, lines 16-27.  Update of the ASR model:   “The device identifies a number of the most frequently spoken utterances or words … as shown in block 1008. … The speech recognition models may also be updated or replaced as new information is collected, as shown in block 1014, and the updated speech recognition models may, if necessary, be transmitted to the local device, as shown in block 1016. The speech recognition models may be updated or replaced in real-time or on a periodic basis, for example, based on a number of minutes, hours, days, weeks, months, etc., to reflect current spoken utterances frequently spoken by the user. ….”  Col. 13, lines 1-25.  
Hoffmeister teaches that the particular words that are identified may be commands to the PDA: “… When the audio signal matches one of the speech models, the local device processes the utterance, for example by executing a command….”  Abstract.  “In one aspect of the present disclosure, the local device is a mobile device or cellular phone. Speech model matching may be performed by the local device, as illustrated in FIG. 11. As shown in FIG. 11, the local device receives an audio signal, as shown in block 1102. The audio signal may be a command word or phrase, for example, to be performed by the local device or remote device.… In one aspect, the processing includes executing a command included to the utterance. For example a "call mom" command may be included in one of the speech recognition models as a frequently spoken utterance by the user. The processing of the "call mom" phrase may include starting a phone application of the local device and calling the desired phone number….”  Col. 14, line 48 to Col. 15, line 9.  Note that the ASR module 214 uses an acoustic model, which would be suitable for detection of voice-modifying conditions (pulmonary):  “The ASR module 214 includes an acoustic front end (AFE) 216, a speech recognition engine 218, and speech storage 220. The AFE 216 transforms audio data into data for processing by the speech recognition engine 218. The speech recognition engine 218 compares the speech recognition data with the acoustic, language, and other data models and information stored in the speech storage 220 for recognizing the speech contained in the original audio data. …”  Col. 5, lines 11-24.]
Kim/Lotan/Hauptman and Hoffmeister pertain to speech analysis by the use of devices such as PDAs.  It would have been obvious to combine the real-time or periodic update of the ASR model that is taught by Hoffmeister, and that is directed to word-spotting of specific words such as commands directed at the PDA, with the system of the combination in order to arrive at a system that updates the model that is being used.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
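For illustration only, the periodic identification of a user's most frequently spoken utterances described in the Hoffmeister passages quoted above may be sketched as follows.  This is a minimal sketch under stated assumptions and is not Hoffmeister's implementation: the function names, the use of text transcripts in place of audio, and the top_n parameter are all hypothetical.

```python
from collections import Counter


def most_frequent_utterances(transcripts, top_n=3):
    """Return the top_n most frequent utterances after simple normalization.

    Assumption: utterances are already transcribed to text; a real system
    would operate on audio and speech models rather than on strings.
    """
    counts = Counter(t.strip().lower() for t in transcripts if t.strip())
    return [phrase for phrase, _ in counts.most_common(top_n)]


def update_phrase_set(transcripts, top_n=3):
    """Rebuild the locally stored phrase set from the latest usage history.

    Calling this on a periodic basis mimics replacing the stored phrases
    as new information is collected (hypothetical helper).
    """
    return set(most_frequent_utterances(transcripts, top_n))


history = ["Call mom", "call mom", "play music", "call mom ", "play music", "set alarm"]
top_two = most_frequent_utterances(history, top_n=2)
```

In this sketch the phrase set is simply replaced on each update; a retained history with a sliding time window would be another reasonable design.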

Regarding Claim 2, Kim teaches:
the at least one speech feature includes one of: pause within the phrase, vowel within the phrase, or a combination thereof; [Kim teaches that the “language features” that it uses for determining a condition include the pauses and vowels in the speech:  “[0067] Language feature computation component 230… For example, the speech recognition results may include a word lattice that includes multiple possible sequences of words, information about pause fillers, and the timings of words, syllables, vowels, pause fillers, or any other unit of speech.”]
the analysis of the at least one speech feature includes comparing the speech feature with a knowledge base; [Kim, Figure 12, “assess medical condition 1206.”  The analysis step of Claim 1 is for determination of the medical condition.  “[0044] A voice module 104 may compare a user's answers to previous answers from when the user was healthy (e.g., to baseline answers). …”  “[0045] … In some embodiments, a voice module 104 may compare results of a voice assessment and/or model for medical trial and/or study participants with results of a questionnaire or other test, may provide a score similar to and/or on the same scale as a questionnaire or other test, or the like….”   (In the instant Specification, the “knowledge base” is the “phrase bank/dictionary 330” of Figure 3.  See [0082] of the published Application.)]
the initial phrase criteria further comprising one or more of pronunciation or breathing time, [Kim, “[0069] …another example, features relating to the pronunciation of vowels, syllables, or words may be important for Parkinson's disease ….”]
the phrase selected from the plurality of phrases is obtained from interaction with the voice assistant; and [Kim, Figure 1, “voice module 104A” teaches the “voice assistant.”  “[0043] A voice module 104 may interact with a user, asking questions verbally, recording the user's vocal responses, determining whether a response is accurate, or the like…..”]
the selected phrase from the utterance includes at least a portion of the initial phrase criteria; and [Kim teaches that the prompts are provided to the user such that the response would include the language or acoustic features that provide the most diagnostic information.  What the user says in response to the prompt would include the “selected phrase,” which is selected to be the most informative.]
the phrase bank captures voice assistant command patterns and deviations from the voice assistant command patterns for speech and sound events of the user with respect to each captured phrase in the phrase bank.  [Kim, Figure 4 shows a list of prompts and teaches the “phrase bank” of the Claim.   Figure 8, “860: select a plurality of prompts using the prompt selection score.”  As provided in [0144] and [0164], the prompts may be words or numbers or sets of words (i.e. phrases) to be spoken by the patient.]

Kim teaches:
the phrase bank captures voice assistant command patterns and deviations from the voice assistant command patterns for speech and sound events of the user with respect to each captured phrase in the phrase bank. [Kim teaches that certain words may be used for its phrase bank and also teaches training a model on a corpus of training data.  Training means capturing and analyzing and modeling the patterns and deviations from the patterns.  See [0091]-[0092].  “[0043] … A voice module 104 may extract one or more verbal queues and/or features and pass the extracted verbal queues and/or features to one or more machine learning models trained for a certain disease and/or other medical condition.”  “[0045] … For example, a voice module 104 may collect audio from medical trial and/or study participants (e.g., at a doctor's visit, at home, or the like) and may create one or more models for a placebo group and/or for a test group. ….”  “[0082] To train a model for diagnosing a medical condition, a corpus of training data may be collected. The training corpus may include examples of speech where the diagnosis of the person is known. For example, it may be known that the person had no concussion, or a mild, moderate, or severe concussion.”]
Kim teaches training of the model on a corpus and thus does not teach that the patterns and deviations are obtained from the commands of one particular user directed at his personal PDA although the user of Kim does have and use a PDA.

Lotan teaches:
the phrase bank captures voice assistant command patterns and deviations from the voice assistant command patterns for speech and sound events of the user with respect to each captured phrase in the phrase bank. [Lotan teaches that the voice of each subject himself can be used for the training of the diagnostic model and that this voice may be collected over a period of time which at the least suggests that the variations of the voice over time are collected and used (one single voice sample would be sufficient if detection of the variations was not intended):  “[0086] Typically, a separate model is learned for each subject, by acquiring training data from the subject over a suitable period of time, and then applying a suitable machine-learning algorithm to these data.”]

Rationale for combination of Kim and Lotan as provided for Claim 1.  The feature of training and updating the model by the voice of the particular user was brought in from Lotan and the details come from Lotan under the same rationale.

Regarding Claim 6, Kim teaches that “context” may be considered in the evaluation:  “[0140] . . . The query module 1102 may question a user in order to collect test case voice recordings or other test case data in response to a user requesting a medical assessment, based on data from a sensor of a computing device 102 such as a wearable or mobile device, and/or based on receiving another trigger indicating that an injury may have occurred, that one or more symptoms of a disease have been detected, or the like. For example, in response to a hit, a fall, an accident, and/or another potential concussion event (e.g., at a sporting event or other activity), a user (e.g., an injured player or other person, a coach, a parent, a medical professional, an administrator, or the like) may request a medical assessment (e.g., using a graphical user interface of the interface module 1108 to trigger one or more questions from the query module 1102, collection of voice data and/or other data from the response module 1104, and/or a medical assessment from the detection module 1106, or the like)….”
Lotan teaches:
6. The method of claim 1, further comprising: 
receiving a context accompanying the utterance; and [See [0082] teaching the types of other sensor data (Pulse, ECG, Weight) provided to the device.  See [0061] and the use of the semantics of the speech as indicator of illness.]
determining the pulmonary condition based on analysis of the at least one speech feature in view of the context, [Lotan, See [0082] below.  See also the examples of [0061].]
wherein the context comprises at least one of air quality information, command content information, opportunistic physiological data or a combination thereof, [Lotan’s sensors measure different physiological parameters:  “[0082] Alternatively or additionally, the processor may base the assessment on other, non-speech-related parameters, such as a picture of the subject (which may be acquired, for example, while the subject is speaking, by a camera belonging to the smartphone, or by an external camera), the subject's weight, an echocardiogram (ECG) recording taken from the subject, the subject's arterial oxygen saturation, and/or the subject's level of activity (quantified, for example, as a number of steps taken during a preceding period of time). Any of these parameters may be input to the smartphone, e.g., by the subject. In some embodiments, the smartphone is functionally linked, e.g., wirelessly, to an external device that is configured to measure one or more of these or any other parameters, such as, for example, a camera, a spirometer, a pulse oximeter, an electrocardiograph, a heart rate monitor, a blood pressure monitor, a skin-conductivity sensor, or a posture monitor.”]
the command content comprises content of a search command, and  [This is not taught by Lotan but is not necessary either because of the “at least one” in the Claim.  Use of the “command content” is taught by Hauptman and Hoffmeister.]
the one or more speech features capture a state of the user in different contexts. [ Lotan teaches that the additional context/sensor data collected in [0082] are considered together with the speech in arriving at a conclusion:  “[0083] In some embodiments, the processor applies a machine-learned model, such as a classifier or regressor, to some or all of the aforementioned parameters. …..”]

Regarding Claim 7, Kim uses a corpus and not individualized data.
Lotan teaches:
7. The method of claim 6, wherein: 
the utterance is obtained passively or proactively from the user of the electronic device; [Lotan, Figure 1 shows that the speech of the human “subject 22” is obtained while he is speaking on the “smartphone 24” during the course of his normal daily activities which teaches the “obtained passively” of the Claim.  “[0051] …. A subject 22, who suffers from a pulmonary condition such as pulmonary edema or pleural effusion, speaks into smartphone 24, typically during the course of the subject's normal day-to-day conversational activities. ….”  See also: “[0057] …Upon the subject beginning to speak into the phone, the application--without the subject necessarily noticing --begins to record, and/or transmit, the subject's speech. ….”  The subject 22 may also manually activate the device which teaches the “obtained … proactively” of the Claim:  “[0057] … Alternatively, the application may be manually activated by the subject, and/or may be automatically activated by the processor, upon the subject initiating or answering a call.”  Lotan may also prompt the user for particular speech, which is proactive:  “5. The apparatus according to claim 1, wherein the processor is further configured to prompt the subject to provide, by orally responding to a question, the speech.”]
the baseline model is personalized as a result of interaction between the user of the electronic device and the voice assistant coupled to the electronic device; and [Lotan in paragraphs [0083]-[0086] discusses the aspects of machine learning and concludes:  “[0086] Typically, a separate model is learned for each subject, by acquiring training data from the subject over a suitable period of time, and then applying a suitable machine-learning algorithm to these data.”]
the opportunistic physiological data is received from a smart electronic device. [Lotan can collect other types of data: “[0082] such as a picture of the subject … the subject's weight, an echocardiogram (ECG) recording taken from the subject, the subject's arterial oxygen saturation, and/or the subject's level of activity …   Any of these parameters may be input to the smartphone, e.g., by the subject. …such as, for example, a camera, a spirometer, a pulse oximeter, an electrocardiograph, a heart rate monitor, a blood pressure monitor, a skin-conductivity sensor, or a posture monitor.”]
Rationale as provided for Claim 1.

Claim 9 is a device claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.  Additionally, the processor and memory of the Claim are taught by Kim at “[0029] Modules may also be implemented in software for execution by various types of processors….”  “[0030] Indeed, a module of program code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices….”

Claim 10 is a device claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.  

Claim 13 is a device claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.  
Claim 14 is a device claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.  

Claim 16 is a computer program product system claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale.  Additionally, the “non-transitory processor-readable medium” of the Claim is taught by Kim: “[0031] The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”

Claim 17 is a computer program product system claim with limitations corresponding to the limitations of method Claim 2 and the first limitation of Claim 3 (mapped to Kim) and is rejected under similar rationale. Claim 3 is rejected below over an additional reference that provides the express definitions of shimmer and jitter.
17. The non-transitory processor-readable medium of claim 16, wherein the method further comprises:
updating the knowledge base as a result of the analysis of the at least one speech feature; and [First limitation of Claim 3 mapped to Figure 5, model training of Kim]
adjusting statistical parameters and dependencies between existing phrases stored in the phrase bank and related contextual information; [Claim 1 update of the baseline model.]
wherein: [The remaining part of this Claim is Claim 2]
the at least one speech feature includes one of:
pause within the phrase, vowel within the phrase, or a combination thereof; and
the analysis of the at least one speech feature includes comparing the speech feature with a knowledge base;
the initial phrase criteria further comprising one or more of pronunciation, or breathing time;
the phrase selected from the plurality of phrases is obtained from interaction with the voice assistant; and
the selected phrase includes at least a portion of the initial phrase criteria; and
the phrase bank captures voice assistant command patterns and deviations from the voice assistant command patterns for speech and sound events of the user with respect to each captured phrase in the phrase bank.

Claim 19 is a computer program product system claim with limitations corresponding to the limitations of method Claims 6 and 7 and is rejected under similar rationale.
19. The non-transitory processor-readable medium of claim 16, wherein the method further comprises:
receiving a context accompanying the utterance; and [Claim 6]
determining the pulmonary condition based on analysis of the at least one speech feature in view of the context; [Claim 6]
wherein:
the utterance is obtained passively or proactively from the user of the electronic device; [Claim 7]
the baseline model is personalized as a result of interaction between the user of the electronic device and a voice assistant coupled to the electronic device; and [Claim 7]
the context further comprises at least one of air quality information, command content information, opportunistic physiological data or a combination thereof and the command content comprises content of a search command, and the one or more speech features capture a state of the user in different contexts. [Claim 6]

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Kim, Lotan, Hauptman, Hoffmeister in view of Farrus (M. Farrús, J. Hernando, and P. Ejarque, "Jitter and Shimmer Measurements for Speaker Recognition," Eurospeech, Antwerp, Belgium, 2007.)
Regarding Claim 3, Kim teaches:
3. The method of claim 2, further comprising: 
updating the knowledge base as a result of the analysis of the at least one speech feature; and [Kim, Figure 5, “model training 550” takes “feature selection 540” as input and operates in a training loop which means that it keeps getting updated.]
adjusting statistical parameters and dependencies between existing phrases stored in the phrase bank and related contextual information; [Kim uses a statistical model which is based on correlation.  “[0091] Accordingly, for each feature, a pair of numbers may be obtained for each speech data item of the training corpus. FIGS. 6A and 6B illustrate two conceptual plots of the pairs of numbers for a first feature and a second feature. For FIG. 6A, there does not appear to be a pattern or correlation between the values of the first feature and the corresponding diagnosis values, but for FIG. 6B, there does appear to be a pattern or correlation between the values of the second feature and the diagnosis values. Accordingly, one may conclude that the second feature is likely a useful feature for determining whether a person has the medical condition and that the first feature is not.”  See also [0180].  For context data see the use of “sensor data” in [0142] –[0146] in diagnosis.]
wherein jitter comprises consistency of localized periodicity for voice sounds, and shimmer comprises consistency of localized vocal amplitude for the voiced sounds. [Kim teaches the use of Shimmer and Jitter as “acoustic features” used for determination of a condition but does not provide a definition for them.  [0063] and [0077].]
Kim and Lotan and Hauptman all include training and the limitations of this Claim set forth the known standard steps of training a model.  Hoffmeister was cited for a continual training based on individualized data from a user.
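For illustration only, the feature-versus-diagnosis correlation described in the passage of Kim [0091] quoted above may be sketched as follows.  This is a minimal sketch, not Kim's method: the use of the Pearson coefficient, the 0.5 threshold, and all function, feature, and variable names are assumptions introduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_features(feature_table, diagnosis, threshold=0.5):
    """Keep features whose absolute correlation with the diagnosis values
    meets the (hypothetical) threshold."""
    return [
        name
        for name, values in feature_table.items()
        if abs(pearson(values, diagnosis)) >= threshold
    ]


# Hypothetical toy data: one value per speech data item of a training corpus.
diagnosis = [0, 1, 2, 3]
features = {
    "first_feature": [3.0, 1.0, 1.0, 3.0],   # no linear pattern
    "second_feature": [1.0, 2.1, 2.9, 4.2],  # tracks the diagnosis values
}
selected = select_features(features, diagnosis)
```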
Hoffmeister teaches:
3. The method of claim 2, further comprising: 
updating the knowledge base as a result of the analysis of the at least one speech feature; and [Hoffmeister, Figure 10, step 1014 update of the model includes update of the parameters and perhaps even the list of words/commands.]
adjusting statistical parameters and dependencies between existing phrases stored in the phrase bank and related contextual information; [In Hoffmeister, the updating of the model includes adjusting the statistical dependencies between the existing or replaced parameters.  Role of context: “While a recognition score may represent a probability that a portion of audio data corresponds to a particular phoneme, word, or phrase, the recognition score may also incorporate other information which indicates the ASR processing quality of the scored audio data relative to the ASR processing of other audio data….”  Col. 4, lines 53-63.]
wherein jitter comprises consistency of localized periodicity for voice sounds, and shimmer comprises consistency of localized vocal amplitude for the voiced sounds.
The last limitation provides a definition of jitter and shimmer. This concept is understood in references such as Kim, which points to using shimmer and jitter.
However, a separate reference is added which includes the express definition.
Farrus teaches:
wherein jitter comprises consistency of localized periodicity for voice sounds, and shimmer comprises consistency of localized vocal amplitude for the voiced sounds.  [This limitation provides a definition for jitter and shimmer.  This definition is available in the literature as provided below.  The definition of jitter may also be found in Wikipedia as provided in the conclusion section of this Action.]

[Image: media_image3.png]

[Image: media_image4.png]

Kim/Lotan/Hauptman/Hoffmeister and Farrus all pertain to analysis of speech and it would have been obvious to include the particular definition of jitter and shimmer from the cited publication with the combination for completeness.  This definition is understood from the Kim reference which teaches the use of jitter and shimmer in its analysis.
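For illustration only, the claimed definitions can be sketched numerically. The sketch below assumes the commonly used relative local measures (mean absolute cycle-to-cycle difference divided by the mean value); it is not taken from the cited images, and the function names and the sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period (localized periodicity)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (localized vocal amplitude)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical per-cycle measurements from a voiced segment.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds
amplitudes = [0.80, 0.78, 0.82, 0.79]        # arbitrary linear units

print(f"jitter:  {jitter_local(periods):.4f}")
print(f"shimmer: {shimmer_local(amplitudes):.4f}")
```

Under this assumed formulation, a perfectly periodic signal with constant amplitude yields zero for both measures, and larger values indicate less consistent periodicity or amplitude.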

Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, Lotan, Hauptman, Hoffmeister in view of Shrivastav (U.S. 2012/0265024).
Regarding Claim 4, Lotan teaches that the meaning of the speech is determined to assess condition of the patient.  See [0061].  But it is not particular with respect to erratic speech, due to lack of oxygen, for example.
Kim and Lotan and Hauptman teach model training.  Hoffmeister was cited for teaching the individualized model training.
Hoffmeister teaches:
4. The method of claim 1, further comprising: 
Inserting new selected phrases and new contextual information into a personalized longitudinal model to capture additional scenarios, wherein the personalized longitudinal model captures personalized correlation and dependencies between selected phrases; [Hoffmeister collects data of a particular user over time/longitudinal to generate and update its model.  See Figures 1 and 10 for update of the list of the frequently spoken command and update of the speech model based on the new data.  Col. 14, line 48 to col. 15, line 9.  The capture of correlation and dependency is the purpose and definition of training a model.]
…
Analysis of speech to find erratic speech is not express in the references Kim, Lotan, Hauptman, Hoffmeister.
Shrivastav teaches:
4. The method of claim 1, further comprising: 
inserting new selected phrases and new contextual information into a personalized longitudinal model to capture additional scenarios, wherein the personalized longitudinal model captures personalized correlation and dependencies between selected phrases; [Shrivastav teaches the training of a model which includes the specifics of this limitation.  Context is taught by the “various biomarkers.”  Data of a patient/user is collected over time/longitudinally to observe the variations over time.  “[0083] According to one embodiment, an audio (conversational) stream received via a phone/microphone (e.g., mobile phone, VoIP, internet, etc.) is analyzed by segmenting the audio stream into short windows, computing specific acoustic measures from each window (e.g. mel-frequency cepstral coefficients), comparing the acoustic measures across successive windows, developing and training a machine learning pattern recognition engine to identify acoustic patterns of a cough, and determining the likelihood of a particular window (or set of windows) to contain an instance of cough.”  “[0081] After performing the speech analysis, modeling and coding (511) may optionally be performed via statistical approaches, machine learning, pattern recognition, or other algorithms to combine information from various biomarkers before reaching a diagnostic decision.”  “[0070] … In yet another embodiment, the biomarker information of the signal is compared to previous biomarker information from the user to compare changes in biomarkers over time….”  “[0032] Using acoustic measures as a biomarker involves evaluating changes in various aspects (or subsystems of speech) over time….”]
retrieving an expected position of at least one speech segment based on a knowledge base; and [Shrivastav, teaches that speech recognition is performed to recognize words, phrases, or sentences (segments) and phrase or sentence complexity is determined based on the number and order/position of syntactical categories found in the speech sample.  “8. The method according to claim 1, wherein performing the signal analysis to identify acoustic measures or language markers from the speech sample comprises identifying the language markers by: performing an automatic speech recognition process to recognize words, phrases, or sentences; classifying the recognized words, phrases, or sentences into syntactical categories; and determining phrase and/or sentence complexity by evaluating a number and order of syntactical categories found in the speech sample.”  “[0067] For language analysis, the user's language is analyzed for language patterns in a language marker module 515. The language marker module 515 can include an automatic speech recognition (ASR) module 507 and a language module 508. As shown in FIG. 7, according to one embodiment, the user's language is transcripted via the ASR module 507, which can incorporate large vocabulary systems, word spotting, and phoneme recognition. Then, once the words (language) are determined by ASR, recognized words (and phrases and sentences) can be classified into syntactical categories in the language module 508. For example, recognized words can be classified as nouns, verbs, and adjectives. Then, phrase and/or sentence complexity can be determined by, for example, evaluating the number and order of various syntactical categories that occur in someone's speech. In one embodiment, a primary analysis 509 for syntax coding can be performed to classify the recognized words/language. The syntax coding can be accomplished by a dictionary look-up. 
A secondary analysis 510 for sentence/phrase complexity can be performed to determine the complexity and usage of the language. A reduction in sentence complexity can be an indicator of a neurological disease. In addition, certain neurological diseases, such as Alzheimer's, cause particular language patterns to emerge. Such language patterns can be determined via the secondary analysis.”]
determining position of the at least one speech segment based on the expected position and generate the at least one speech feature for the at least one speech segment. [Shrivastav, a particular syntax is expected and deviations from this syntax, or in the case of some disorders conformance to a disease-indicating syntax, indicate the presence or absence of a medical condition.  “[0032] Using acoustic measures as a biomarker involves evaluating changes in various aspects (or subsystems of speech) over time. … The analyses may also include, but is not limited to, analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.) grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words. The analysis may also evaluate, as an alternative or in addition, the frequency (i.e. the number of occurrences), the intensity (i.e. the strength), or other characteristics of cough during a conversation. Of course, embodiments are not limited to these listed, and other measures from a speech sample can be taken and analyzed.”  “[0042] Similarly to PD (Parkinson’s Disease), the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics….”]
Kim/Lotan/Hoffmeister and Shrivastav pertain to determining pulmonary conditions that may impact cognitive ability or neurological diseases that directly impact cognitive ability and it would have been obvious to combine the use of sentence structure (position and location of phrases and sentence segments) from Shrivastav with the system of combination that evaluates the language of the user for content and context in order to add another measure for determining some type of impairment and as combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 11 is a device claim with limitations corresponding to the limitations of Claims 3 and 4 and is rejected under similar rationale.  
11. The electronic device of claim 9, wherein the process is further configured to:
update the knowledge base as a result of the analysis of the at least one speech feature; (Claim 3, first limitation that is mapped to Kim.)
retrieve an expected position of at least one speech segment based on a knowledge base; and (Claim 4)
determine position of the at least one speech segment based on the expected position and generate the at least one speech feature for the at least one speech segment. (Claim 4)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kim, Lotan, Hauptman, Hoffmeister in view of Michaelis (U.S. 7,529,670).
Regarding Claim 5, Kim teaches:
5. The method of claim 2, further comprising: 
identifying at least one pause within the phrase; and [Kim, “[0064] The features may include language features where language features are computed using the results of a speech recognition. For example, language features may include a speaking rate (e.g., the number of vowels or syllables per second), a number of pause fillers (e.g., "ums" and "ahs"), the difficulty of words (e.g., less common words), or the parts of speech of words following pause fillers.”]
determining the pulmonary condition based on analysis of pause duration and pause pattern of the at least one pause. [Kim does not mention pulmonary conditions.]
Lotan teaches:
identifying at least one pause within the phrase; and [Lotan considers the number of pauses as an additional parameter in evaluation of the pulmonary condition:  “[0081] Alternatively or additionally, the processor may assess the status of the subject's condition in response to other speech-related parameters extracted from signal 36. … Various other parameters that may be identified from the speech of the subject, and used to assess the subject's condition, include a period of time between breaths, a number of pauses in the speech, loudness of the speech, vocal irregularities, a number of stutters, and an exhalation length, as well as any relevant frequency or pitch parameters. As an example of the latter, the subject's condition may be assessed based on the subject's inability to produce low frequencies, or based on observed changes in distortion.”]
determining the pulmonary condition based on analysis of pause duration and pause pattern of the at least one pause. [Lotan suggests this limitation because pause duration and pattern are characteristics of pause which is evaluated by Lotan but are not expressly mentioned.  The number of pauses can be taken to teach “pause pattern.”]

Hauptman pertains to cardiac conditions.
Hoffmeister does not pertain to pulmonary conditions.

Michaelis more expressly teaches:
determining the pulmonary condition based on analysis of pause duration and pause pattern of the at least one pause. [Michaelis teaches that a pause of particular duration / length indicates a pulmonary condition:  “9. The method of claim 1, wherein the first and second states reflect a pulmonary state of the user and wherein step (b) comprises the substep: (B1) detecting a pause of at least a predetermined length between temporally adjacent utterances of the user, the detected pause being assumed to be associated with a transition between temporally adjacent inhalation and exhalation cycles, wherein the approximate time is measured relative to a selected part of the pause based on the assumption that, at the selected part of the pause, the user's lungs are at or near full capacity.”  Col. 10, lines 4-14.]
Kim, Lotan, Hauptman, Hoffmeister and Michaelis pertain to determining health conditions based on voice analysis which would include breathing and pausing patterns (see Kim above) and it would have been obvious to combine the more detailed features of evaluating pause duration with the system of the combination that does not expressly state this particular feature of a pause for completeness and as combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
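For illustration only, the pause-duration test quoted from Michaelis ("a pause of at least a predetermined length between temporally adjacent utterances") can be sketched as follows. The sketch assumes the utterance boundaries are already available as start and end times; the function name, the threshold, and the sample times are hypothetical and do not come from any cited reference.

```python
def find_pauses(utterances, min_pause_s):
    """Return (pause_start_s, pause_duration_s) for each gap of at least min_pause_s.

    utterances: list of (start_s, end_s) tuples sorted by start time.
    """
    pauses = []
    # Compare the end of each utterance with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(utterances, utterances[1:]):
        gap = next_start - prev_end
        if gap >= min_pause_s:
            pauses.append((prev_end, gap))
    return pauses


# Hypothetical utterance boundaries in seconds.
utterances = [(0.0, 1.2), (1.3, 2.5), (3.4, 4.0)]
for start, duration in find_pauses(utterances, min_pause_s=0.5):
    print(f"pause at {start:.1f} s lasting {duration:.1f} s")
```

The list of detected pauses gives both the individual durations and, through their count and spacing, a simple notion of pause pattern.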

Claims 8, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, Lotan, Hauptman, Hoffmeister and Schmidt (U.S. 20110184250) in view of Abeyratne (U.S. 2019/0080803).
Regarding Claim 8, Kim or Hoffmeister teaches:
the baseline model outputs one or more estimated features that are provided back to the baseline model as input. [Kim, Figure 5, “model training 550” in a loop that receives “feature selection 540.”  “[0102] Model training component 550 may then train a mathematical model using the selected features. For example, model training component 550 may iterate over the speech data items of the training corpus, obtain the selected features for the speech data items, and then train the mathematical model using the selected features….”] [Hoffmeister as shown with respect to Claim 1 teaches the update of a personalized model and update requires looping that is claimed.]
Kim, Lotan, Hauptman, and Hoffmeister do not discuss the specific case of analyzing coughs in detail.
Schmidt teaches the personalized model and also mentions that chronic cough is one of the types of diseases that it detects.
Schmidt teaches:
8. The method of claim 7, further comprising: 
detecting a cough from the utterance; and [ Schmidt, Figure 3,  “[0012] FIG. 3 is a simplified flow chart illustrating the operation of patient-optimized detection, trending, and training firmware for cough detection”   “[0038] The patient monitoring and/or feedback device 104 can be a medical monitoring device such as cough and wheeze detection devices ….”]
analyzing features of the cough; [Schmidt, examples of features that are analyzed:  “[0047] For example, a 20 second audio capture of breathing and talking a standard assessment sentence can be analyzed, e.g., for amplitude, pitch, shortness of breath, cadence of speech, and compared with population and individual benchmarks as a leading indicator of worsening symptoms for that individual….”  “[0048] The phone input comprises segments of audio, video, motion, or activity, and this information is sent to the remote system for analysis. A voice analysis using pitch and amplitude perturbation features, and a set of measures of the harmonic-to-noise ratio are extracted from the transmitted speech files. Features are extracted and classified using known methods including those developed by http://www.voxpilot.com. These feature sets are used to test and train automatic classifiers ….”]  
wherein determining the pulmonary condition is further based on the analysis of the features of the cough, and [Schmidt determines a disease based on the analysis of the cough.  See [0047] above.  See also:  “[0044] … Cough comprises individual explosive sounds collected with a relative amplitude and frequency for each person over time. This data can be used to train the statistical detection model of the characteristics of cough sounds and audio background sounds….”  “[0045] The coughs per person per time unit are measured and compared to control and chronic cough patients for healthy cough range and an alert range indicative of a loss of healthy range cough frequency. FIG. 3 is a simplified flow chart illustrating the operation of patient-optimized detection, trending, and training firmware for cough detection.” ]
the baseline model outputs one or more estimated features that are provided back to the baseline model as input.  [Schmidt teaches that it trains its model using collected data and feedback from the patient because it develops personalized models for each patient:  “[0044] … This data can be used to train the statistical detection model of the characteristics of cough sounds and audio background sounds. Additionally, the firmware may be updated with further refined detection, thresholds, and analysis routines from the computer system.”  “[0040] …develop an individual profile for normal and abnormal activity. The delta between normal and pathology-induced detected changes is established using a personal best normal longitudinal baseline for the individual using both literature lookup of normal device reports and feedback from the person (e.g., I am feeling good) to establish the normal baseline. Literature and feedback (e.g., I had symptoms and/or a disease episode) is used to identify behavior, activity, voice, cough frequency, sleep pattern, etc., to establish pathology induced variation from normal. The baseline and disease characterized deviations from baseline are used to establish a personal probabilistic model and their conditional dependencies to determine the likelihood that a measured delta is indicative of a future disease exacerbation event.”]
Kim, Lotan, Hauptman, Hoffmeister, and Schmidt pertain to determining pulmonary conditions based on voice analysis and Harper teaches that coughing frequency and severity may be detected from voice biomarkers.  Schmidt expressly teaches analysis of the voice for cough features and it would have been obvious to add the explicit cough analysis of Schmidt to the combination for more detailed steps regarding cough analysis and as combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
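For illustration only, the comparison described in Schmidt [0045] (coughs per time unit measured against a healthy range and an alert range) can be sketched as follows. The sketch assumes cough events have already been detected and time-stamped; the function names and the baseline range are hypothetical and are not values from Schmidt.

```python
def coughs_per_hour(cough_times_s, duration_s):
    """Cough frequency over a recording, normalized to events per hour."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(cough_times_s) * 3600.0 / duration_s


def outside_baseline(rate, baseline_low, baseline_high):
    """True when the measured rate falls outside the person's baseline range."""
    return not (baseline_low <= rate <= baseline_high)


# Hypothetical detections (seconds from start) in a 30-minute recording.
cough_times = [10, 200, 950, 1700]
rate = coughs_per_hour(cough_times, duration_s=1800)

# Hypothetical personal baseline of 0 to 5 coughs per hour.
print(rate, outside_baseline(rate, baseline_low=0.0, baseline_high=5.0))
```

In a personalized arrangement of the kind Schmidt describes, the baseline range would be established per individual over time rather than fixed as it is in this sketch.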


[Image: media_image5.png]


Claim 15 is a device claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.  

Claim 20 is a computer program product system claim with limitations corresponding to the limitations of method Claim 8, with one limitation from Claim 7 included, and is rejected under similar rationale.  
20. The non-transitory processor-readable medium of claim 19, wherein the method further comprises: 
detecting a cough from the utterance; and [Claim 8]
analyzing features of the cough; [Claim 8]
wherein determining the pulmonary condition is further based on the analysis of the features of the cough, [Claim 8]
the opportunistic physiological data is received from a smart electronic device, and [Claim 7]
the baseline model outputs one or more estimated features that are provided back to the baseline model as input. [Claim 8]

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kim/Lotan/Hauptman/Hoffmeister in view of Michaelis and Farrus.
Claim 12 is a device claim with limitations corresponding to the limitations of Claim 5 and adds the longitudinal model of Claim 4 and the shimmer and jitter definitions from Claim 3 and is rejected under similar rationale.  Rationale of combining Farrus with the remaining references is as provided for Claim 3 and to provide an express definition which is implied in Kim.
12. The electronic device of claim 10, wherein;
the process is further configured to:
identify at least one pause within the phrase; [Claim 5, Kim and Lotan both teach pauses.]
determine the pulmonary condition based on analysis of pause duration and pause pattern of the at least one pause; [Claim 5, Lotan determines pulmonary condition.]
insert new selected phrases and new contextual information into a personalized longitudinal model to capture additional scenarios, wherein the personalized longitudinal model captures personalized correlation and dependencies between selected phrases; [ Claim 4, Hoffmeister collects data of a particular user over time/longitudinal to generate and update its model.]
jitter comprises consistency of localized periodicity for voiced sounds; and  [Claim 3, Farrus teaches definition of jitter.]
shimmer comprises consistency of localized vocal amplitude for the voiced sounds. [Claim 3, Farrus teaches definition of shimmer.]

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Kim/Lotan/Hauptman/Hoffmeister and further in view of Shrivastav and Farrus.
Claim 18 is a computer program product system claim with limitations corresponding to the limitations of method Claims 4 and 5.  Hoffmeister collects data of a particular user over time/longitudinal to generate and update its model.  Shrivastav teaches both the use of duration of pauses and sentence structure for determining impairment.  (Michaelis which was applied to Claim 5 for the teaching of duration of pause is not required for the rejection.)
Shrivastav teaches:
18. The non-transitory processor-readable medium of claim 17, wherein the method further comprises: 
inserting new selected phrases and new contextual information into a personalized longitudinal model to capture additional scenarios, wherein the personalized longitudinal model captures personalized correlation and dependencies between selected phrases; [Shrivastav teaches the training of a model which includes the specifics of this limitation.  Context is taught by the “various biomarkers.”  Data of a patient/user is collected over time/longitudinally to observe the variations over time.  See [0070], [0081], and [0083].  “[0032] Using acoustic measures as a biomarker involves evaluating changes in various aspects (or subsystems of speech) over time….”]
retrieving an expected position of at least one speech segment based on a knowledge base; [Shrivastav performs ASR in order to determine the word and phrases and the syntax of the sentence which indicates the position and location of each part of speech in a sentence and by evaluating the sentence structure determines the presence of a disfluency which indicates impairment.  See [0032] for repetition of the words which is another teaching of “expected position” of the Claim.]
determining position of the at least one speech segment based on the expected position and generate the at least one speech feature for the at least one speech segment; [Shrivastav, “[0042] Similarly to PD, the biomarkers for Alzheimer's disease may include the measures described above as well as detailed analyses of the speaker's language characteristics. These may be determined through analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.), grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words….”]
identifying at least one pause within the phrase; and [Shrivastav, “[0032] … The analyses may also include, but is not limited to, analyses of the number of words spoken, the types of words (e.g. nouns, verbs, adjectives, articles, etc.) grammatical complexity of the phrases and/or sentence, the number of occurrence of specific words/phrases in conversation, or instances of dysfluencies such as pauses, hesitations or repetitions of words or part-words….”  “[0041] In accordance with an embodiment of the invention, one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration;….”]
determining the pulmonary condition based on analysis of pause duration and pause pattern of the at least one pause; [Shrivastav, “[0041] In accordance with an embodiment of the invention, one or more acoustic measures for Parkinson's disease can include, but are not limited to, fundamental frequency (F.sub.0), voice onset time, pause duration, and/or changes in F.sub.0; voice onset time, and/or pause duration;….”]
wherein jitter comprises consistency of localized periodicity for voiced sounds, and shimmer comprises consistency of localized vocal amplitude for the voiced sounds.
Kim/Lotan/Hauptman/Hoffmeister and Shrivastav pertain to determining pulmonary conditions that may impact cognitive ability or neurological diseases that directly impact cognitive ability and it would have been obvious to combine the use of disfluencies including pauses and poor sentence structure (position and location of phrases and sentence segments) from Shrivastav with the system of combination that evaluates the language of the user for content and context in order to add another measure for determining some type of impairment and as combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
The definition of jitter and shimmer is not included in Kim.
Farrus teaches:
wherein jitter comprises consistency of localized periodicity for voiced sounds, and shimmer comprises consistency of localized vocal amplitude for the voiced sounds. [Farrus as applied to Claim 3 teaches this limitation which is a mere definition.]
Rationale for combination as provided for Claim 3.  This limitation provides a definition and should be understood from a reference that teaches the use of these two parameters.  An express reference was added for completeness.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
“In electronics and telecommunications, jitter is the deviation from true periodicity of a presumably periodic signal, often in relation to a reference clock signal. In clock recovery applications it is called timing jitter. Jitter is a significant, and usually undesired, factor in the design of almost all communications links.”  Wikipedia, 16:20, 28 December 2017.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659