Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-28 are pending. Claims 1-18 are under examination and the remainder are withdrawn from consideration pursuant to a Restriction Requirement. Claims 1 and 10 are independent.
This Application was published as U.S. 20200380957.
            Apparent earliest priority: 30 May 2019 (3 different provisionals with the most recent 1 May 2020).
	The instant Application and Claims use the term “Functionals.”  Note the definition in the Specification:
[0026] In step 30, the system 10 summarizes the features to generate variables that describe the speaker. For example, the system 10 aggregates the features so that each resultant summary variable (referred to as "functionals" hereafter) is at a speaker level. The functionals are, more specifically, features summarized over an entire record. [0027] In step 32, the system 10 generates the predictive voice model 14. For example, the system 10 can generate a modeling dataset comprising tags together with generated functionals. The tags can indicate a speaker's gender, age, smoker status (e.g., a smoker or a non-smoker), etc. The predictive voice model 14 allows for predictive modeling of a smoker status, by using smoker status tags as the target variables, and other tags (e.g., gender, age, etc.) as predictive variables. The predictive voice model 14 can be a regression model, a support-vector machine ("SVM") supervised learning model, a Random Forest model, a neural network, etc.

From the above, it appears that the term “Functionals” refers to the person’s demographic information.  Please define the term in the Claim with particularity and consistently with the supporting Disclosure.
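(NOTE: For reference only, the aggregation described in [0026], in which frame-level features are summarized over an entire record, and the modeling dataset of [0027], in which tags are joined to the resulting functionals, might be sketched as follows. This is a minimal illustration under stated assumptions and is not taken from the Application: the feature names, the choice of mean, standard deviation, minimum, and maximum as summary statistics, the function name summarize_functionals, and the tag values are all hypothetical.)

```python
from statistics import mean, pstdev


def summarize_functionals(frames):
    """Collapse frame-level features into record-level summary variables.

    frames: list of dicts, one per analysis frame, mapping a feature name
    (hypothetical, e.g. "f0_hz") to a float value.
    The choice of summary statistics is an assumption for illustration.
    """
    if not frames:
        raise ValueError("need at least one frame")
    functionals = {}
    for name in frames[0]:
        values = [frame[name] for frame in frames]
        functionals[f"{name}_mean"] = mean(values)
        functionals[f"{name}_std"] = pstdev(values)  # population std. dev.
        functionals[f"{name}_min"] = min(values)
        functionals[f"{name}_max"] = max(values)
    return functionals


# Hypothetical frame-level features for one recording.
frames = [
    {"f0_hz": 110.0, "intensity_db": 60.0},
    {"f0_hz": 120.0, "intensity_db": 62.0},
    {"f0_hz": 130.0, "intensity_db": 64.0},
]
functionals = summarize_functionals(frames)

# One row of a modeling dataset: tags together with functionals ([0027]).
# The tag values are invented placeholders.
tags = {"gender": "M", "age": 60, "smoker": True}
modeling_row = {**tags, **functionals}
```

Under this reading, each functional is a statistic of the voice signal at the speaker level, while gender, age, and smoker status enter the modeling dataset separately as tags.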

Note also:
[0003] In the machine learning space, there is significant interest in developing computer-based machine learning systems which can identify various characteristics of a person's voice. Such systems are of particular interest in the insurance industry. As the life insurance industry moves toward increased use of accelerated underwriting, a major concern is premium leakage from smokers who do not self-identify as being smokers. For example, it is estimated that a 60-year-old male smoker will pay approximately $50,000 more in premiums for a 20-year term life policy than a non-smoker. Therefore, there is clear incentive for smokers to attempt to avoid self-identifying as smokers, and it is estimated that 50% of smokers do not correctly self-identify on life insurance applications. In response, carriers are looking for solutions to identify smokers in real-time, so that those identified as having a high likelihood of smoking can be routed through a more comprehensive underwriting process. 
[0004] An extensive body of academic literature shows that smoking cigarettes leads to irritation of the vocal folds (e.g., vocal cords), which manifests itself in numerous changes to a person's voice, such as changes to the fundamental frequency, perturbation characteristics (e.g., shimmer and jitter), and tremor characteristics. These changes make it possible to identify whether an individual speaker is a smoker or not by analysis of their voice.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7, 9-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Barnett (U.S. 2020/0365275) in view of Aronowitz (U.S. 20200372899).
Please define the terms in the Claim language with particularity when the intended meaning may deviate from the ordinary meaning in the art and the Specification is either vague or provides several contradictory definitions.

Regarding Claim 1, Barnett teaches: 
1. A machine learning system for detecting at least one voice attribute from input data, [Barnett, Figure 5, “learning algorithm” leading to the “classifier/ model.”] (NOTE:  Figure 1 of instant Application outputs voice attributes by the use of a “predictive voice model 14.”  Figure 2, generates the “predictive model.”  See Claims 8 and 9 for examples of “voice attributes.”)
comprising: 
a processor in communication with a database of input data; and [Barnett teaches a trained classifier whose training data is obtained from various types of public and private databases.  This limitation refers to a database of “input data” that are “inputs” to the model to be evaluated by the model and not “inputs” for the training of the model.  Figure 5 of Barnett shows two types of data:  “Training Data” and “New Input” both of which are ultimately reused for the training/retraining/validation of the model.  The “training data” comes from public and private “databases.”  The “processor” of the device that utilizes the “classifier/model” is in “communication” with the “training data” and thus the relevant “database.”  See [0042] below.]
a predictive voice model executed by the processor, [Barnett, Figure 5, “classifier/ model.”  Figure 4, “AI Classifier 34,” which is executed by a processor on a computing device such as a user mobile device:  “[0117] … Different AI classifiers 34 may be used, for example, a Gaussian classifier, a nearest-neighbour classifier, a neural network or sparse partial least square model may be used for different sets of features….”  “[0042] A third aspect of the invention provides a mobile computer device for use in the method of the second aspect comprising: one or more processors ….”] (Note the supporting Specification:  “[0027] … The predictive voice model 14 can be a regression model, a support-vector machine ("SVM") supervised learning model, a Random Forest model, a neural network, etc.”)
the predictive voice model:
receiving the input data from the database; [Barnett, Figure 4, “Raw Audio Data” as input to the pipeline of analysis.]  [Barnett, Figure 5, “new input.”  Figure 2 shows the “Raw Audio Save in Database” which indicates that the voices of participants are saved in a database. Figure 3 shows the collection of “Raw Audio Data” from a “participant.”] (Instant Specification:  “[0007] …An audio sample of a person is obtained from one or more sources, such as pre-recorded samples (e.g., voice mail samples) or live audio samples recorded from the speaker….”)
processing the input data to identify a speaker of interest from the input data; [Barnett, Figure 5, “new input.” Figure 1, the “participant” / “speaker of interest” is known and therefore the step of identification does not occur in Barnett.] (Instant Specification:  “[0006] …The system then isolates a speaker of interest by performing a speaker diarization which partitions an audio stream into homogeneous segments according to the speaker identity….”)
isolating one or more predetermined sounds corresponding to the speaker of interest; [Barnett, Figure 4, “Speech Parser 31.”  “[0110] As shown in FIG. 4, the audio data may be fed into a speech parser module 31. This module incorporates an algorithm to detect portions with certain durations or features within the full audio (e.g. full utterance, sentence, words, syllables) and either label these or segment these…..”] (Instant Specification:  “[0006] …Next, the system isolates predetermined sounds from the isolated speech of the speaker of interest, such as vowel sounds, to generate features….”)
generating a plurality of vectors from the one or more predetermined sounds; [Barnett does not teach generating vectors.  However, it teaches extracting features and the features are generally extracted as feature vectors.]
generating a plurality of features from the one or more predetermined sounds; [Barnett, Figure 4, “Feature Selector 32.”  “[0112] As shown in FIG. 4, the speech parser module 31 may then feed these segments into a feature selector 32, which extracts acoustic features. These features are extracted and analysed on a frame-by-frame basis as well as at the full utterance level….”  Feature in the art refers to “acoustic features.”  ] [Barnett, Figure 5, “feature extraction and labeling.”]  [Barnett, Figure 1, “feature extraction 32, 33.”] (Instant Specification:  “[0029] … Additionally, the systems and methods disclosed in the present disclosure can be adapted to detect further features of a speaker, such as age, deception, depression, stress, general pathology, mental and physical health, diseases (such as Parkinson's), and other features.”)
processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and [Barnett, Figure 4, “Feature Reduction 33.”  Only those features that can lead to useful information are kept:  “[0113] … The features of interest in the first task may relate to the low-level acoustic properties of each discrete word uttered, whereas the features of interest in the second task may include paralinguistic features such as speech rate, number of pauses, breathing, stuttering etc….”] (Instant Specification:  “[0007] …system then summarizes the features to generate variables that describe the speaker. ….”  Additionally, “variables that describe the speaker” are “functionals” or speaker demographics like his age and gender.)
processing the plurality of variables and vectors to detect the at least one voice attribute. [Barnett, Figure 4, the “AI Classifier 34” generates the “output report” which is based on acoustic features (feature vectors) such as: “pitch, intensity, formant frequencies, glottal flow, speech duration, speech rate, and voice quality” and indicates the “voice attributes” that indicate the physiologic state of the speaker:  “[0120] The analysis module 3 may optionally output its determination as a report. The output of the invention may report any or all of the following: [0121] Participant's physiological state at that particular point in time. ….”   “42. The system of claim 37, wherein the characteristics indicative of physiological state include: pitch, intensity, formant frequencies, glottal flow, speech duration, speech rate, and voice quality ….”  “[0060] Alternatively, or additionally, the system may determine the cognitive load of a task based on the response detected by the response detection module 2. For example, the analysis module 3 may determine the cognitive load of a task based on characteristics of the spoken response. The characteristics may include the same characteristics as those used to determine the physiological state of the subject, e.g. pitch, intensity, formant frequencies, glottal flow, speech duration, speech rate, and voice quality….”] [Barnett, Figure 5, “output (classification/prediction).”] [Barnett, Figure 1, “AI Classifier 34.”] (“Voice attributes” are defined as voice characteristics like pitch and frequency in the dependent Claim 6.)

Barnett arguably teaches receiving its input voice from a database but a more express reference is also provided.
Barnett does not teach speaker identification from the voice data.
Barnett does not expressly teach the extraction of features as feature vectors.

Aronowitz teaches:
1. A machine learning system for detecting at least one voice attribute from input data, [Aronowitz is directed to:  “There is provided a computer implemented method of presenting color coded text generated from an audio track of a video, the color coding denoting respective speakers …”  Where “[0051] Referring now back to FIG. 1, at 102, mapping function is provided and/or trained. The mapping function maps a certain feature vector to a color space…” and “[0056] The mapping function is trained using a training dataset that includes different speakers and corresponding feature vectors….”]
comprising: 
a processor in communication with a database of input data; and [Aronowitz, Figure 2, “processors 202” in communication with “memory 206” and “data storage devices 208.”  The “database of input data” is taught by the “training dataset” of Aronowitz.]
a predictive voice model executed by the processor, [Aronowitz, Figure 2, “Mapping Code 206B.”]
the predictive voice model: 
receiving the input data from the database; [Aronowitz, Figure 1, “receive audio track 104.” “[0062] At 104, the audio track is received. The audio track may be of a video that includes a visual component. ….”  The “audio track” is not contemporaneous and must be stored and thus teaches receiving the input data from a database.]
processing the input data to identify a speaker of interest from the input data; [Aronowitz, Figure 1, “divide audio track into segments 106” which may be a process of diarization which identifies the speaker that spoke each segment:  “[0065] At 106, the audio track is divided into multiple audio-segments. Each audio segment represents (i.e., depicts) speech spoken by a respective speaker. There are multiple different speakers speaking in the audio track.”  “[0066] The audio track may be automatically divided into the audio-segments, for example, by a speaker diarization process.”]
isolating one or more predetermined sounds corresponding to the speaker of interest; [Aronowitz teaches diarization (see Figure 3) which separates the speech/sound corresponding to each speaker.  See [0066].]
generating a plurality of vectors from the one or more predetermined sounds; [Aronowitz, Figure 1 and for each segment:  “extract feature vector from audio segment 110.”  “[0069] At 110, a feature vector is extracted from the audio-segment. The feature vectors may be designed for speaker verification. Exemplary feature vectors include: i-vector …  x-vector … and d-vector.”]
generating a plurality of features from the one or more predetermined sounds; [Aronowitz, Figure 1 and for each segment: “receive attributes of speech 116.”  “[0079] At 116, one or more attributes of the speech of the audio-segment are received and/or detected….”  “[0080] The attributes of speech may denote the manner of speech, for example, speaking style, whisper, shouting, fast speech, and slow speech. The attribute of speech may reflect the emotions of the speaker, for example, calm, anger, excitement, sadness, and coldness.”   “[0061] …Exemplary attributes of speakers include: gender, good character, bad character, occupation, rank (e.g., king, commander), and wealth (e.g., rich or poor)….”] [ “Attribute” of Aronowitz fits the definition of “feature” in the instant Specification.]
processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and [Aronowitz, Figure 1, “Map Feature Vector into color space 112.”  “[0061] …The mapping function may be trained according to the respective attribute of the speaker. For example, the color for each speaker may be selected according to the attribute(s) of the respective speaker. For example, an evil character is assigned the color red (i.e., text corresponding to speech by the character is colored red), and a good character is assigned the color blue. In another example, a wealthy character is assigned the color yellow.”]
processing the plurality of variables and vectors to detect the at least one voice attribute. [Aronowitz, Figure 3, the color and font used for each speaker 304A, 304B, indicates their “voice attribute.”  See [0106].]
Barnett and Aronowitz both pertain to systems for assessing voice and sound. It would have been obvious to combine the diarization of Aronowitz, which is applied to a stored audio or video track and separates the speakers identified in the track, with the system of Barnett, as a precursor step, so that the analysis of Barnett can be applied to audio that includes a mixture of speech from different speakers. It would also have been obvious to modify Barnett to specify that the acoustic features are extracted as feature vectors such as i-vectors, which are commonly used for analysis of a speaker's voice, in order to provide a specific method of voice analysis better suited to the task at hand.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 2, Barnett teaches:
2. The system of claim 1, wherein the predictive model processes one or more of demographic data, voice data, credit data, lifestyle data, prescription data, social media data, or image data. [Barnett teaches that the prior art uses voice in conjunction with demographics and video data to predict disease: “[0020] Similarly, artificial intelligence (AI) based models have been used to infer physiological state from voice alone as well as in conjunction with other variables (e.g. video recordings, demographic variables, disease specific variables, and cognitive performance) in other areas of neurology and psychiatry ….”  Barnett teaches that its model also incorporates demographics:  “[0040] …  Therefore, this signal can be added to cognitive performance scores and other known predictors of physiological state (e.g. clinical scores, demographics) to improve a predictive model of physiological state and neurological or neuropsychiatric disorder.”  “[0116] The analysis module 3 may then combine cognitive and/or clinical scores output by the system, demographic features and any other externally-known information (e.g. diagnosis, biometric data, brain imaging data), technical input features of the devices used to record, and the raw audio signal or the feature selector output (i.e. analysed audio features). This can happen in real time or later; in the device or the cloud.”  “[0146] … An AI algorithm then combines demographic features and any other externally-known information, such as technical input features of the devices used to record.”  See also [0147] and [0169].]

Regarding Claim 3, Barnett teaches extraction of features which are generally feature vectors but does not mention i-vector which is the type of feature vector normally used for speaker identification.
Aronowitz teaches:
3. The system of claim 1, wherein the plurality of vectors comprises a plurality of i-Vectors. [Aronowitz, Figure 1 and for each segment:  “extract feature vector from audio segment 110.”  “[0069] At 110, a feature vector is extracted from the audio-segment. The feature vectors may be designed for speaker verification. Exemplary feature vectors include: i-vector …  x-vector … and d-vector.”]
Rationale for combination as provided for Claim 1.

Regarding Claim 4, Barnett teaches “[0008] … The relevant population may be selected by reference to age, gender and known medical conditions.”  “[0125] The audio data may be pre-processed using normalisation methods to exclude variations related to age, gender and to extract low-level features such as energy, intensity, pitch, formants, glottal flow, speech duration and rate, voice quality and spectral shape descriptors….”  “[0129] In one example, the training data may be obtained by testing participants using a special variant of the front-end system that has been configured to optimise the quality of the training data. Optimisations may include increasing the length of the testing session, the dynamic range of the task parameters, manipulation of the feedback provided to subjects, and specific selection of the testing battery optimised for a subsample of the population characterised by age, gender, education, occupation, physiological and/or disease state.”  Thus, Barnett takes into account what this Application refers to as “Functionals” and generally refers to “demographic data” both for the training of its model and for normalizing the input data to remove impacts of demographics and optimize for cognitive impairment condition that is the target of the voice analysis.  Barnett does not teach that it looks for the demographic information in the voice.
Aronowitz teaches:
4. The system of claim 3, where the plurality of variables comprises a plurality of functionals that describe the speaker of interest. [Aronowitz, Figure 1 and for each segment: “receive attributes of speech 116.”  The “attributes” of Aronowitz include “demographic” information which teaches the “functionals” of the Claim.  “[0079] At 116, one or more attributes of the speech of the audio-segment are received and/or detected….”  “[0080] The attributes of speech may denote the manner of speech, for example, speaking style, whisper, shouting, fast speech, and slow speech. The attribute of speech may reflect the emotions of the speaker, for example, calm, anger, excitement, sadness, and coldness.”   “[0061] …Exemplary attributes of speakers include: gender, good character, bad character, occupation, rank (e.g., king, commander), and wealth (e.g., rich or poor)….” “[0061] …The mapping function may be trained according to the respective attribute of the speaker. For example, the color for each speaker may be selected according to the attribute(s) of the respective speaker. For example, an evil character is assigned the color red (i.e., text corresponding to speech by the character is colored red), and a good character is assigned the color blue. In another example, a wealthy character is assigned the color yellow.”]
Rationale for combination as provided for Claim 1.  Barnett and Aronowitz could be combined such that the combined system actually looks for and generates the other types of information that affects the voice of the speaker.

Regarding Claim 5, Barnett teaches:
5. The system of claim 4, wherein the predictive voice model processes the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute. [Barnett, Figure 4, the “AI Classifier 34” generates the “output report” which is based on acoustic features such as: “pitch, intensity, formant frequencies, glottal flow, speech duration, speech rate, and voice quality” and indicates the “voice attributes” that indicate the physiologic state of the speaker. See [0120] and [0121].]
Barnett does not teach the use of iVectors.
Aronowitz teaches:
wherein the predictive voice model processes the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute. [Aronowitz, Figure 1, “Map Feature Vector into color space 112” is based on the character/functional of the speaker.  Additionally, the vectors of Aronowitz can be ivectors.  “[0061] …The mapping function may be trained according to the respective attribute of the speaker. For example, the color for each speaker may be selected according to the attribute(s) of the respective speaker. For example, an evil character is assigned the color red (i.e., text corresponding to speech by the character is colored red), and a good character is assigned the color blue. In another example, a wealthy character is assigned the color yellow.”]
Rationale for combination as provided for Claim 1.  Barnett and Aronowitz could be combined such that the combined model uses both the voice characteristics (feature vectors) and demographics (functionals) of the speaker to determine his physiologic condition (voice attribute) as reflected in his voice.

Regarding Claim 6, Barnett teaches:
6. The system of claim 1, wherein the at least one voice attribute comprises one or more of frequency, perturbation characteristics, tremor characteristics, duration, or timbre. [Barnett, “[0087] Audio data corresponding to responses may be analysed with respect to three general types of features: 1) paralinguistic features, 2) prosodic features related to pitch, 3) voice quality features. … [0088] Pitch is the psychological perception of changes in frequency. For example, increase in frequency is perceived as rise in pitch. The pitch of a complex tone (speech) corresponds to the fundamental frequency (f0). Pitch also reflects the frequency of vibrations of the vocal chords during speech production. For example, a question has a rising pitch, whereas a statement or declaration has falling pitch. Various statistics of pitch, which correspond to different features of the speech signal, will also be measured: [0089] fundamental frequency (f0) [0090] f0 mean, SD, range, median [0091] f0 slope (e.g. rising, falling, flat)…..”]
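(NOTE: For reference only, the pitch statistics that Barnett lists at [0089]-[0091] (f0 mean, SD, range, median, and slope) might be computed from a frame-wise fundamental-frequency contour as sketched below. This is an illustration under stated assumptions and is not Barnett's implementation: the function name f0_statistics, the 10 ms frame step, the least-squares slope estimate, and the tolerance used to call a contour "flat" are all hypothetical.)

```python
from statistics import mean, median, stdev


def f0_statistics(f0_hz, frame_step_s=0.01, flat_tol_hz_per_s=5.0):
    """Summarize a fundamental-frequency (f0) contour.

    f0_hz: f0 estimates in Hz, one per voiced frame (at least two).
    frame_step_s and flat_tol_hz_per_s are assumed values for illustration.
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two f0 values")
    times = [i * frame_step_s for i in range(len(f0_hz))]
    t_bar, f_bar = mean(times), mean(f0_hz)
    # Ordinary least-squares slope of f0 against time, in Hz per second.
    num = sum((t - t_bar) * (f - f_bar) for t, f in zip(times, f0_hz))
    den = sum((t - t_bar) ** 2 for t in times)
    slope = num / den
    if slope > flat_tol_hz_per_s:
        trend = "rising"
    elif slope < -flat_tol_hz_per_s:
        trend = "falling"
    else:
        trend = "flat"
    return {
        "mean": f_bar,
        "sd": stdev(f0_hz),  # sample standard deviation
        "range": max(f0_hz) - min(f0_hz),
        "median": median(f0_hz),
        "slope_hz_per_s": slope,
        "trend": trend,
    }


# Hypothetical contour that rises steadily, as in a question.
stats = f0_statistics([100.0, 110.0, 120.0, 130.0, 140.0])
```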

Regarding Claim 7, Barnett teaches:
7. The system of claim 1, wherein the plurality of features comprise mel-frequency cepstral coefficients. [Barnett, “[0087] Audio data corresponding to responses may be analysed with respect to three general types of features: 1) paralinguistic features, 2) prosodic features related to pitch, 3) voice quality features. Established methods in the art may be used to extract these signals, such as mel-frequency cepstral coefficients (MFCC) analysis, Perceptual Linear Prediction (PLP), and Linear Predictive Coding (LPC) (Huang, Acero, and Hon, 2001)….”]

Regarding Claim 9, Barnett teaches:
9. The system of claim 1, wherein the at least one voice attribute indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren's syndrome, arthritis, dementia, Parkinson's disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person's voice. [Barnett:  “[0147] In another example, when the dataset comprises of one or multiple data points in a variety of people, some of whom may have a known medical diagnosis (between-subject paradigm), a digital platform (computer, cloud, server) receives audio samples and speech signal processing occurs, which include extraction of audio signals under different conditions that is similar across all participants (e.g. mood, cognitive state) and participant group association (e.g. patient or control). An AI algorithm then combines demographic features and any other externally-known information, such as technical input features of the devices used to record.”  “[0148] The present invention may provide a system and method to classify physiological state, including symptoms of neurological disorder or neuropsychiatric disorder in a subject. ….” “[0149] Accordingly, the physiological state determined by the analysis module may relate to one or more of: pain, dizziness, stress, anxiety, alertness, fatigue or sedation. ….” “[0150] Alternatively, or additionally, the physiological state determined by the analysis module may relate to a neurological or neuropsychiatric disorder. …”  “[0151] Non-limiting examples of neurological or neuropsychiatric diseases, disorders or conditions referenced herein includes without limitation brain cancers, dementia, mild cognitive impairment, epilepsy, Alzheimer disease, Parkinson disease, multiple sclerosis, depression, schizophrenia, ADHD, PTSD, bipolar disorder, tic disorders (including Tourette's syndrome), OCD, anxiety disorders (including phobias and social anxiety disorder), Autism Spectrum Disorder, addiction, eating disorders, neuropathy, aphasia.”]

Claim 10 is a method claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
10. A machine learning method for detecting at least one voice attribute from input data, comprising the steps of: 
receiving input data from a database; 
processing the input data to identify a speaker of interest from the input data; 
isolating one or more predetermined sounds corresponding to the speaker of interest; 
generating a plurality of vectors from the one or more predetermined sounds; 
generating a plurality of features from the one or more predetermined sounds; 
processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and 
processing the plurality of variables and vectors to detect the at least one voice attribute. 

Claim 11 is a method claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 12 is a method claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 13 is a method claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 14 is a method claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 15 is a method claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 16 is a method claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 18 is a method claim with limitations corresponding to the limitations of Claim 9 and is rejected under similar rationale.

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Barnett and Aronowitz in view of Zhang (U.S. 10825564).
Regarding Claim 8, Barnett teaches:
8. The system of claim 1, wherein the at least one voice attribute comprises an indication of whether an individual is a smoker. [Barnett teaches that “smoking” is a factor that can cause variation in speech signal between human speakers: “[0022] Variation in speech signal in humans can be produced by a number of factors: [0023] A) Variation between speakers due to biological variation such as sex, age, characteristics relating to the size and shape of the voice anatomy, voice disorders, smoking”.  Barnett corrects for variations that may result from “smoking” in order to focus on variations that indicate cognitive impairment.  Accordingly, Barnett teaches that the “voice attributes” or demographic data of a speaker include his age and sex as well as his smoking status.]
Barnett does not teach that it is using the voice data in order to detect whether the speaker is a smoker. 
Neither does Aronowitz.
Zhang teaches:
wherein the at least one voice attribute comprises an indication of whether an individual is a smoker. [Zhang teaches that speaking voice of a person can be used to detect whether this person is a smoker:  “… Moreover, the statistical model may be generated based on voice data from the training subjects. Some of the videos may include an audio component that includes a training subject's voice. Voice data may be indicative of certain biometric characteristics, such as smoking status. The voice data may include several voice components extracted from a training subject's speech such as frequency, pitch, intensity, tone, etc., which may be used as acoustic vectors in the statistical model. For example, a frequency analysis of the voice data may be performed (e.g., using a Fast Fourier Transformation (FFT) or other frequency transform) to identify the voice components. In this example, the statistical model for determining smoking status may be based on feature data and movement data from face frames as well as voice data including acoustic vectors. More specifically, the input nodes to the neural network for determining smoking status may include feature vectors for image features included in face frames, movement vectors indicative of the rate of change of the position or size of image features included in the face frames, and acoustic vectors indicative of the user's voice.”  Col. 13, lines 33-61.]
Barnett, Aronowitz, and Zhang pertain to detecting a physiologic condition or the identity of a speaker from the speaker's voice, and it would have been obvious to add the detection of smoking status as a type of physiologic condition from Zhang to the system of the combination such that this additional condition may also be detected from the voice of a speaker. This combination falls under simple substitution of one known element for another to obtain predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 17 is a method claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hauptman (U.S. 20190362740) detects cardiac condition from analysis of voice.
Quatieri (U.S. 20170053665) assesses disorders through speech and a computational model.
MacAuslan (U.S. 20120071777) detects respiratory illness from cough analysis.
Barnett, training of the classifier using public and private databases:  “[0139] In another example, the training data may be obtained by data-mining existing speech corpora of healthy individuals performing cognitive tasks of varying cognitive load, as well as speech corpora of speech samples from patients with known medical conditions. These speech samples of patients may be found within the public domain (such as YouTube.RTM.) or acquired via collaborations with academic institutions and not-for-profit organisations such as Research and or Patient Support Charities. Physiological state labels in these data sets may be obtained through a combination of any or all of: [0140] Computation of delta features within each individual [0141] Known labels determined during data collection by owners of the databases”  “[0143] As illustrated in FIG. 5, to train a back-end AI system, the AI system receives input from external resources, i.e. data not generated by the front-end module. Examples of such external data include, but is not limited to, speech samples from existing speech corpora of healthy individuals performing cognitive tasks of varying cognitive load, or speech corpora of speech samples from patients with known medical conditions.”  Barnett is directed to a medical study with participants whose voices are recorded and subsequently used for training.  “[0061] In a study directed to this feature, sixty participants aged 21 to 78 completed an automated verbal test of working memory. … Audio features extracted from each response were normalised for each participant, expressing within-subjects differences in vocal features across trials of varying load….”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659