Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 2-22 are pending. Claims 2, 9, and 18 are independent.
This Application is published as U.S. 20200211555.
Apparent priority 28 February 2018.
This Application is a continuation of application no. 15/908,428, issued as U.S. 10,235,998, which is a continuation of application no. 16/286,986, issued as U.S. 10,573,314. A Terminal Disclaimer over the terms of both patents issued to the parent applications is required as set forth below.
The “natural language module” recited in Claim 1 is not interpreted under 35 U.S.C. 112(f) because it is a part of the software being executed by the “computer readable memory” and is interpreted as software.  The “module” in Claims 7 and 15 is subject to indefiniteness rejection.
In general: please use solid black lines for patent drawings and refrain from the use of dotted lines or shading.
Claim Objections
Claim 2 is objected to for informalities arising from omission of the connectors “and” or “or.” Placement or omission of connectors can change the meaning of the Claim.
2. An electronic device configured to process audible expressions from users, comprising: 
a network interface; 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; 
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: 
determining a power spectrum of the human vocal expression[[;]] , and
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use a natural language module to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; 
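compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 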
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted third identified change with respect to the detected grammar violations; and 
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable.
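For illustration only, and not as a construction of the claim, the signal-processing and weighting limitations quoted above can be sketched in Python as follows. The frame length, the quiet-power threshold, the function names, and the use of a plain weighted sum are all assumptions made for this sketch; none of them is taken from the application or the reference patents.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT of one time-domain frame.

    Returns |X[k]|^2 for k = 0 .. N/2, i.e. the power in each
    non-negative frequency bin.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def find_pauses(samples, frame_len, quiet_power):
    """Detect quiet time from the per-frame power spectrum.

    A frame counts as quiet when its total spectral power is below
    quiet_power (a hypothetical threshold). Returns the length, in
    frames, of each run of consecutive quiet frames.
    """
    pauses = []
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        total = sum(power_spectrum(samples[start:start + frame_len]))
        if total < quiet_power:
            run += 1
        else:
            if run:
                pauses.append(run)
            run = 0
    if run:
        pauses.append(run)
    return pauses


def weighted_change_score(changes, weights):
    """Combine identified changes using one weight per change.

    changes and weights are dicts keyed by a characteristic name
    (hypothetical keys). A plain weighted sum is assumed here.
    """
    return sum(weights[name] * changes[name] for name in changes)
```

Under these assumptions, a signal made of one tonal frame, two silent frames, and another tonal frame yields a single pause two frames long. How a weighted score would map to an inferred change in health status is not specified by the quoted claim language and is left out of the sketch.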
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 2-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 10,573,314 as shown below. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:
Instant Application
Reference Patent 10,573,314
2. An electronic device configured to process audible expressions from users, comprising: 
1. An electronic device configured to process audible expressions from users, comprising: 
a network interface; 
a network interface; 
at least one computing device; and 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:  
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; 
receive in real time, over a network via the network interface, a digitized human vocal expression of a first user and one or more digital images from a remote device; 
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain;
process, remotely from the remote device, the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; 

use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: 
determining a power spectrum of the human vocal expression; 
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including: 
(determining, using a vocal tract analysis module, a magnitude spectrum of the human vocal expression, and )
(identifying, using a non-speech analysis module, pauses and the length of pauses in speech in the human vocal expression;)


determining, using a volume analysis module a volume of the human vocal expression, 
determining, using a rapidity analysis module that detects quiet time using a power spectrum of the human vocal expression, how rapidly the first user is speaking in the human vocal expression, 
determining, using a vocal tract analysis module, a magnitude spectrum of the human vocal expression, and 
identifying, using a non-speech analysis module, pauses and the length of pauses in speech in the human vocal expression;
use a natural language module to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; 
use a natural language module to: identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text, divide the text into text elements including words, sentences, and paragraphs, understand audible speech in the human vocal expression using semantic analysis that assigns respective logical and grammatical roles to the text elements, and detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;

process the received one or more images to detect characteristics of the first user face, including determining the presence of: a sagging lip, a crooked smile, uneven eyebrows, or facial droop;

compare the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identify changes in characteristics of the first user face as identified facial changes;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weight, using a third weight, the detected grammar violations; 
weight, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;

weight, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face;

weight, using a fifth weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, 
the weighted second identified change with respect to the second vocal expression characteristic of the first user, and 
the weighted third identified change with respect to the detected grammar violations; and 
infer a change in health status of the first user using the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face and the weighted detected grammar violations;
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable.
based at least in part on the inferred change in health status of the first user determine if a vehicle is to be deployed to the first user; and at least partly in response to a determination that a vehicle is to be deployed to the first user, enable a vehicle to be deployed to a location of the first user.
9. The electronic device as defined in claim 8, wherein the electronic device comprises a vehicle, and the first action comprises causing the vehicle to be prevented from being drivable or flyable.
(Claim 9 does not depend from claim 1 but from 8 which is similar to claim 1 and further a combination of claim 1 with the reference cited in 103 for the teaching of the above limitation (Singhal) serves the same purpose.)


Claim 3 is taught by claim 2 of the reference patent.
Claim 4 is taught by claim 3 of the reference patent.
Claim 5 is taught by claim 4 of the reference patent.
Claim 6 is taught by claim 5 of the reference patent.
Claim 7 is taught by claim 6 of the reference patent.
Claim 8 is taught by claim 7 of the reference patent.

Instant Application
Reference Patent 10,573,314
9. An electronic device, comprising: 
8. An electronic device, comprising: 
a network interface; 
a network interface; 
at least one computing device; and
at least one computing device; and
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
access a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain;
receive, over a network via the network interface, a digitized human vocal expression of a first user and one or more digital images of the first user from a first source; 
process, remotely from the first source, the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain;
use the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least:  
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including: 



detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, and

determining how rapidly the first user is speaking in the human vocal expression; 
determining a volume, magnitude, and a power spectrum of the human vocal expression, and 
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and the length of pauses in speech in the human vocal expression, and 
to determine how rapidly the first user is speaking in the human vocal expression;
use natural language processing to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
use a natural language module to: identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text, divide the text into text elements including words, sentences, and/or paragraphs, understand audible speech in the human vocal expression using semantic analysis, and 
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;

process the received one or more images to detect characteristics of the first user face, including determining the presence of: a sagging lip, a crooked smile, uneven eyebrows, or facial droop;

compare the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identify changes in characteristics of the first user face as identified facial changes;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; 
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weight, using a third weight, the detected grammar violations; 

weight, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;

weight, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face; 

weight, using a fifth weight, the detected grammar violations;
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and
 infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, 


the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face, and 
the weighted detected grammar violations; and 
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken.
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken.



Claim 10 is taught by claim 9 of the reference patent.
Claim 11 is taught by claim 10 of the reference patent.
Claim 12 is taught by claim 11 of the reference patent.
Claim 13 is taught by claim 12 of the reference patent.
Claim 14 is taught by claim 13 of the reference patent.
Claim 15 is taught by claim 14 of the reference patent.
Claim 16 is taught by claim 15 of the reference patent.
Claim 17 is taught by claim 16 of the reference patent.

Claim 18 is an independent method claim with limitations similar to the limitations of Claim 9 and is rejected under similar rationale. 
Claim 19 is a method claim with limitations similar to the limitations of Claim 10 and is rejected under similar rationale. 
Claim 20 is a method claim with limitations similar to the limitations of Claim 13 (or 5) and is rejected under similar rationale. 
Claim 21 is a method claim with limitations similar to the limitations of Claims 16 (or 8) and 17 and is rejected under similar rationale. Claim 21 is also similar to claim 24 of the reference.
Claim 22 is a method claim with limitations similar to the limitations of Claim 17 and is rejected under similar rationale. 

Claims 2-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 10,235,998 as shown below. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping:
Instant Application
Reference Patent 10,235,998
2. An electronic device configured to process audible expressions from users, comprising: 

1. An electronic device configured to process audible expressions from users, comprising:
a network interface; 
a network interface; 

a haptic engine configured to provide kinesthetic communication; 
at least one computing device; and 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; 
receive in real time, over a network via the network interface, a digitized human vocal expression of a first user and one or more digital images from a remote device; 
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain;
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain, and

to perform at least one of dimensionality reduction or warping of two or more frequencies to a first scale thereby reducing an amount of vocal expression data that needs to be processed;
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: 
determining a power spectrum of the human vocal expression; 
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including: 
determine, using a pitch analysis module, a pitch of the human vocal expression, 
determine, using a volume analysis module a volume of the human vocal expression, 
determine, using a rapidity analysis module how rapidly the first user is speaking in the human vocal expression, 
determine, using a vocal tract analysis module, a magnitude spectrum of the human vocal expression, and 
identify, using a non-speech analysis module, pauses and the length of pauses in speech in the human vocal expression;


(determine, using a rapidity analysis module how rapidly the first user is speaking in the human vocal expression, )
use a natural language module to: 
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; 

use a natural language module to convert audible speech in the human vocal expression to text and to understand audible speech in the human vocal expression;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user;

process the received one or more images to detect characteristics of the first user face, including detecting if one or more of the following are present: a sagging lip, a crooked smile, uneven eyebrows, facial droop;

compare the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identify changes in characteristics of the first user face;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a first weight, a first identified change with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change with respect to a second vocal expression characteristic of the first user;
weight, using a third weight, the detected grammar violations; 
weight, using a third weight, a third identified change with respect to a first characteristic of the first user face;

weight, using a fourth weight, a fourth identified change with respect to a second characteristic of the first user face;
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, 
the weighted second identified change with respect to the second vocal expression characteristic of the first user, and 
the weighted third identified change with respect to the detected grammar violations; and 
inferring a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, 
the weighted second identified change with respect to the second vocal expression characteristic of the first user, 

the weighted third identified change with respect to the first characteristic of the first user face, 
the weighted fourth identified change with respect to the second characteristic of the first user face;
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable.
based at least in part on the inferred change in health status of the first user determine if a vehicle is to be deployed to the first user; and
at least partly in response to a determination that a vehicle is to be deployed to the first user, enable a vehicle to be deployed to a location of the first user.
10. The electronic device as defined in claim 8, wherein the electronic device comprises a vehicle, and the first action comprises causing the vehicle to be prevented from being drivable or flyable.
(or combination of claim 1 with the reference cited in 103 for the teaching of the above limitation.)


Claim 3 is taught by claim 2 of the reference patent.
Claim 4 is taught by claim 3 of the reference patent.
Claim 5 is taught by claim 4 of the reference patent.
Claim 6 is taught by claim 5 of the reference patent.
Claim 7 is taught by claim 6 of the reference patent.
Claim 8 is taught by claim 7 of the reference patent.

Instant Application
Reference Patent 10,235,998
9. An electronic device, comprising: 
8. An electronic device, comprising: 
a network interface; 
a network interface; 

a haptic engine configured to provide kinesthetic communication; 
at least one computing device; and
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
access a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain;
receive, over a network via the network interface, a digitized human vocal expression of a first user; 


convert at least a portion of the digitized human vocal expression to text; 

process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain, and 
to perform at least one of dimensionality reduction or warping of two or more frequencies to a first scale; 
use the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least:  
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including: 
detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, and
(determine pauses and the length of pauses in speech in the human vocal expression, and )





determining how rapidly the first user is speaking in the human vocal expression; 

determine a pitch of the human vocal expression, 
determine a volume of the human vocal expression, 
determine how rapidly the first user is speaking in the human vocal expression, 
determine a magnitude and/or power spectrum of the human vocal expression, 
determine pauses and the length of pauses in speech in the human vocal expression, and 
use natural language processing to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
analyze lexicon usage, syntax, semantics, and/or discourse patterns in speech in the human vocal expression
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; 
weight, using a first weight, a first identified change with respect to a first vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
  weight, using a second weight, a second identified change with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 


infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and
 inferring a change in health status of the first user based at least in part on 
the weighted first identified change with respect to the first vocal expression characteristic of the first user, 
the weighted second identified change with respect to the second vocal expression characteristic of the first user;
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken.
based at least in part on the inferred change in health status of the first user, determine if a first action is to be taken.


Claim 10 is taught by claim 10 of the reference patent.
Claim 11 is taught by claim 11 of the reference patent.
Claim 12 is taught by claim 12 of the reference patent.
Claim 13 is taught by claim 13 of the reference patent.
Claim 14 is taught by claim 14 of the reference patent.
Claim 15 is taught by claim 15 of the reference patent.
Claim 16 is taught by claim 16 of the reference patent.
Claim 17 is taught by claim 17 of the reference patent.

Claim 18 is an independent method claim with limitations similar to the limitations of Claim 9 and is rejected under similar rationale. 
Claim 19 is a method claim with limitations similar to the limitations of Claim 10 and is rejected under similar rationale. 
Claim 20 is a method claim with limitations similar to the limitations of Claim 13 (or 5) and is rejected under similar rationale. 
Claim 21 is a method claim with limitations similar to the limitations of Claims 16 (or 8) and 17 and is rejected under similar rationale. Claim 21 is also similar to claim 24 of the reference.
Claim 22 is a method claim with limitations similar to the limitations of Claim 17 and is rejected under similar rationale. 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 is indefinite for lack of clarity.  Claim 15 has similar language and is rejected under similar rationale.  (While the source of indefiniteness is not the use of “module” in the Claim, note, additionally, that each time “module” is used it is best to remove it or define it in terms of structure or software.)
Claim 7 refers to a “non-speech analysis module.”
Because of the lack of clarity of the language, Examiner cannot tell whether this module is directly a component of the “electronic device” of Claim 2, in which case it has to be interpreted under 112(f), or whether it is a software module and part of the “computer readable memory …. configuring at least one computing device to:”, in which case interpretation under 112(f) is not warranted.
Possible interpretations:
7. The electronic device as defined in claim 2, wherein the instructions further configure the at least one computing device to use a non-speech analysis module to identify pauses in speech in the human vocal expression using both the power spectrum and a magnitude spectrum of the human vocal expression.
7. The electronic device as defined in claim 2, wherein the electronic device further comprises a non-speech analysis module [[is]] configured to identify pauses in speech in the human vocal expression using both the power spectrum and a magnitude spectrum of the human vocal expression.
7. The electronic device as defined in claim 2, wherein the electronic device is configured to use a non-speech analysis module  to identify pauses in speech in the human vocal expression using both the power spectrum and a magnitude spectrum of the human vocal expression.

Claims 2-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Independent Claim 2 is indefinite due to issues arising from omission of connectors “and” or “or.”  Placement or omission of connectors can change the meaning of the Claim.
As is, Examiner cannot tell which of the limitations are performed by the “natural language module.”
If a list includes only one item, there is no need for a “:”.  If a list includes more than one item, use “,” between the members of the list and close the list with an “and” between the last two limitations.
The remaining Claims depend from Claim 2 and inherit the indefiniteness and do not include language that would alleviate the indefiniteness.

This is Examiner’s interpretation for applying art:
2. An electronic device configured to process audible expressions from users, comprising: 
a network interface; 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; 
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: 
determining a power spectrum of the human vocal expression[[;]] , and
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use a natural language module to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; 
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted third identified change with respect to the detected grammar violations; and 
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic (U.S. 2017/0007167) in view of Srivastava (U.S. 2018/0193652) and further in view of Pakhomov (U.S. 2015/0058013) and Singhal (U.S. 2007/0024454).
Regarding Claim 2, Kostic teaches:
2. An electronic device configured to process audible expressions from users, comprising: 
a network interface; [Kostic, Figure 1, “[0025] In some embodiments, the stroke detection device includes a cellular telephone network transceiver within the housing. The cellular telephone network transceiver is in communication with the controller whereby the controller is able to convey audio signals from a microphone of the stroke detection device to a cellular telephone network.”  “[0028] In some embodiments, the stroke detection device includes a WiFi radio in communication with the controller whereby the controller conveys the first audio signals to a remote computer network using the WiFi radio.…”]
at least one computing device; and [Kostic, Figure 1, the mobile/cell phone is a computer and would include a processor/computing device shown as “Controller 22.”  "[0033] In some embodiments, the main device is one of a tablet computer, a laptop computer, a desktop computer, and a cell phone.”  “[0064] In the embodiment shown in FIG. 1, controller 22 is a conventional microcontroller that is found inside of a conventional cell phone that runs the electronics of the cell phone, …, controller 22 may include any one or more microprocessors, microcontrollers, field programmable gate arrays, systems on a chip, volatile or nonvolatile memory, discrete circuitry, and/or other hardware, software, or firmware …”]
computer readable memory [Kostic, Figure 1, “memory 24.”] including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; [Kostic, Figure 1, “speech 48.”  Figure 3 showing the capture of the voice of the user of the cell phone device.  Figure 8 “speech 48” being captured.  See also Figure 13.  Paragraphs 7-8 and 22-23 regarding speech and facial.  Paragraph 25 teaches the connection of the cell phone to a network.  The result of analysis of the data is sent to “monitoring center 86” and “hospitals 84” and “designated individuals 82” of Figure 1.  In paragraph [0028] the raw data is sent over a WiFi network and then the computer that receives this data over the network can perform the analysis.  So this embodiment teaches the “receive ... over a network" of the Claim. ]
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; [Kostic does not expressly teach frequency domain conversion.  The mobile devices shown in Kostic operate on digitized signals and also in the frequency domain.  “[0063] …however, can also be implemented on a cell phone that uses Code Division Multiple Access (CDMA) technology, in which case it is not necessary to include a SIM card 42. …” ] 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: [Kostic."[0028] In some embodiments, the stroke detection device includes a WiFi radio in communication with the controller whereby the controller conveys the first audio signals to a remote computer network using the WiFi radio. The baseline characteristic includes at least one of the following: speed, volume, pitch, emphasis, and pronunciation.”]
determining a power spectrum of the human vocal expression;
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use a natural language module to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; [Kostic, during obtaining the baseline voice sample:  “[0085] … Using conventional speech-to-text software that is included as part of stroke detection app 44, controller 22 displays the text corresponding to the users phrase or greeting and asks the user to confirm and/or correct the text….” And then during the comparison:  “[0087] … Controller 22 determines the words and/or phrases being spoken by the user in the current sound samples and compares those words and/or phrases to baseline samples of the same words and/or phrases. …”  “[0088] … Controller 22 uses speech-to-text technology to recognize the words spoken by the user and store the sound samples in memory 24 according to the spoken words and/or phrases….”  Also the Controller has to conduct speech recognition (convert to text) to find if the speaker is dropping words which requires NLU/understanding:  “[0086] …. Such characteristics include speed, volume, pitch, emphasis, and pronunciation (including the dropping of syllables or whole words in phrases), ...”]
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; [Kostic, the point of Kostic is comparing the current conditions of the person to a baseline, i.e. previously collected data, to see if there has been a change due to illness/stroke:  "[0022] The test may include the user speaking a phrase into a microphone coupled to the controller wherein the controller compares a current sound sample generated from the user speaking the phrase into the microphone to a past sound sample of the user speaking the phrase. The controller detects if the user omits a word, or a portion of a word, in the phrase in the current sound sample and/or the controller detects if the user slurs one or more words in the phrase.”  “[0087] … That is, controller 22 gathers sound samples when the user is speaking during a phone call (or video call), as well as when the user is using the voice command or dictation features of the cell phone (if the cell phone is so equipped). Controller 22 determines the words and/or phrases being spoken by the user in the current sound samples and compares those words and/or phrases to baseline samples of the same words and/or phrases. Specifically, controller 22 compares one or more of the speed, volume, pitch, emphasis, and pronunciation of the current sound samples with those of the baseline samples….”]
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted third identified change with respect to the detected grammar violations; and [Kostic, the point of Kostic is comparing the current conditions of the person to a baseline, i.e. previously collected data, to see if there has been a change due to illness/stroke and infers the stroke.]
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, [Kostic takes action by reporting the location of the user to the hospital and is able to summon an ambulance/vehicle to that location: See “[0112] In at least one embodiment, the automatic communication of stroke detection device 20 with any of individuals 82, hospital 84 and/or monitoring center 86 also includes information about the current location of the user of stroke detection device 20. …. By automatically forwarding the current location of the user to one or more recipients (e.g. individual 82, hospital 84, and/or monitoring center 86), the recipient is able to summon an ambulance, or other rescue personnel, to the user should the condition of the user warrant such a step.”]
the first action comprising causing a vehicle to be prevented from being drivable or flyable. 

Kostic teaches all of the limitations that generate the framework of the Claim.
Kostic does not expressly teach the determination of a power spectrum. 
Kostic does not teach detection of pauses although speech pattern detection would include the pattern of pauses.
Kostic does not teach the use of weights.
Kostic does not teach inactivating the vehicle.

Srivastava teaches:
2. An electronic device configured to process audible expressions from users, comprising: 
a network interface; [Srivastava, Figure 6, “network interface device 620.”]
at least one computing device; and [Srivastava, Figure 2 or Figure 3 or Figure 6 devices 200, 300, 600.] 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: [Srivastava, Figure 2 or Figure 3, memory 230.  Figure 6, “main memory 604” including “instructions 624.”]
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; [Srivastava, Figure 5, “sense information corresponding to patient emotional reaction to pain 510” includes receiving voice of the patient from “Voice recorder 424” of Figure 4.  See also [0108] and [0114] about receiving over a network.]
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; [Srivastava, Figure 2, “signal metrics generator 221” which includes a “speech processor 223.”  Paragraphs [0061] and [0062] teach transformation of image data and speech data to the frequency domain.  “[0062] …The transformation includes mathematically transforming the speech data (e.g., digital speech signal) into representations in a specific temporal or frequency domain to facilitate feature extraction or recognition….”  “[0051] … The sensor circuit 210 may include sense amplifier circuit that may pre-process the sensed signals, including, for example, amplification, digitization, filtering, or other signal conditioning operations….”]
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: [Srivastava, Figure 2, “speech processor 223” and Figure 3, “signal metrics generator 221.”]
determining a power spectrum of the human vocal expression; [Srivastava.  “[0062] …The transformation includes mathematically transforming the speech data (e.g., digital speech signal) into representations in a specific temporal or frequency domain to facilitate feature extraction or recognition….”]
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; [Srivastava teaches that how rapidly the patient says certain words indicates his physical condition:  “[0063] The vocal expression metric may include speech motor control features corresponding to production of voice and speech, and speech content-based features based on contents of patient speech regarding intensity, duration, or pattern of pain sensation. Examples of the speech motor control features may include speed, volume, pitch, inclination, regularity, and degree of coordination during speech. In an example, the vocal expression metrics may be measured during a supervised session when the patient rapidly pronounces specific syllables or words, an activity that requires fine coordinated movement of jaw, lips, and anterior and posterior tongue. Speech motor slowness, such as slower syllable pronunciation or an increased variability of accuracy in syllable pronunciation, may be correlative to intensity or duration of pain.”]
use a natural language module to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; [Srivastava in the above [0063] teaches that “speech content-based features based on contents of patient speech” are also used in determining impairment. But this is not looking for grammatical mistakes due to the disease.]
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;  [Srivastava teaches that the change in speech characteristics indicate and are a measure of pain being experienced by the patient:  “[0062] The speech processor 223 may be configured to analyze the recorded voice or speech, and generate a vocal expression metric from the recorded voice or speech. … Chronic pain can directly or indirectly result in abnormality in speech motor control….”  “[0063] The vocal expression metric may include speech motor control features corresponding to production of voice and speech, and speech content-based features based on contents of patient speech regarding intensity, duration, or pattern of pain sensation. Examples of the speech motor control features may include speed, volume, pitch, inclination, regularity, and degree of coordination during speech. … Speech motor slowness, such as slower syllable pronunciation or an increased variability of accuracy in syllable pronunciation, may be correlative to intensity or duration of pain.”]  
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; [Srivastava, Figure 2, “weight factors” in memory 230 and Figure 3, “weight generator 322.”  [0070] … In an example, as illustrated in FIG. 2, the memory 230 may store weight factors, which may be used by the pain score generator 225 to generate the pain score. The weight factors may be provided by a system user, or alternatively be automatically determined or adjusted such as based on the corresponding signal metrics' reliability in representing an intensity of the pain. Examples of the automatic weight factor generation are discussed below, such as with reference to FIG. 3.”]
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; [Srivastava, as paragraphs [0065] and [0066] below indicate, a composite pain score is computed which is a weighted combination of signal metrics that pertain to vocal or facial features.  There are more than one vocal feature listed in [0063], for example, and therefore, there will be at least a first weight and a second weight and more.  “[0020] In Example 12, the subject matter of any one or more of Examples 1-11 optionally include the pain analyzer circuit that may be further configured to generate the pain score using a combination of a plurality of the signal metrics each weighted by their respective weight factor.”  “[0028] … The one or more signal metrics may include a plurality of speech features generated from the sensed speech signal.”]
weight, using a third weight, the detected grammar violations; [Srivastava does not teach that grammatical errors are a factor.  But it teaches weighting and combining vocal, vibrational, and facial features.   “[0028] In Example 20, the subject matter of any one or more of Examples 16-19 optionally includes the information corresponding to the patient emotional reaction to pain that may include a speech signal of the patient. The one or more signal metrics may include a plurality of speech features generated from the sensed speech signal.”  “[0026] … The one or more signal metrics may include a plurality of image features of a facial landmark generated from the sensed facial image.”  See also [0030].]
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted third identified change with respect to the detected grammar violations; and  [Srivastava, the “health status” of the Claim is taught by the “pain score” of Srivastava which is obtained as a weighted combination/composite score of facial expression and vocal expression metrics.  “… A pain analyzer circuit may generate a pain score using signal metrics of facial or vocal expression extracted from the sensed information. …”  Abstract.  “[0065] The pain score generator 225 may generate a pain score using the measurements of the signal metrics generated by the signal metrics generator 221. The pain score can be represented as a numerical or categorical value that quantifies the patient's overall pain symptom. In an example, a composite pain score may be generated using a combination of a plurality of facial expression metrics, a combination of a plurality of vocal expression metrics, or a combination of at least one facial expression metric and at least one vocal expression metric. In some examples, the pain score generator 225 may use one or more signals metrics generated from a physiological or functional signal, in addition to the facial or vocal expression metrics, to generate the pain score. The signal metrics may be weighted by their respective weight factors before being combined. The combination can be linear or nonlinear. 
The pain score generator 225 may compare the composite signal metric to one or more threshold values or range values, and assign a corresponding pain score (such as numerical values from 0 to 10) based on the comparison.”  “[0066] In another example, the pain score generator 225 may compare the signal metrics to their respective threshold values or range values, assign corresponding signal metric-specific pain score based on the comparison, and compute a composite pain score using a linear or nonlinear fusion of the signal metric-specific pain scores weighted by their respective weight factors….”]
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable. [Srivastava takes action to soothe the pain:  “[0071] …The mobile App may enable a patient to provide self-reported pain episode and quantified pain scales. In an example, the input circuit 241 may enable a user to confirm, reject, or edit the programming of the therapy unit 250, such as parameters for electrostimulation, as to be discussed in the following.”  “[0073] The therapy circuit 250 may be configured to deliver a therapy to the patient in response to the pain score…..”  “[0074] The therapy circuit 250 may additionally or alternatively include a drug delivery system ….”]
Kostic and Srivastava pertain to detecting change in the physiological characteristics manifested by a person as an indicator of some type of stress (a stroke in the case of Kostic or pain in Srivastava).  It would have been obvious to combine the weighted combination of vocal and facial features of Srivastava with Kostic as a method of arriving at an integrated measure that includes the various speech related changes that occur due to stress/pain/illness and assigns each feature/factor a weight according to its perceived importance.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
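For illustration only, the weighted combination discussed above can be sketched as follows.  This is a minimal sketch, not the implementation of Srivastava or of the Application: the function names, feature names, weights, and thresholds are hypothetical, and a normalized linear combination is assumed (Srivastava states that the combination may be linear or nonlinear).

```python
from typing import Mapping, Sequence


def composite_score(changes: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Linear combination of per-feature changes, normalized by total weight.

    `changes` holds hypothetical change magnitudes relative to a user's
    baseline; `weights` holds the weight factor assigned to each feature.
    """
    total_weight = sum(weights[name] for name in changes)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(weights[name] * value for name, value in changes.items())
    return weighted_sum / total_weight


def score_to_level(score: float, thresholds: Sequence[float]) -> int:
    """Map a composite score to a level: the count of thresholds it meets."""
    return sum(1 for t in thresholds if score >= t)


# Hypothetical values: two vocal changes and one grammar-violation measure.
changes = {"speech_rate": 0.4, "pause_length": 0.8, "grammar_violations": 0.2}
weights = {"speech_rate": 1.0, "pause_length": 2.0, "grammar_violations": 1.0}

score = composite_score(changes, weights)
level = score_to_level(score, thresholds=[0.25, 0.5, 0.75])
print(round(score, 3), level)
```

In this sketch the feature with the largest weight (pause length) dominates the composite score, which mirrors the idea of weighting each factor according to its perceived importance.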

The above references do not expressly teach the use of pause duration/length as an indicator.
Pakhomov expressly teaches:
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; [Pakhomov uses the duration of the pauses as a measure of fluency, “[0027] Silence detector 6 sends output 16 to analysis module 12. Analysis module 12 measures fluency of the patient's speech based on output 16 received from silence detector 6. In one example, analysis module 12 measures pause-related information, such as the number of pauses, and the duration of each pause….”  Fluency is related to health status of a person and stress brought about by pain or disease.]
Kostic and Srivastava and Pakhomov include and use speech processing.  It would have been obvious to combine the pause duration measurement of Pakhomov that is considered a measure of fluency with the system of the combination that determines the speech patterns and pauses as an indication of an illness as another added indicator because fluency is affected by illness, pain, and stress.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
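For illustration only, pause counting and pause-duration measurement of the kind described in the cited Pakhomov passages (classifying samples by amplitude into sound and silence, then counting instances of contiguous silence) can be sketched as follows.  The function name, the silence threshold, the minimum pause length, and the toy signal are all hypothetical assumptions and are not taken from any reference.

```python
from typing import List, Tuple


def find_pauses(samples: List[float], sample_rate: int,
                silence_threshold: float,
                min_pause_s: float) -> List[Tuple[float, float]]:
    """Return (start_time_s, duration_s) for each run of contiguous silence
    that lasts at least `min_pause_s` seconds."""
    pauses = []
    run_start = None  # index where the current silent run began, if any
    for i, sample in enumerate(samples):
        silent = abs(sample) < silence_threshold
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            duration = (i - run_start) / sample_rate
            if duration >= min_pause_s:
                pauses.append((run_start / sample_rate, duration))
            run_start = None
    # Close a silent run that extends to the end of the signal.
    if run_start is not None:
        duration = (len(samples) - run_start) / sample_rate
        if duration >= min_pause_s:
            pauses.append((run_start / sample_rate, duration))
    return pauses


# Toy signal at 10 samples per second: sound, 0.3 s silence, sound,
# 0.1 s silence (too short to count as a pause), sound.
signal = [0.9] * 5 + [0.0] * 3 + [0.7] * 2 + [0.01] * 1 + [0.8] * 2
pauses = find_pauses(signal, sample_rate=10,
                     silence_threshold=0.05, min_pause_s=0.2)
print(len(pauses), pauses)
```

The number of pauses and the duration of each pause returned by such a routine are the pause-related quantities that the rejection treats as a measure of fluency.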

Kostic in Figure 13 teaches an embodiment that is implemented in a car and “11. The stroke detection device of claim 9 wherein the main device is one of a tablet computer, a laptop computer, a desktop computer, a phone, a camera, a wearable device, and a vehicle.”  Kostic, however, does not teach disabling the car when a stroke is detected. 
Singhal teaches:
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable. [Singhal is directed to a “mental impairment detection determination using reaction time test" which denies ignition to the car if the driver is considered to be impaired. The test of impairment includes an interactive voice response (Figure 1B, 54) that asks the user questions and asks the user to respond to the question or repeat a phrase, measures the response time, and compares it to a sober response time of the same user. "[0062] The stimulus is in the form of a simple question that requires a verbal response or a motor response that can be detected and measured. The simple question may be what is your name or what is 2 plus 4, or blow the horn two times. The response is a speech or sound that can be picked up by a microphone and with the help of prior art speech recognition and processing technology be able to measure the response for timeliness and or accuracy. Prior art speech processing technologies and devices provide the ability to be able to receive process and precisely measure speech and sound responses.” Figure 5 and “[0126] At Step 116, Ground station: Send ignition close loop command to vehicle if reaction time is within limits. [0127] At Step 118, Vehicle: Receive Vehicle Ignition command and complete auto ignition.”]
Kostic, Srivastava, Pakhomov, and Singhal are directed to analyzing human speech, comparing it to a previous value, and obtaining the change in the physiologic condition of the person. Singhal uses the obtained result to disable a car that the user is driving. It would have been obvious to combine the use and application of Singhal with the method of the combination so that the results of the analysis portion of the combination are used to disable a car instead of, or in addition to, calling an ambulance. This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 7, Kostic does not teach the detection of pauses expressly.  
Pakhomov teaches:
7. The electronic device as defined in claim 2, wherein the electronic device is configured to non-speech analysis module is configured to identify pauses in speech in the human vocal expression using both the power spectrum and a magnitude spectrum of the human vocal expression.  [Pakhomov, Figure 1, “non-speech sound detector 10” measures and finds the pauses in the speech by a silence detector which works based on pitch found from the power spectrum.  “… An example method includes classifying, by a computing device, samples of audio data of speech of a person, based on amplitudes of the samples, into a first class of samples including speech or sound and a second class of samples including silence. …”  Abstract.  “[005] … The system then counts, for example, the number of instances of contiguous silence, i.e., the length and number of pauses in speech. “  “ [0047] For example, silence detector 6 may use pitch estimation based on autocorrelation of the power spectrum generated with a Fast Fourier Transform (FFT) algorithm. In this and other examples, silence detector 6 may implement other pitch tracking techniques known in the art….”]
Kostic, Srivastava, and Pakhomov are directed to analyzing human speech for obtaining characteristics of the speech.  It would have been obvious to combine Pakhomov with the combination because it teaches that pause/silence detection relates to pitch detection and pitch detection uses the power spectrum of the signal and thus correlates pause detection to the power spectrum as one method of detecting pauses in the speech that could have been used by the combination.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
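For illustration only, the amplitude-based classification quoted from the Pakhomov Abstract (samples classed as speech/sound or silence, followed by counting contiguous silence) can be sketched as below. This is a minimal sketch and not the implementation of Pakhomov or of the Application: the function name find_pauses and the parameters threshold and min_pause_s are hypothetical, and the sketch works on time-domain amplitudes rather than on the power or magnitude spectrum recited in the Claim.

```python
def find_pauses(samples, sample_rate, threshold=0.02, min_pause_s=0.15):
    """Return (start_seconds, duration_seconds) for each pause.

    Assumption: a sample is "silence" when its absolute amplitude is below
    `threshold`; a pause is a contiguous run of silence lasting at least
    `min_pause_s` seconds. Both values are illustrative, not from any reference.
    """
    pauses = []
    run_start = None  # index where the current silent run began, if any
    for i, x in enumerate(samples):
        silent = abs(x) < threshold
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            duration = (i - run_start) / sample_rate
            if duration >= min_pause_s:
                pauses.append((run_start / sample_rate, duration))
            run_start = None
    # Close a silent run that extends to the end of the recording.
    if run_start is not None:
        duration = (len(samples) - run_start) / sample_rate
        if duration >= min_pause_s:
            pauses.append((run_start / sample_rate, duration))
    return pauses


# Synthetic example: 0.5 s of "speech", 0.3 s of silence, 0.2 s of "speech".
signal = [0.5, -0.5] * 250 + [0.0] * 300 + [0.5, -0.5] * 100
pauses = find_pauses(signal, sample_rate=1000)
```

The number of pauses is len(pauses) and the length of each pause is the second element of each tuple, which corresponds to the pause count and pause duration discussed above.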


Claims 9, 15, 18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic in view of Srivastava and Pakhomov.
Regarding Claim 9, Kostic teaches:
9. An electronic device, comprising: 
a network interface; [Kostic, Figure 1, “[0025] In some embodiments, the stroke detection device includes a cellular telephone network transceiver within the housing….”]
at least one computing device; and [Kostic, Figure 1, the mobile/cell phone is a computer and would include a processor/computing device shown as “Controller 22.” See [0033].]
computer readable memory [Kostic, Figure 1, “memory 24.”]  including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:  
access a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain; [Kostic, Figure 1, “speech 48.”  Figure 3 showing the capture of the voice of the user of the cell phone device.  Figure 8 “speech 48” being captured.  See also Figure 13.  Paragraph 25 teaches the connection of the cell phone to a network.  The mobile devices shown in Kostic operate on digitized signals and also in the frequency domain.  “[0063] …however, can also be implemented on a cell phone that uses Code Division Multiple Access (CDMA) technology, in which case it is not necessary to include a SIM card 42. …” ] 
use the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least: [Kostic, “[0028] In some embodiments, the stroke detection device includes a WiFi radio in communication with the controller whereby the controller conveys the first audio signals to a remote computer network using the WiFi radio. The baseline characteristic includes at least one of the following: speed, volume, pitch, emphasis, and pronunciation.”]
detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, and 
determining how rapidly the first user is speaking in the human vocal expression; 
use natural language processing to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; [Kostic, while obtaining the baseline voice sample, converts it to text and determines if the speaker is dropping words and thus violating the grammar rules:  “[0085] … Using conventional speech-to-text software that is included as part of stroke detection app 44, ….” And then during the comparison:  “[0087] … Controller 22 determines the words and/or phrases being spoken by the user in the current sound samples and compares those words and/or phrases to baseline samples of the same words and/or phrases. …”  Determining if the speaker is dropping words requires NLU/understanding:  “[0086] …. Such characteristics include speed, volume, pitch, emphasis, and pronunciation (including the dropping of syllables or whole words in phrases), ...”  See also [0022] for omitting words.]
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; [The point of Kostic is comparing the current conditions of the person to a baseline, i.e. previously collected data, to see if there has been a change due to illness/stroke:  "[0022] The test may include the user speaking a phrase into a microphone coupled to the controller wherein the controller compares a current sound sample generated from the user speaking the phrase into the microphone to a past sound sample of the user speaking the phrase. The controller detects if the user omits a word, or a portion of a word, in the phrase in the current sound sample and/or the controller detects if the user slurs one or more words in the phrase.”  “[0087] … Controller 22 determines the words and/or phrases being spoken by the user in the current sound samples and compares those words and/or phrases to baseline samples of the same words and/or phrases. Specifically, controller 22 compares one or more of the speed, volume, pitch, emphasis, and pronunciation of the current sound samples with those of the baseline samples….”]
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and [Kostic, the point of Kostic is comparing the current conditions of the person to a baseline, i.e. previously collected data, to see if there has been a change due to illness/stroke and infers the stroke.]
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken. [Kostic takes action by reporting the location of the user to the hospital and is able to summon an ambulance/vehicle to that location: See “[0112] In at least one embodiment, the automatic communication of stroke detection device 20 with any of individuals 82, hospital 84 and/or monitoring center 86 also includes information about the current location of the user of stroke detection device 20. …. By automatically forwarding the current location of the user to one or more recipients (e.g. individual 82, hospital 84, and/or monitoring center 86), the recipient is able to summon an ambulance, or other rescue personnel, to the user should the condition of the user warrant such a step.”]

Kostic does not expressly teach the determination of a power spectrum. 
Kostic does not teach detection of pauses although speech pattern detection would include the pattern of pauses.
Kostic does not teach the use of weights.

Srivastava teaches:
9. An electronic device, comprising: 
a network interface; [Srivastava, Figure 6, “network interface device 620.”]
at least one computing device; and [Srivastava, Figure 2 or Figure 3 or Figure 6 devices 200, 300, 600.] 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: [Srivastava, Figure 2 or Figure 3, memory 230.  Figure 6, “main memory 604” including “instructions 624.”]
access a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain;  [Srivastava, Figure 5, “sense information corresponding to patient emotional reaction to pain 510” includes receiving voice of the patient from “Voice recorder 424” of Figure 4.  See also [0108] and [0114] about receiving over a network.  Srivastava, Figure 2, “signal metrics generation 221” which includes a “speech processor 223.”  [0061] and [0062] teach transformation of image data and speech data to frequency domain.  “[0062] …The transformation includes mathematically transforming the speech data (e.g., digital speech signal) into representations in a specific temporal or frequency domain to facilitate feature extraction or recognition….”  “[0051] … The sensor circuit 210 may include sense amplifier circuit that may pre-process the sensed signals, including, for example, amplification, digitization, filtering, or other signal conditioning operations….”]
use the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least: [Srivastava, Figure 2, “speech processor 223” and Figure 3, “signal metrics generator 221.”]
detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, and [Does not mention pause expressly.  Pause is implied from the features taught and provided below.]
determining how rapidly the first user is speaking in the human vocal expression;  [Srivastava teaches that how rapidly the patient says certain words indicates his physical condition:  “[0063] The vocal expression metric may include speech motor control features corresponding to production of voice and speech, and speech content-based features based on contents of patient speech regarding intensity, duration, or pattern of pain sensation. Examples of the speech motor control features may include speed, volume, pitch, inclination, regularity, and degree of coordination during speech. In an example, the vocal expression metrics may be measured during a supervised session when the patient rapidly pronounces specific syllables or words, an activity that requires fine coordinated movement of jaw, lips, and anterior and posterior tongue. Speech motor slowness, such as slower syllable pronunciation or an increased variability of accuracy in syllable pronunciation, may be correlative to intensity or duration of pain.”]
use natural language processing to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; [Srivastava in the above [0063] teaches that “speech content-based features based on contents of patient speech” are also used in determining impairment. But this is not looking for grammatical mistakes due to the disease.]
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;  [Srivastava teaches that the change in speech characteristics indicate and are a measure of pain being experienced by the patient:  “[0062] The speech processor 223 may be configured to analyze the recorded voice or speech, and generate a vocal expression metric from the recorded voice or speech. … Chronic pain can directly or indirectly result in abnormality in speech motor control….”  “[0063] The vocal expression metric may include speech motor control features corresponding to production of voice and speech, and speech content-based features based on contents of patient speech regarding intensity, duration, or pattern of pain sensation. Examples of the speech motor control features may include speed, volume, pitch, inclination, regularity, and degree of coordination during speech. … Speech motor slowness, such as slower syllable pronunciation or an increased variability of accuracy in syllable pronunciation, may be correlative to intensity or duration of pain.”]  
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; [Srivastava, Figure 2, “weight factors” in memory 230 and Figure 3, “weight generator 322.”  “[0070] … In an example, as illustrated in FIG. 2, the memory 230 may store weight factors, which may be used by the pain score generator 225 to generate the pain score. The weight factors may be provided by a system user, or alternatively be automatically determined or adjusted such as based on the corresponding signal metrics' reliability in representing an intensity of the pain. Examples of the automatic weight factor generation are discussed below, such as with reference to FIG. 3.”]
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; [Srivastava, as paragraphs [0065] and [0066] below indicate, a composite pain score is computed which is a weighted combination of signal metrics that pertain to vocal or facial features.  There are more than one vocal feature listed in [0063], for example, and therefore, there will be at least a first weight and a second weight and more.  “[0020] In Example 12, the subject matter of any one or more of Examples 1-11 optionally include the pain analyzer circuit that may be further configured to generate the pain score using a combination of a plurality of the signal metrics each weighted by their respective weight factor.”  “[0028] … The one or more signal metrics may include a plurality of speech features generated from the sensed speech signal.”]
weight, using a third weight, the detected grammar violations; [Srivastava does not teach that grammatical errors are a factor.  But it teaches weighting and combining vocal, vibrational, and facial features.   “[0028] In Example 20, the subject matter of any one or more of Examples 16-19 optionally includes the information corresponding to the patient emotional reaction to pain that may include a speech signal of the patient. The one or more signal metrics may include a plurality of speech features generated from the sensed speech signal.”  “[0026] … The one or more signal metrics may include a plurality of image features of a facial landmark generated from the sensed facial image.”  See also [0030].]
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and  [Srivastava, the “health status” of the Claim is taught by the “pain score” of Srivastava which is obtained as a weighted combination/composite score of facial expression and vocal expression metrics.  “… A pain analyzer circuit may generate a pain score using signal metrics of facial or vocal expression extracted from the sensed information. …”  Abstract.  “[0065] The pain score generator 225 may generate a pain score using the measurements of the signal metrics generated by the signal metrics generator 221. The pain score can be represented as a numerical or categorical value that quantifies the patient's overall pain symptom. In an example, a composite pain score may be generated using a combination of a plurality of facial expression metrics, a combination of a plurality of vocal expression metrics, or a combination of at least one facial expression metric and at least one vocal expression metric. In some examples, the pain score generator 225 may use one or more signals metrics generated from a physiological or functional signal, in addition to the facial or vocal expression metrics, to generate the pain score. The signal metrics may be weighted by their respective weight factors before being combined. The combination can be linear or nonlinear. 
The pain score generator 225 may compare the composite signal metric to one or more threshold values or range values, and assign a corresponding pain score (such as numerical values from 0 to 10) based on the comparison.”  “[0066] In another example, the pain score generator 225 may compare the signal metrics to their respective threshold values or range values, assign corresponding signal metric-specific pain score based on the comparison, and compute a composite pain score using a linear or nonlinear fusion of the signal metric-specific pain scores weighted by their respective weight factors….”]
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken. [Srivastava takes action to soothe the pain:  “[0071] …The mobile App may enable a patient to provide self-reported pain episode and quantified pain scales. In an example, the input circuit 241 may enable a user to confirm, reject, or edit the programming of the therapy unit 250, such as parameters for electrostimulation, as to be discussed in the following.”  “[0073] The therapy circuit 250 may be configured to deliver a therapy to the patient in response to the pain score…..”  “[0074] The therapy circuit 250 may additionally or alternatively include a drug delivery system ….”]
Kostic and Srivastava pertain to detecting change in the physiological characteristics manifested by a person as an indicator of some type of stress (a stroke in the case of Kostic or pain in Srivastava).  It would have been obvious to combine the weighted combination of vocal and facial features of Srivastava with Kostic as a method of arriving at an integrated measure that includes the various speech related changes that occur due to stress/pain/illness and assigns each feature/factor a weight according to its perceived importance.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
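For illustration only, the weighted linear combination described in Srivastava [0065] (signal metrics weighted by respective weight factors, combined, and compared to a threshold) can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the feature names, the weights, and the threshold are all hypothetical and are not taken from any reference or from the Application.

```python
def composite_score(changes, weights):
    """Linear weighted combination of per-feature changes.

    `changes` maps a feature name to its measured change from baseline;
    `weights` maps the same names to weight factors (hypothetical values).
    """
    missing = set(changes) - set(weights)
    if missing:
        raise KeyError(f"no weight for features: {sorted(missing)}")
    return sum(weights[name] * changes[name] for name in changes)


def infer_status_change(changes, weights, threshold):
    """Flag a change when the composite score exceeds the threshold."""
    return composite_score(changes, weights) > threshold


# Hypothetical example with two vocal features and grammar violations.
changes = {"speech_rate": 0.5, "pitch": 0.25, "grammar_violations": 1.0}
weights = {"speech_rate": 2.0, "pitch": 4.0, "grammar_violations": 1.0}
score = composite_score(changes, weights)  # 1.0 + 1.0 + 1.0
flagged = infer_status_change(changes, weights, threshold=2.5)
```

Srivastava also permits nonlinear combinations and per-metric scoring before fusion ([0066]); the linear form is used here only because it is the simplest instance.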

The above references do not expressly teach the use of pause duration/length as an indicator.
Pakhomov expressly teaches:
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; [Pakhomov uses the duration of the pauses as a measure of fluency, “[0027] Silence detector 6 sends output 16 to analysis module 12. Analysis module 12 measures fluency of the patient's speech based on output 16 received from silence detector 6. In one example, analysis module 12 measures pause-related information, such as the number of pauses, and the duration of each pause….”  Fluency is related to health status of a person and stress brought about by pain or disease.]
Kostic and Srivastava and Pakhomov include and use speech processing.  It would have been obvious to combine the pause duration measurement of Pakhomov that is considered a measure of fluency with the system of the combination that determines the speech patterns and pauses as an indication of an illness as another added indicator because fluency is affected by illness, pain, and stress.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 15 includes limitations similar to the limitations of Claim 7 and is rejected under similar rationale. The combination rationale remains similar, except that Singhal is no longer required.

Claim 18 is an independent method claim with limitations similar to the limitations of Claim 9 and is rejected under similar rationale. 

Regarding Claim 21, Kostic teaches:
21. The computer implemented method as defined in claim 18,
processing one or more images of the first user to detect occlusion of eyes of the first user by eyelids of the first user; and [Kostic, Figure 1, “facial features 46” and Figures 2A and 2B showing the capture of image of the face of the user.  Kostic looks for facial asymmetry that was not previously present in the face of the user, that includes all of the list of the Claim.  “[0023] …The controller compares the current image of the user to a past image of the user and detects if any facial droop is present in the current image of the user. ...”  See [0075] for mouth and lips, [0076] for eyes, and [0077] for any abnormal asymmetry.]
determining whether an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state, [Kostic compares 3 features with their baseline values: speech, image, gait and if something is off it takes action.  “[0072] …. Once these baseline samples are collected, stroke detection app 44 stores them in one or more files in memory 24 and compares them with subsequently gathered images, speech samples, and/or gait samples that are gathered by camera 26, ….”]
wherein the first action is caused to be taken based in part on the determination of whether an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state. [Kostic teaches issuing a notification to the user if stroke is indicated:  “[0072] … If the comparison of any of these subsequently gathered samples with their corresponding baseline samples differs by more than corresponding predefined thresholds, then stroke detection device 20 determines that a stroke may have taken place and issues a notification to the user.”]

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic and Srivastava and Pakhomov and Singhal in view of Ireton (U.S. 6,047,254).
Regarding Claim 3, Kostic, Srivastava, and Pakhomov all teach determination of pitch, and Claim 3 includes just a definition of pitch for human voice. Human voice is quasiperiodic, and pitch is the period or the inverse of the period depending on whether it is pitch period or pitch frequency.  However, a reference is added that provides a definition for pitch.
Ireton teaches:
3. The electronic device as defined in claim 2, wherein the electronic device is configured to estimate a quasiperiodic signal period of the human vocal expression and determine a pitch using the estimated quasiperiodic signal period and use the determined pitch in inferring the change in health status of the first user.  [Ireton includes “pitch estimation” by an autocorrelation method and includes a description of speech as quasiperiodic.  “Speech sounds can generally be classified into three distinct classes according to their mode of excitation. Voiced sounds are sounds produced by vibration or oscillation of the human vocal chords, thereby producing quasi-periodic pulses of air which excite the vocal tract. Unvoiced sounds are generated by forming a constriction at some point in the vocal tract, typically near the end of the vocal tract at the mouth, and forcing air through the constriction at a sufficient velocity to produce turbulence. This creates a broad spectrum noise source which excites the vocal tract. Plosive sounds result from creating pressure behind a closure in the vocal tract, typically at the mouth, and then abruptly releasing the air.”  “As mentioned above, synthesized speech from the speech production model is approximately periodic over short time-intervals with period equal to the pitch period. For any periodic signal, it is a well known fact that the autocorrelation function achieves an absolute maximum value at time delays equal to the fundamental period and its integer multiples. These facts motivate the use of autocorrelation to detect the pitch period of natural speech. Due to the locally periodic nature of speech, a high value for the correlation function will register at multiples of the pitch period, i.e. at 2, 3, 4, and 5 times the pitch period, producing multiple peaks in the correlation. 
Ostensibly, the problem of pitch period detection is one of identifying a series of large amplitude correlation peaks which have this regular time-delay structure. Namely, the large amplitude peaks must line up with time-delays that are 2, 3, 4, and 5 times some fundamental time-delay. The pitch period is then equal to this fundamental time-delay.”]
Kostic, Srivastava, Pakhomov, Singhal and Ireton are directed to analyzing human speech for obtaining characteristics of the speech that are later used in some other application.  It would have been obvious to combine Ireton with the combination because it provides a more detailed explanation of pitch estimation which is left out of the more recent references that are mostly directed to the use of the estimated pitch for another purpose and leave out the well-known fundamentals.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
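For illustration only, the autocorrelation approach quoted from Ireton (the autocorrelation of an approximately periodic signal peaks at time delays equal to the pitch period and its multiples) can be sketched as below. This is a minimal sketch and not Ireton's detector: the function name and the lag search range are hypothetical, and restricting the search to a range below twice the expected period is an assumption made here to avoid selecting a multiple of the period.

```python
import math


def autocorr_pitch_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value.

    Assumption: the true pitch period lies in [min_lag, max_lag] and
    max_lag is smaller than len(samples).
    """
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic 200 Hz tone sampled at 8000 Hz: the period is 40 samples.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
period = autocorr_pitch_period(tone, min_lag=20, max_lag=60)
pitch_hz = sample_rate / period
```

The pitch frequency is the sample rate divided by the detected period, which is the period/inverse-of-period relationship noted in the rejection of Claim 3.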

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Ireton.
Claim 11 includes limitations similar to the limitations of Claim 3 and is rejected under similar rationale. Singhal, which was cited for the last limitation of Claim 2, is not necessary because Claim 11 depends from Claim 9, which does not require inactivating the vehicle.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, Pakhomov, and Singhal in view of Hanai (U.S. 2011/0091050).
Regarding Claim 4, Kostic, Srivastava, and Pakhomov all teach determination of pitch.  Claim 4 includes cepstral pitch determination, which is one way of determining pitch; another reference is provided to teach it.
Hanai teaches:
4. The electronic device as defined in claim 2, wherein the electronic device is configured to determine a cepstrum pitch using an inverse Fourier transform (IFT) of a logarithm of an estimated spectrum of a human vocal expression signal and use the determined pitch in inferring the change in health status of the first user.   [Hanai, “[0063] In the pitch detection unit 51 of the characteristic calculation unit 22 in FIG. 6, IFFT operation is performed on the logarithm of the power spectrum to convert the power spectrum into a cepstrum in FIG. 7A. The highest peak P is detected in the range of frequencies at which the sound pitch of the cepstrum can exist, the range being indicated by the frame of a solid line in FIG. 7A, and frequency fP of the peak P is adopted as a candidate for a sound pitch. Then, the ratio between the candidate for the sound pitch and the zero order cepstrum is obtained. In the example in FIGS. 7A to 7C, the ratio is equal to or more than the threshold and frequency fP, which is a candidate for a pitch, is adopted as the sound pitch.”]
Kostic, Srivastava, Pakhomov, Singhal, and Hanai are directed to analyzing human speech for obtaining characteristics of the speech that are later used in some other application.  It would have been obvious to combine Hanai with the combination because it provides a more detailed explanation of pitch estimation by the cepstrum method which is left out of the more recent references that are mostly directed to the use of the estimated pitch for another purpose and leave out the well-known fundamentals.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
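For illustration only, the cepstrum method described in Hanai [0063] (an inverse transform of the logarithm of the power spectrum, followed by a peak search within the range where a pitch can exist) can be sketched as below. This is a minimal sketch and not Hanai's pitch detection unit: the function names, the naive O(N^2) transform, the small constant added before the logarithm, and the search range are assumptions, and Hanai's additional comparison against the zero order cepstrum is omitted.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (or inverse), O(N^2)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstrum_pitch(samples, sample_rate, min_hz, max_hz):
    """Estimate pitch from the peak of the real cepstrum.

    Assumption: the pitch lies between min_hz and max_hz.
    """
    power = [abs(c) ** 2 for c in dft(samples)]
    # The small constant avoids log(0) at empty bins (illustrative value).
    log_power = [math.log(p + 1e-12) for p in power]
    cepstrum = [c.real for c in dft(log_power, inverse=True)]
    lo = int(sample_rate / max_hz)  # shortest period considered, in samples
    hi = int(sample_rate / min_hz)  # longest period considered, in samples
    peak = max(range(lo, hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak


# Synthetic pulse train: one pulse every 64 samples at 8000 Hz (125 Hz).
sample_rate = 8000
pulses = [1.0 if n % 64 == 0 else 0.0 for n in range(512)]
pitch_hz = cepstrum_pitch(pulses, sample_rate, min_hz=80, max_hz=400)
```

A production implementation would normally use a fast Fourier transform; the naive transform is used here only to keep the sketch self-contained.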

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Hanai.
Claim 12 includes limitations similar to the limitations of Claim 4 and is rejected under similar rationale. Singhal, which was cited for the last limitation of Claim 2, is not necessary because Claim 12 depends from Claim 9, which does not require inactivating the vehicle.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, Pakhomov, and Singhal in view of Moriya (U.S. 2018/0090155).
Regarding Claim 5, Kostic, Srivastava, and Pakhomov teach the use of volume as a characteristic of speech.  Claim 5 just provides a definition for Volume.  Volume, loudness, amplitude and power or energy of the sound wave are the same or are closely related.
Moriya teaches:
5. The electronic device as defined in claim 2, wherein the electronic device is configured is configured to determine a volume of the human vocal expression based at least in part on peak heights in the power spectrum of the human vocal expression and use the determined volume in inferring the change in health status of the first user. [Moriya, Figure 5 showing the conversion of the input audio signal to the frequency spectrum.  Thus, the reference to “time-series signal” means the sound signal that has been converted to the frequency domain, whose power spectrum is used in the further analyses.   “[0033] The frequency domain conversion unit 41 converts an audio signal in the time domain, which is the input time-series signal of the predetermined time length, into an MDCT coefficient sequence X(0), X(1), . . . , X(N-1) at point N in the frequency domain in the unit of frame of the predetermined time length. N is a positive integer.”  "[0144] …. If, for example, an average amplitude (the square root of average energy per sample) is used as the index indicating the loudness of a sound of a time-series signal, CE=the maximum amplitude value*( 1/128) holds. For instance, since the maximum amplitude value is 32768 in the case of 16-bit accuracy, CE=256 holds.”]
Kostic, Srivastava, Pakhomov, and Singhal are directed to analyzing human speech for obtaining characteristics of the speech and Moriya includes a judging unit for comparing signals.  It would have been obvious to combine Moriya with the combination because it provides a more detailed explanation of volume or loudness of a sound which is left out of the more recent references that are mostly directed to the use of the estimated pitch for another purpose and leave out the well-known fundamentals.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
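For illustration only, the loudness index quoted from Moriya [0144] (average amplitude, defined as the square root of average energy per sample) and the quoted reference level CE can be sketched as below. This is a minimal sketch: the function name is hypothetical, and the computation is done on time-domain samples rather than on peak heights in the power spectrum as recited in the Claim.

```python
import math


def average_amplitude(samples):
    """Square root of the average energy per sample (RMS amplitude)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Reference level from the quoted passage: maximum amplitude * (1/128).
max_amplitude_16_bit = 32768
ce = max_amplitude_16_bit * (1 / 128)

loudness = average_amplitude([3.0, -3.0, 3.0, -3.0])
```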
Claims 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Moriya.
Claim 13 includes limitations similar to the limitations of Claim 5 and is rejected under similar rationale. 
Claim 20 is a method claim with limitations similar to the limitations of Claim 13 (or 5) and is rejected under similar rationale.
These Claims depend from Claims 9 and 18, which do not include the vehicle inactivation feature of Claim 2 for which Singhal was used.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, Pakhomov, and Singhal in view of Amir (U.S. 2002/0116188).
Regarding Claim 6, Kostic, Srivastava, and Singhal mention the speed of speech as a parameter to consider but do not define it.
Amir teaches:
6. The electronic device as defined in claim 2, wherein the electronic device is configured to determine as to how rapidly the first user is speaking based at least in part on a determination of how many words are spoken by the first user over a first period of time.  [Amir, “[0027] Moving to block 32, … In another embodiment, both speech speed and typing speed are measured, and the speech speed is adapted accordingly. The audio playback rate can be set so that the speech rate is equal to the typing speed, in one embodiment. Speech speed can be measured by counting the number of phonemes per unit time or by counting spoken words per unit time (either using phoneme recognition, phoneme segmentation, speech recognition, or by detecting and counting pauses between words per unit time).”]
Kostic, Srivastava, Pakhomov, and Singhal are directed to analyzing human speech for obtaining characteristics of the speech and Amir is directed to applications that require detecting the speed of speech.  It would have been obvious to modify the combination with Amir, which provides a method for measuring the rate of speech that the other references leave out, as one way of implementing a feature already taught by the combination.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
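For illustration only, one way of counting spoken words per unit time by detecting pauses between words, as Amir [0027] mentions, may be sketched as follows. The per-frame pause flags, the function names, and the conversion to words per minute are assumptions made for this sketch and are not taken from Amir.

```python
def count_speech_segments(is_pause):
    """Count runs of speech frames separated by pause frames.

    is_pause holds one boolean per audio frame (True means a pause).
    Each transition from pause (or start of input) to speech is treated
    as the start of one word, which is a simplifying assumption.
    """
    segments = 0
    previous_was_pause = True
    for pause in is_pause:
        if not pause and previous_was_pause:
            segments += 1
        previous_was_pause = pause
    return segments


def words_per_minute(word_count, duration_seconds):
    """Spoken words per unit time, expressed per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


flags = [True, False, False, True, False, True, True, False]
rate = words_per_minute(count_speech_segments(flags), 2.0)
```

The resulting rate could then be compared against a baseline rate for the same speaker, which is the kind of comparison the combination relies on.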

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Amir (U.S. 2002/0116188).
Claim 14 includes limitations similar to the limitations of Claim 6 and is rejected under similar rationale. 
This Claim depends from Claim 9 and therefore does not require addition of Singhal which was cited for the last limitation of Claim 2 regarding inactivating the vehicle if the driver is impaired.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, Pakhomov, and Singhal in view of Kusens (U.S. 2017/0195637).
Kostic indirectly teaches the limitation regarding measuring the occlusion of the eyes.  Srivastava looks at closed eyes as a metric of pain.
Kusens is added as a more express reference.
Regarding Claim 8, Kusens teaches:
8. The electronic device as defined in claim 2, wherein the electronic device is configured to determine if an occlusion of eyes of the first user by eyelids of the first user [Kusens, Figures 3A and 3B.  The eyes of the patient are monitored and the change in the location of reference points around the eyes and other points around the eyes is monitored and is an indication of a stroke event.  "[0040] … computerized monitoring system 130 may digitally superimpose over the face an x-y plane, such as y-axis 340 and x-axis 350. …. Reference points may be assigned to distinctive features of the face. For example, in FIG. 3A, there are reference points 320 around the eyes, and reference points 330 around the mouth of the person 120 being monitored.”  “[0041] … A timer may be employed to evaluate whether the asymmetry or change in position of the reference points persists for a minimum amount of time, which could help distinguish asymmetric facial gestures, like an eyebrow raised in skepticism, from stroke symptoms.”   “[0042] In some embodiments, a certain degree of change in the symmetry or relative symmetry about the x-, y-, and/or z-axes, such as a change in distance of at least 10%, or at least 3 mm, may be required before issuing an alert. ….” ]
indicates an adverse health state. [Kusens, Change in the location of the reference points around the eyes includes occlusion of the eye by an eyelid.  And any kind of pronounced and increased asymmetry is taken as an indicator of a stroke.  “[0043] On detecting facial features and/or a change in facial features consistent with a stroke symptom, computerized monitoring system 130 may communicate the detected stroke symptom to computerized communication system 140. Computerized communication system 140 may be configured to send an alert of the stroke symptom to one or more designated recipients….”]
…
Kostic, Srivastava, Pakhomov, Singhal and Kusens are directed to analyzing human face or speech to compare to a previous value and obtain the change in the physiologic condition of the person.  Kusens is focused on image analysis and includes a detailed description of analysis of the face for symptoms of stroke and it would have been obvious to combine the detailed method of Kusens with the combination to make the image analysis portion of the combination more precise.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
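For illustration only, the thresholding and timer described in Kusens [0041]-[0042] may be sketched as follows. The function names, the sample format, and the use of a single distance between reference points are assumptions made for this sketch; only the 10% and 3 mm figures come from the quoted passage.

```python
def exceeds_change_threshold(baseline_mm, current_mm,
                             min_fraction=0.10, min_mm=3.0):
    """True if the distance changed by at least 10% or at least 3 mm."""
    change = abs(current_mm - baseline_mm)
    if change >= min_mm:
        return True
    return baseline_mm > 0 and change / baseline_mm >= min_fraction


def persistent_change(samples, baseline_mm, min_duration_s):
    """True if the threshold is exceeded continuously for min_duration_s.

    samples is a time-ordered list of (timestamp_s, distance_mm) pairs.
    The timer resets whenever a sample falls back under the threshold,
    which filters out brief facial gestures.
    """
    start = None
    for timestamp_s, distance_mm in samples:
        if exceeds_change_threshold(baseline_mm, distance_mm):
            if start is None:
                start = timestamp_s
            if timestamp_s - start >= min_duration_s:
                return True
        else:
            start = None
    return False


sustained = [(0, 20.0), (1, 24.0), (2, 24.0), (3, 24.0)]
brief = [(0, 24.0), (1, 20.0), (2, 24.0)]
```

In this sketch an alert would issue for the sustained samples but not for the brief ones, which mirrors the distinction Kusens draws between a stroke symptom and a passing expression.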

Claims 16 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Kusens.
Claim 16 includes limitations similar to the limitations of Claim 8 and is rejected under similar rationale. 
Claim 21 is a method claim with limitations similar to the limitations of Claims 16 (or 8) and 17 and is rejected under similar rationale. 
The Claims depend from Claims 9 and 18 and therefore do not require addition of Singhal which was cited for the last limitation of Claim 2 regarding inactivating the vehicle if the driver is impaired.

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic in view of Srivastava and further in view of Pakhomov and Singhal.
Regarding Claim 10, Kostic in Figure 13 teaches an embodiment that is implemented in a car, and Kostic recites: “11. The stroke detection device of claim 9 wherein the main device is one of a tablet computer, a laptop computer, a desktop computer, a phone, a camera, a wearable device, and a vehicle.”  Kostic, however, does not teach disabling the car when a stroke is detected.
Singhal teaches:
10. The electronic device as defined in claim 9, wherein the electronic device comprises a vehicle, and the first action comprises inhibiting the vehicle from being drivable or flyable.  [Singhal is directed to a “mental impairment detection determination using reaction time test" that denies ignition to the car if the driver is considered to be impaired.  The impairment test includes an interactive voice response (Figure 1B, 54) that asks the user to answer a question or repeat a phrase, measures the response time, and compares it to a sober response time of the same user.  " [0062] The stimulus is in the form of a simple question that requires a verbal response or a motor response that can be detected and measured. The simple question may be what is your name or what is 2 plus 4, or blow the horn two times. The response is a speech or sound that can be picked up by a microphone and with the help of prior art speech recognition and processing technology be able to measure the response for timeliness and or accuracy. Prior art speech processing technologies and devices provide the ability to be able to receive process and precisely measure speech and sound responses.”  Figure 5 and “[0126] At Step 116, Ground station: Send ignition close loop command to vehicle if reaction time is within limits. [0127] At Step 118, Vehicle: Receive Vehicle Ignition command and complete auto ignition.”]
Kostic, Srivastava, Pakhomov, and Singhal are directed to analyzing human speech to compare to a previous value and obtain the change in the physiologic condition of the person.  Singhal uses the obtained result to disable a car that the user is driving, and it would have been obvious to combine the use and application of Singhal with the method of the combination so that the results of the analysis portion of the combination are used to disable a car instead of, or in addition to, calling an ambulance.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results.  See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
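For illustration only, the ignition decision described in Singhal [0126] (ignition is commanded if the reaction time is within limits) may be sketched as follows. The tolerance value, the function names, and the command strings are hypothetical and are not taken from Singhal; Singhal does not state how the limits are derived from the sober response time.

```python
def reaction_time_within_limits(measured_s, sober_baseline_s, tolerance=0.5):
    """True if the measured reaction time is close enough to the baseline.

    The limit is modeled here as baseline * (1 + tolerance); both the
    model and the default tolerance are assumptions for this sketch.
    """
    return measured_s <= sober_baseline_s * (1.0 + tolerance)


def ignition_command(measured_s, sober_baseline_s, tolerance=0.5):
    """Return a hypothetical command string for the vehicle."""
    if reaction_time_within_limits(measured_s, sober_baseline_s, tolerance):
        return "ENABLE_IGNITION"
    return "DENY_IGNITION"


normal = ignition_command(1.2, 1.0)
impaired = ignition_command(2.0, 1.0)
```

The same gate could be driven by the inferred change in health status of the combination in place of the reaction time, which is the substitution the rejection relies on.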

Claim 19 includes limitations similar to the limitations of Claim 10 and is rejected under similar rationale. 

Claims 17 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Kostic, Srivastava, and Pakhomov in view of Kusens (U.S. 2017/0195637).
Regarding Claim 17, Kostic teaches:
17. The electronic device as defined in claim 9, 
wherein the first action comprises generation of a notification and provision of the notification to one or more destinations, [Kostic notifies the user and hospital if the device detects a stroke: “[0072] … If the comparison of any of these subsequently gathered samples with their corresponding baseline samples differs by more than corresponding predefined thresholds, then … issues a notification to the user.” “[0070] … automatically contacts a stroke monitoring center or another predefined recipient.” “[0116] … Step 106 may also involve the automatic communication with one or more remote recipients….” “[0112] …. By automatically forwarding the current location of the user to one or more recipients (e.g. individual 82, hospital 84, and/or monitoring center 86), the recipient is able to summon an ambulance, or other rescue personnel, to the user should the condition of the user warrant such a step.”]
wherein the notification comprises: at least a portion of the received digitized human vocal expression, text corresponding to at least a portion of the received digitized human vocal expression, and at least one received image of the first user. [Kostic permits video monitoring of the patients.]
Kusens teaches general monitoring and thus teaches or suggests:
wherein the notification comprises: 
at least a portion of the received digitized human vocal expression, [Kusens has a confirmation feature during which the “central monitoring station 150” people monitor the audio and video feed of the stroke patient. “[0045] … On receiving an alert, the central monitoring station 150, or an attendant there, may view live image, video and/or audio feed from the 3D motion sensor 110, and evaluate whether the automated observations are persistent and/or troubling.. …”]
text corresponding to at least a portion of the received digitized human vocal expression, and, [Kusens, “[0044] … , computerized monitoring system 130 can analyze sound data for recognizable words, such as yes, no, help ….” This requires speech recognition which at the least suggests conversion to text. Additionally, Kusens teaches that alerts can be sent in the form of an email or a text message. See [0049]. ]
at least one received image of the first user. [Kusens conducts monitoring of the patient which transmits the image of the patient to the “central monitoring station 150.” “[0045] … On receiving an alert, the central monitoring station 150, or an attendant there, may view live image, video and/or audio feed from the 3D motion sensor 110, …” See also 11-12. ]
Kostic, Srivastava, Pakhomov, and Kusens are directed to analyzing human face or speech to compare to a previous value and obtain the change in the physiologic condition of the person. It would have been obvious to combine the additional features of the method of Kusens with the combination to allow emergency personnel access to the raw data that caused the alert. This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 22 includes limitations similar to the limitations of Claim 17 and is rejected under similar rationale. 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Fariba Sirjani/
Primary Examiner, Art Unit 2659




2. An electronic device configured to process audible expressions from users, comprising: 
a network interface; 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
receive, over a network via the network interface, a digitized human vocal expression of a first user from a first source; 
process the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain; 
use the processed digitized human vocal expression to determine characteristics of the human vocal expression by at least: 
determining a power spectrum of the human vocal expression; 
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression; 
use a natural language module to: 
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; 
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted third identified change with respect to the detected grammar violations; and 
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken, the first action comprising causing a vehicle to be prevented from being drivable or flyable. 

3. The electronic device as defined in claim 2, wherein the electronic device is configured to estimate a quasiperiodic signal period of the human vocal expression and determine a pitch using the estimated quasiperiodic signal period and use the determined pitch in inferring the change in health status of the first user. 

4. The electronic device as defined in claim 2, wherein the electronic device is configured to determine a cepstrum pitch using an inverse Fourier transform (IFT) of a logarithm of an estimated spectrum of a human vocal expression signal and use the determined pitch in inferring the change in health status of the first user. 

5. The electronic device as defined in claim 2, wherein the electronic device is configured is configured to determine a volume of the human vocal expression based at least in part on peak heights in the power spectrum of the human vocal expression and use the determined volume in inferring the change in health status of the first user. 

6. The electronic device as defined in claim 2, wherein the electronic device is configured to determine as to how rapidly the first user is speaking based at least in part on a determination of how many words are spoken by the first user over a first period of time. 

7. The electronic device as defined in claim 2, wherein the electronic device is configured to non-speech analysis module is configured to identify pauses in speech in the human vocal expression using both the power spectrum and a magnitude spectrum of the human vocal expression. 

8. The electronic device as defined in claim 2, wherein the electronic device is configured to determine if an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state. 

9. An electronic device, comprising: 
a network interface; 
at least one computing device; and 
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to: 
access a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain; 
use the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least: 
detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, and 
determining how rapidly the first user is speaking in the human vocal expression; 
use natural language processing to: 
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; 
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; 
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; 
weight, using a third weight, the detected grammar violations; 
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and 
based at least in part on the inferred change in health status of the first user, cause a first action is to be taken. 
10. The electronic device as defined in claim 9, wherein the electronic device comprises a vehicle, and the first action comprises inhibiting the vehicle from being drivable or flyable. 

Claim 11 includes limitations similar to the limitations of Claim 3 and is rejected under similar rationale. 
Claim 12 includes limitations similar to the limitations of Claim 4 and is rejected under similar rationale. 
Claim 13 includes limitations similar to the limitations of Claim 5 and is rejected under similar rationale. 
Claim 14 includes limitations similar to the limitations of Claim 6 and is rejected under similar rationale. 
Claim 15 includes limitations similar to the limitations of Claim 7 and is rejected under similar rationale. 
Claim 16 includes limitations similar to the limitations of Claim 8 and is rejected under similar rationale. 

Claim 18 is an independent method claim with limitations similar to the limitations of Claim 9 and is rejected under similar rationale. 
Claim 19 is a method claim with limitations similar to the limitations of Claim 10 and is rejected under similar rationale. 
Claim 20 is a method claim with limitations similar to the limitations of Claim 13 (or 5) and is rejected under similar rationale. 
Claim 21 is a method claim with limitations similar to the limitations of Claims 16 (or 8) and 17 and is rejected under similar rationale. 
Claim 22 is a method claim with limitations similar to the limitations of Claim 17 and is rejected under similar rationale. 


11. The electronic device as defined in claim 9, wherein the determined characteristics of the human vocal expression comprise pitch, and the electronic device is configured to estimate a quasiperiodic signal period of the human vocal expression and determine the pitch using the estimated quasiperiodic signal period and use the determined pitch in inferring the change in health status of the first user. 

12. The electronic device as defined in claim 9, wherein the determined characteristics of the human vocal expression comprise pitch, and the electronic device is configured to determine a cepstrum pitch using an inverse Fourier transform (IFT) of a logarithm of an estimated spectrum of a human vocal expression signal and use the determined pitch in inferring the change in health status of the first user. 

13. The electronic device as defined in claim 9, wherein the electronic device is configured to determine a volume of the human vocal expression based at least in part on peak heights in a power spectrum of the human vocal expression and use the determined volume in inferring the change in health status of the first user. 

14. The electronic device as defined in claim 9, wherein the electronic device is configured to determine how rapidly the first user is speaking based at least in part on a determination of how many words are spoken over a first period of time. 

15. The electronic device as defined in claim 9, wherein the electronic device is configured to identify pauses in speech in the human vocal expression using both a power and a magnitude spectrum of the human vocal expression. 

16. The electronic device as defined in claim 9, wherein the electronic device is configured to determine if an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state. 

17. The electronic device as defined in claim 9, 
wherein the first action comprises generation of a notification and provision of the notification to one or more destinations, 
wherein the notification comprises: at least a portion of the received digitized human vocal expression, text corresponding to at least a portion of the received digitized human vocal expression, and at least one received image of the first user. 

18. A computer implemented method, comprising: accessing at a computer system comprising one or more computing devices, a digitized human vocal expression of a first user from a first source converted from a time domain to a frequency domain; using the converted digitized human vocal expression to determine characteristics of the human vocal expression, by at least: detecting quiet time using identified pauses and length of pauses in speech in the human vocal expression, determining how rapidly the first user is speaking in the human vocal expression; using natural language processing to: detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; comparing the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes; weighting, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user; weighting, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user; weighting, using a third weight, the detected grammar violations; inferring a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, and the weighted detected grammar violations; and based at least in part on the inferred change in health status of the first user, causing a first action is to be taken. 

19. The computer implemented method as defined in claim 18, wherein the first device comprises a vehicle, and the first action comprises causing the vehicle to be prevented from being drivable or flyable. 

20. The computer implemented method as defined in claim 18, the method further comprising determining a volume of the human vocal expression based at least in part on peak heights in a power spectrum of the human vocal expression and using the determined volume in inferring the change in health status of the first user. 

21. The computer implemented method as defined in claim 18, the method further comprising: 
processing one or more images of the first user to detect occlusion of eyes of the first user by eyelids of the first user; and 
determining whether an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state, 
wherein the first action is caused to be taken based in part on the determination of whether an occlusion of eyes of the first user by eyelids of the first user indicates an adverse health state. 

22. The computer implemented method as defined in claim 18, the method further comprising: 
wherein the first action comprises generating a notification and providing the notification to one or more destinations, 
wherein the notification comprises: at least a portion of the received digitized human vocal expression, text corresponding to at least a portion of the received digitized human vocal expression, and at least one received image.