Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
In the amendments filed on 09 September 2021, the following has occurred: Claims 1, 10 and 16 have been amended.
Now claims 1-6, 8-10, 13-14, 16-17, 19 and 23-30 are pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09 September 2021 has been entered.

Information Disclosure Statement
The Information Disclosure Statement(s) filed on 04 February 2022 has been considered by the Examiner.
 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-6, 8-9, 16-17, 19 and 23-30 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent App. No. 2019/0239791 (hereafter “Beck”) in view of U.S. Patent App. No. 2019/0043610 (hereafter “Vaughan”), in view of U.S. Patent App. No. 2019/0110754 (hereafter “Rao”), in view of U.S. Patent App. No. 2011/0082712 (hereafter “Eberhardt”).

Regarding (Currently Amended) claim 1, Beck teaches a system for screening the mental health of patients (Beck: paragraph [0001], “a system and method to evaluate and predict the mental condition of a person”, paragraph [0070], “using the invention… to screen for mental health”), the system comprising:
--a display (Beck: paragraph [0136], “a display screen”);
--a microphone (Beck: paragraph [0136], “microphone”);
--a camera positioned to capture an image in front of the display and configured to output video data (Beck: Figure 1, element 110, paragraph [0074], “the system can include a camera 110 and a microphone 120”. Also see, paragraphs [0060], [0076]-[0081]. The Examiner notes the camera captures the facial/micro-expressions of the user of the device);
--a user interface (Beck: paragraph [0135], “user interface”);
--a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method of evaluating the mental health of a user; and a control system coupled to the memory comprising one or more processors (Beck: paragraph [0136], “The methods described herein can be executed on a computer system, generally comprised of a central processing unit (CPU) that is operatively connected to a memory device… Computer code executed by the CPU”. Also see, paragraph [0140]), the control system configured to execute the machine executable code to cause the control system to:
in response to the customer's actuation of the browser user interface”, paragraph [0141], “a user's computer can run an application”); and […], wherein the test application comprises: […];
--recording, by the camera, a set of test video data (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0076], “The camera(s) can also detect and monitor facial expressions, including micro-expressions”. Also see, paragraph [0105]. The video data of the camera is recorded, and used to determine facial/micro-expressions);
--recording, by the microphone, a set of test audio data (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see paragraph [0101], [0105]. The audio data of the microphone is recorded, and used to determine deviations);
--processing the video data to assign a plurality of pixels the video data to the face of the user […]; processing the plurality of pixels to output a set of video features comprising facial expressions of the user […] (Beck: paragraphs [0075]-[0076], “face analysis sensors… The sensors can monitor physiological activity of the patient including… facial expressions… The camera(s) can also detect and monitor facial expressions, including micro-expressions”. Also see, paragraphs [0082]-[0083]. The Examiner interprets that the video data of the camera is analyzed to determine pixels of the face and to determine facial expressions); and
--processing the audio data to identify sounds representing the voice of the user […] and output a set of audio features comprising tone of voice of the user […] (Beck: paragraph [0075], “The sensors can monitor physiological activity of the patient including… changes in speaking tone and/or speaking volume”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see, paragraphs [0101], [0105]);
--processing, using a machine learning model, the set of video features, and the set of audio features to output a mental health indication of the user (Beck: Figures 1-2, paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0090], “Data (i.e. raw data) from the sensors can be extracted (130, 135, 140 and 145)… An algorithm can then be used to analyze the features/data compiled from the sensors. Machine learning and/or artificial intelligence (not shown) can be used to detect aberrations, deviations and/or patterns”, paragraph [0094], “Machine Learning 165 and/or artificial intelligence can be utilized to, for example, identify patterns, relationships and/or correlations among the data. The system can also predict the mental condition of a patient”. Also see, paragraph [0095].  The Examiner notes as shown in Figure 2, the recorded video and audio are fused and input into a machine learning algorithm to make a prediction of mental health), wherein the machine learning model was generated by:
a plurality of datasets from a plurality of subjects, each subject's dataset comprising an array of features and corresponding feature values, and a classification of the subject's developmental disorder or condition”),
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals […] (Vaughan: paragraph [0077], “data collected and utilized by the system can include subject and caregiver video, audio, responses to questions or activities”, paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise… one or more of answers to the questions, observations of the subject such as characterizations based on video images”. Also see, paragraphs [0063], [0118], and [0127]. The Examiner notes the datasets for the plurality of individuals comprise audio data, video data and answers, and these are used to train the model); and
--determining a plurality of features from the labeled training data (Vaughan: paragraph [0088], “determining a plurality of features and corresponding feature values”); […].
Beck may not explicitly teach (underlined below for clarity):
--terminate the test application upon receiving, by the control system, an indication to stop the test;
Vaughan teaches a system to assess mental health of a user (Vaughan: paragraph [0079], “a system diagram for a digital personalized medicine platform 100 for providing diagnosis and therapy related to behavioral, neurological or mental health disorders”), in which
--terminate the test application upon receiving, by the control system, an indication to stop the test (Vaughan: Figure 21, paragraph [0214], “one or more button allowing a user to access an incomplete evaluation… and the ability to finish that evaluation”. The Examiner interprets the “Cancel” button in Figure 21 as an indication, received by the control system from the user, to stop the test);
One of ordinary skill in the art before the effective filing date would have found it obvious to include using an indication to stop the test as taught by Vaughan within the system for determining mental health as taught by Beck with the motivation of “improve both the accuracy and efficiency for diagnosis and treatment” (Vaughan: paragraph [0005]).
Beck and Vaughan may not explicitly teach (underlined below for clarity):
--displaying, on the display, a set of text for the user to read aloud comprising a statement;
--displaying, on the display, live video data recorded by the camera capturing the user while reading the statement aloud;
--processing the video data to assign a plurality of pixels the video data to the face of the user while reading the statement aloud; processing the plurality of pixels to output a set of video features comprising facial expressions of the user while reading the statement aloud; and
--processing the audio data to identify sounds representing the voice of the user while the user read the statement aloud and output a set of audio features comprising tone of voice of the user while reading the statement aloud;
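--the labeled training data comprising audio and video data recorded for each of the plurality of individuals while reading the statement aloud;
Rao teaches a machine learning system for processing audio and video data to produce a diagnostic prediction for neurological disorders (Rao: Figures 2-4, paragraphs [0031], [0034]), in which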
--displaying, on the display, a set of text for the user to read aloud comprising a statement (Rao: paragraph [0089], “the patient may be prompted to read a specific statement aloud to provide a standardized audio sample”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”, paragraph [0153], “having the patient read from the list of words provided”. Also see, paragraph [0110]. The Examiner notes that “for the user to read aloud” is an intended use of the displayed text that is not required to occur. This feature has been fully considered by the Examiner; however, the limitation does not provide patentable distinction over the cited prior art because it is an intended use or result of the displayed text. The Examiner notes a sentence is displayed for the user to read aloud, and be captured by the microphone);
--displaying, on the display, live video data recorded by the camera capturing the user while reading the statement aloud (Rao: paragraph [0076], “the device used to collect the data will prompt the user or patient to perform the preferable tests. Such prompts may be made, by way of example… by providing a frame or other outline on a live video feed displayed on the device to indicate where the camera should be centered”, paragraph [0110], “Record close-up video (with audio) of the patient's face while they say a prompted sentence”);
--processing the video data to assign a plurality of pixels the video data to the face of the user while reading the statement aloud; processing the plurality of pixels to output a set of video features comprising facial expressions of the user while reading the statement aloud (Rao: paragraph [0049], “wherein said raw patient data comprises a video recording, wherein said video recording comprises… a recording of the patient's face while reading a prepared statement”, paragraphs [0088]-[0090], “Video analysis of the patient may include analysis of the patient's face and facial movements, mouth specific movements… initial processing may be done to accurately localize the body part and its sub components (e.g., the face and parts of the face such as eye and mouth locations). The localization may be used to constrain the region over which further processing and feature extraction is performed… record a depth value for every pixel in the image”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate, facial expression”. Also see, paragraphs [0108]-[0110]); and
--processing the audio data to identify sounds representing the voice of the user while the user read the statement aloud and output a set of audio features comprising tone of voice of the user while reading the statement aloud (Rao: Figures 2-4, paragraph [0089], “the processing may involve detection of speech and other sounds, statistical analysis of the audio data, filtering of the signal for feature extraction. The raw audio data and or any derived features could then be provided as input to a recurrent neural network to perform further feature extraction.”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”. Also see, paragraphs [0044], [0108]-[0110]);
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals while reading the statement aloud (Rao: paragraphs [0101]-[0104], “train a plurality of machine learning systems to generate a number of classification models… to produce a predictive diagnostic model… trained by comparing the data from subjects which have… These video recordings will be used for training models to diagnose… Record close-up video (with audio) of the patient's face while they say a prompted sentence”. Also see, paragraph [0086]);
One of ordinary skill in the art before the effective filing date would have found it obvious to include capturing, analyzing and training a model using data while a user reads a statement aloud as taught by Rao within the video and audio analysis as taught by Beck and Vaughan with the motivation of “identify new clinical indicia of disease, or recognize previously unidentified combinations of symptoms that allow it to accurately diagnose a disorder” (Rao: paragraph [0031]).
Beck, Vaughan and Rao may not explicitly teach (underlined below for clarity):
--generating a plurality of subset machine learning models based on the plurality of features;
--evaluating a classification performance of the generated plurality of subset machine learning models; and
--selecting at least one of the subset machine learning models as the machine learning model.
Eberhardt teaches generating a plurality of subset machine learning models based on the plurality of features (Eberhardt: Figure 3, paragraphs [0039]-[0042], “training data is randomized and segmented in order to score feature selection… The training set is used to train a BBN classifier… a classifier is trained on each of the training sets created in the data preparation step”, claim 3, “building a list of a plurality of BBN model candidates based on the training set of claim data”. Also see, paragraph [0096]);
--evaluating a classification performance of the generated plurality of subset machine learning models (Eberhardt: Figure 3, paragraph [0034], “each of the BBN models is scored using one or more scoring methods, such as Minimum Description Length (MDL), also known as the Bayesian Information Criterion (BIC), as well as Bayesian Scoring (BDe)”, paragraph [0042], “a receiver operating characteristic (ROC) curve is plotted for each test exercise to calculate classification accuracy”. Also see, paragraph [0005], claim 3); and
--selecting at least one of the subset machine learning models as the machine learning model (Eberhardt: Figure 3, paragraph [0034], “a BBN model having the highest score among all is selected as the final candidate from the BBN candidates”. Also see, claim 3).
One of ordinary skill in the art before the effective filing date would have found it obvious to include training a plurality of models and scoring each to select one as taught by Eberhardt within the training and use of a machine learning model as taught by Beck, Vaughan and Rao with the motivation of “improve patient screening, risk stratification, and therapy selection to address current shortcomings” (Eberhardt: paragraph [0055]).

Regarding (Original) claim 3, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein recording, by the microphone, further comprises: initiating the recording upon determining, by the control system, that the user is speaking (Beck: paragraphs [0110]-[0112], “The system continues to monitor the subject as he/she speaks… Data is collected from the subject… the system can identify speech and breathing patterns that are associated with a patient speaking”. Also see, paragraph [0127]).


Regarding (Previously Presented) claim 4, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein the control system is further configured to: receive the set of test video data and the set of test audio data (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0076], “The camera(s) can also detect and monitor facial expressions, including micro-expressions”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see, paragraph [0105]; Vaughan: paragraph [0077]. The Examiner notes this is receiving of video and audio data from the camera and microphone);
--preprocess the received set of test video data to identify a plurality of video segments, each video segment corresponding to one phrase in the statement and comprising a time window (Beck: paragraph [0013], “detecting and recording data from the person's autonomic and voluntary responses with one or more sensors”, paragraph [0087], “The response time for answering individual questions can also be observed and recorded”; Vaughan: paragraphs [0113]-[0114], “a preprocessing module 605… The preprocessed new data can be passed on to the prediction module, which may output a prediction 670”, paragraph [0148], “The user interface may be configured to administer the assessment procedure in real-time, such that the user answers one question at a time and the prediction module can select the next best question segmenting actions in the video”. Also see, Rao: paragraphs [0104], [0110]. The Examiner interprets the detecting of response time is identification of segments for recitation of a response (i.e., a phrase) in the recitation of statements, in combination with the preprocessing model and segmentation of Vaughan and Rao teaches what is required); and
--preprocess the received set of test audio data to identify a plurality of audio segments, each audio segment corresponding to one question in the series of questions and comprising a time window (Beck: paragraph [0013], “detecting and recording data from the person's autonomic and voluntary responses with one or more sensors”, paragraph [0087], “The response time for answering individual questions can also be observed and recorded”; Vaughan: paragraphs [0113]-[0114], “a preprocessing module 605… The preprocessed new data can be passed on to the prediction module, which may output a prediction 670”, paragraph [0148], “The user interface may be configured to administer the assessment procedure in real-time, such that the user answers one question at a time and the prediction module can select the next best question to ask based on recommendations made by the feature recommendation module”. The Examiner interprets the detecting of response time is identification of segments for answering the individual questions, in combination with the preprocessing model of Vaughan teaches what is required).
The motivation to combine is the same as in claim 1, incorporated herein. 

Regarding (Original) claim 5, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein the control system is further configured to: preprocess the plurality of audio segments and the plurality of video segments to identify a fusion algorithm that combines input from sensors (including a camera, pressure sensor and speech sensor) to evaluate and/or predict the mental condition of a person”, paragraph [0031], “an algorithm that synchronizes sensor input”. Also see, paragraph [0104]; Vaughan: paragraphs [0113]-[0114], [0148]. The Examiner notes fusion and synchronization of the sensor data (audio and video data) reads on identification of the overlapping time windows and integrating into a data set to be used for analysis (i.e., the integrated set)).
The motivation to combine is the same as in claim 1, incorporated herein.

Regarding (Original) claim 6, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein the machine learning model is at least one of: a generalized linear model, a regression model, a logistical regression model, and a supervised machine learning classification model (Beck: Figure 2, element 375. Also see, Vaughan: paragraph [0120]. The Examiner notes SVM and Neural networks are supervised learning classification models).
The motivation to combine is the same as in claim 1, incorporated herein.

Regarding (Original) claim 8, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein the mental health indication identifies a likelihood of the user having one of a plurality of mental health disorders, the plurality of mental health disorders comprising: a neuropsychiatric disorder, schizophrenia, and a bipolar disorder (Beck: paragraph [0061], “there are a myriad of mental illnesses, common illnesses include depression, dementia and schizophrenia”; Vaughan: paragraph [0179], “behavioral disorders such as autism spectrums disorders, attention deficits disorders, bipolar disorder, schizophrenia, epilepsy, cerebral palsy, and/or any other behavioral disorder”).
The motivation to combine is the same as in claim 1, incorporated herein.

Regarding (Original) claim 9, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, and further teach wherein the mental health indication identifies whether the user is a patient or a healthy control (Beck: paragraph [0094], “The system can also predict the mental condition of a patient. For example, the system can recognize a decline in cognitive abilities. The system can also identify patterns of conduct/activity of patients in historical data that have subsequently experienced mental health issues”).
The motivation to combine is the same as in claim 1, incorporated herein.

Regarding (Currently Amended) claim 16, Beck teaches a system for screening the mental health of patients (Beck: paragraph [0001], “a system and method to evaluate and predict the mental condition of a person”, paragraph [0070], “using the invention… to screen for mental health”), the system comprising:
--a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method; a control system coupled to the memory comprising one or more processors (Beck: paragraph [0136], “The methods described herein can be executed on a computer system, generally comprised of a central processing unit (CPU) that is operatively connected to a memory device… Computer code executed by the CPU”. Also see, paragraph [0140]), the control system configured to execute the machine executable code to cause the control system to: […];
--receive a set of test video data recorded during a test representing the face of the user […]; process the set of test video data to output a set of video features (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0076], “The camera(s) can also detect and monitor facial expressions, including micro-expressions”. Also see, paragraph [0105]. The video data of the camera is recorded, and used to determine facial/micro-expressions which are video features);
--receive a set of test audio data recorded during the test representing the voice of the user […]; process the set of audio data to output a set of audio features (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see paragraph [0101], [0105]. The audio data of the microphone is recorded, and used to determine deviations which are audio features);
--process, using a machine learning model, the set of video features, and the set of audio features to output an indication of the mental health of the user (Beck: Figures 1-2, paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0090], “Data (i.e. raw data) from the sensors can be extracted (130, 135, 140 and 145)… An algorithm can then be used to analyze the features/data compiled from the sensors. Machine learning and/or artificial intelligence (not shown) can be used to detect aberrations, deviations and/or patterns”, paragraph [0094], “Machine Learning 165 and/or artificial intelligence can be utilized to, for example, identify patterns, relationships and/or correlations among the data. The system can also predict the mental condition of a patient”. Also see, paragraph [0095].  The Examiner notes as shown in Figure 2, the recorded video and audio are fused and input into a machine learning algorithm to make a prediction of mental health), wherein the machine learning model was generated by:
--receiving labeled training data for a plurality of individuals associated with an indication of a mental health of the plurality of individuals (Vaughan: paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise a plurality of datasets from a plurality of subjects, each subject's dataset comprising an array of features and corresponding feature values, and a classification of the subject's developmental disorder or condition”),
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals […] (Vaughan: paragraph [0077], “data collected and utilized by the system can include subject and caregiver video, audio, responses to questions or activities”, paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise… one or more of answers to the questions, observations of the subject such as characterizations based on video images”. Also see, paragraphs [0063],
--determining a plurality of features from the labeled training data (Vaughan: paragraph [0088], “determining a plurality of features and corresponding feature values”); […].
Beck may not explicitly teach (underlined below for clarity): 
--display, on a display, a set of text […];
Vaughan teaches a system to assess mental health of a user (Vaughan: paragraph [0079], “a system diagram for a digital personalized medicine platform 100 for providing diagnosis and therapy related to behavioral, neurological or mental health disorders”), in which
--display, on a display, a set of text […] (Vaughan: Figure 21, paragraph [0024], “1) display a plurality of questions related to a cognitive function of the subject; 2) receive input from a user comprising answers to the plurality of questions related to the subject”. The Examiner notes that “for the user to read aloud” is an intended use of the displayed text that is not required to occur. This feature has been fully considered by the Examiner; however, the limitation does not provide patentable distinction over the cited prior art because it is an intended use or result of the displayed text. The Examiner further notes the displayed questions are interpreted to read on a statement);
One of ordinary skill in the art before the effective filing date would have found it obvious to include using an indication to stop the test and displaying the text as taught by Vaughan within the system for determining mental health as taught by Beck with the motivation of “improve both the accuracy and efficiency for diagnosis and treatment” (Vaughan: paragraph [0005]).
Beck and Vaughan may not explicitly teach (underlined below for clarity):
for the user to read aloud;
--receive a set of test video data recorded during a test representing the face of the user while the user is reading the set of text aloud; process the set of test video data to output a set of video features;
--receive a set of test audio data recorded during the test representing the voice of the user while the user is reading the set of text aloud; process the set of audio data to output a set of audio features;
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals recorded while reading the set of text aloud;
Rao teaches a machine learning system for processing audio and video data to produce a diagnostic prediction for neurological disorders (Rao: Figures 2-4, paragraphs [0031], [0034]), in which
--display, on a display, a set of text for the user to read aloud (Rao: paragraph [0089], “the patient may be prompted to read a specific statement aloud to provide a standardized audio sample”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”, paragraph [0153], “having the patient read from the list of words provided”. Also see, paragraph [0110]. The Examiner notes that “for the user to read aloud” is an intended use of the displayed text that is not required to occur. This feature has been fully considered by the Examiner; however, the limitation does not provide patentable distinction over the cited prior art because it is an intended use or result of the displayed text. The Examiner notes a sentence is displayed for the user to read aloud, and be captured by the microphone);
while the user is reading the set of text aloud; process the set of test video data to output a set of video features (Rao: paragraph [0049], “wherein said raw patient data comprises a video recording, wherein said video recording comprises… a recording of the patient's face while reading a prepared statement”, paragraphs [0088]-[0090], “Video analysis of the patient may include analysis of the patient's face and facial movements, mouth specific movements… initial processing may be done to accurately localize the body part and its sub components (e.g., the face and parts of the face such as eye and mouth locations). The localization may be used to constrain the region over which further processing and feature extraction is performed… record a depth value for every pixel in the image”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate, facial expression”. Also see, paragraphs [0108]-[0110]);
--receive a set of test audio data recorded during the test representing the voice of the user while the user is reading the set of text aloud; process the set of audio data to output a set of audio features (Rao: Figures 2-4, paragraph [0089], “the processing may involve detection of speech and other sounds, statistical analysis of the audio data, filtering of the signal for feature extraction. The raw audio data and or any derived features could then be provided as input to a recurrent neural network to perform further feature extraction.”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”. Also see, paragraphs [0044], [0108]-[0110]);
while reading the set of text aloud (Rao: paragraphs [0101]-[0104], “train a plurality of machine learning systems to generate a number of classification models… to produce a predictive diagnostic model… trained by comparing the data from subjects which have been classified as possessing a certain neurological disorder to the data from subjects which have been classified as "healthy."”, paragraphs [0107]-[0110], “range of tests may be recorded using a video camera with a functional microphone. The procedure for recording these data should be consistent from one patient to the next. These video recordings will be used for training models to diagnose… Record close-up video (with audio) of the patient's face while they say a prompted sentence”. Also see, paragraph [0086]);
One of ordinary skill in the art before the effective filing date would have found it obvious to include capturing, analyzing and training a model using data while a user reads a statement aloud as taught by Rao within the video and audio analysis as taught by Beck and Vaughan with the motivation of “identify new clinical indicia of disease, or recognize previously unidentified combinations of symptoms that allow it to accurately diagnose a disorder” (Rao: paragraph [0031]).
Beck, Vaughan and Rao may not explicitly teach (underlined below for clarity):
--generating a plurality of subset machine learning models based on the plurality of features;
--evaluating a classification performance of the generated plurality of subset machine learning models; and
--selecting at least one of the subset machine learning models as the machine learning model.
Eberhardt teaches generating a plurality of subset machine learning models based on the plurality of features (Eberhardt: Figure 3, paragraphs [0039]-[0042], “training data is randomized and segmented in order to score feature selection… The training set is used to train a BBN classifier… a classifier is trained on each of the training sets created in the data preparation step”, claim 3, “building a list of a plurality of BBN model candidates based on the training set of claim data”. Also see, paragraph [0096]);
--evaluating a classification performance of the generated plurality of subset machine learning models (Eberhardt: Figure 3, paragraph [0034], “each of the BBN models is scored using one or more scoring methods, such as Minimum Description Length (MDL), also known as the Bayesian Information Criterion (BIC), as well as Bayesian Scoring (BDe)”, paragraph [0042], “a receiver operating characteristic (ROC) curve is plotted for each test exercise to calculate classification accuracy”. Also see, paragraph [0005], claim 3); and
--selecting at least one of the subset machine learning models as the machine learning model (Eberhardt: Figure 3, paragraph [0034], “a BBN model having the highest score among all is selected as the final candidate from the BBN candidates”. Also see, claim 3).
One of ordinary skill in the art before the effective filing date would have found it obvious to include training a plurality of models and scoring each to select one as taught by Eberhardt within the training and use of a machine learning model as taught by Beck, Vaughan and Rao with the motivation of “improve patient screening, risk stratification, and therapy selection to address current shortcomings” (Eberhardt: paragraph [0055]).

Regarding (Original) claim 17, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 16, and further teach wherein the machine learning model is at least one 
The motivation to combine is the same as in claim 16, incorporated herein.

Regarding (Previously Presented) claim 19, Beck teaches a machine learning […] system (Beck: Figures 1-2, paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”), comprising:
--at least one non-transitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicatively coupled to the at least one non-transitory processor-readable storage medium, in operation (Beck: paragraph [0136], “The methods described herein can be executed on a computer system, generally comprised of a central processing unit (CPU) that is operatively connected to a memory device… Computer code executed by the CPU”. Also see, paragraph [0140]), the at least one processor configured to: […]:
--video data and audio data recorded while each of the plurality of individuals read text from a digital display (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”. Also see, paragraphs [0076], [0105]), 
detect and monitor facial expressions, including micro-expressions”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see, paragraph [0101], [0105]. The video data of the camera is recorded, and used to determine facial/micro-expressions. The audio data of the microphone is recorded, and used to determine deviations); […].
Beck may not explicitly teach (underlined below for clarity):
--a machine learning training system,
--receive labeled training data including data for a plurality of individuals that indicates whether each of the plurality of individuals has one or more of a plurality of mental health disorders, the labeled training data further comprising:
--process the audio data, and the video data to output a plurality of features;
--store the features of the diagnostic classifier in the at least one non-transitory processor-readable storage medium.
Vaughan teaches a system to assess mental health of a user (Vaughan: paragraph [0079], “a system diagram for a digital personalized medicine platform 100 for providing diagnosis and therapy related to behavioral, neurological or mental health disorders”), in which
--a machine learning training system (Vaughan: paragraph [0113], “an assessment model that has been trained using a large set of clinically validated data to learn the statistical 
--receive labeled training data including data for a plurality of individuals that indicates whether each of the plurality of individuals has one or more of a plurality of mental health disorders (Vaughan: paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise a plurality of datasets from a plurality of subjects, each subject's dataset comprising an array of features and corresponding feature values, and a classification of the subject's developmental disorder or condition”), the labeled training data further comprising:
--process the audio data, and the video data to output a plurality of features (Vaughan: paragraph [0088], “determining a plurality of features and corresponding feature values”);
--store the features of the diagnostic classifier in the at least one non-transitory processor-readable storage medium (Vaughan: paragraph [0159], “The storage unit 1215 can store files, such as drivers, libraries and saved programs”).
One of ordinary skill in the art before the effective filing date would have found it obvious to include training a machine learning model as taught by Vaughan within the system for determining mental health using machine learning as taught by Beck with the motivation of “improve both the accuracy and efficiency for diagnosis and treatment” (Vaughan: paragraph [0005]).
Beck and Vaughan may not explicitly teach (underlined below for clarity):
--wherein the video data is processed to identify portions of the video data comprising the face of the individual and the audio data is processed to identify sounds representing the voice of the individual while reading the text from the digital display;
a video recording, wherein said video recording comprises… a recording of the patient's face while reading a prepared statement”, paragraphs [0088]-[0090], “Video analysis of the patient may include analysis of the patient's face and facial movements, mouth specific movements… initial processing may be done to accurately localize the body part and its sub components (e.g., the face and parts of the face such as eye and mouth locations). The localization may be used to constrain the region over which further processing and feature extraction is performed… record a depth value for every pixel in the image”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate, facial expression”. Also see, paragraphs [0108]-[0110]) and the audio data is processed to identify sounds representing the voice of the individual while reading the text from the digital display (Rao: Figures 2-4, paragraph [0089], “the processing may involve detection of speech and other sounds, statistical analysis of the audio data, filtering of the signal for feature extraction. The raw audio data and or any derived features could then be provided as input to a recurrent neural network to perform further feature extraction.”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”. Also see, paragraphs [0044], [0108]-[0110]);
One of ordinary skill in the art before the effective filing date would have found it obvious to include capturing, analyzing and training a model using data while a user reads a 
Beck, Vaughan and Rao may not explicitly teach (underlined below for clarity):
--generate a plurality of subset machine learning models based on the plurality of features;
--evaluate a classification performance of the generated plurality of subset machine learning models;
--select at least one of the plurality of subset machine learning models as a diagnostic classifier; 
Eberhardt teaches generate a plurality of subset machine learning models based on the plurality of features (Eberhardt: Figure 3, paragraphs [0039]-[0042], “training data is randomized and segmented in order to score feature selection… The training set is used to train a BBN classifier… a classifier is trained on each of the training sets created in the data preparation step”, claim 3, “building a list of a plurality of BBN model candidates based on the training set of claim data”. Also see, paragraph [0096]);
--evaluate a classification performance of the generated plurality of subset machine learning models (Eberhardt: Figure 3, paragraph [0034], “each of the BBN models is scored using one or more scoring methods, such as Minimum Description Length (MDL), also known as the Bayesian Information Criterion (BIC), as well as Bayesian Scoring (BDe)”, paragraph [0042], “a receiver operating characteristic (ROC) curve is plotted for each test exercise to calculate classification accuracy”. Also see, paragraph [0005], claim 3);
select at least one of the plurality of subset machine learning models as a diagnostic classifier (Eberhardt: Figure 3, paragraph [0034], “a BBN model having the highest score among all is selected as the final candidate from the BBN candidates”. Also see, claim 3);
One of ordinary skill in the art before the effective filing date would have found it obvious to include training a plurality of models and scoring each to select one as taught by Eberhardt within the training and use of a machine learning model as taught by Beck, Vaughan and Rao with the motivation of “improve patient screening, risk stratification, and therapy selection to address current shortcomings” (Eberhardt: paragraph [0055]).

Regarding (Original) claim 23, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein the diagnostic classifier is configured to output a mental health indication identifying an individual as healthy or as having a general mental health issue (Beck: paragraph [0094], “The system can also predict the mental condition of a patient. For example, the system can recognize a decline in cognitive abilities. The system can also identify patterns of conduct/activity of patients in historical data that have subsequently experienced mental health issues”).
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 24, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein the diagnostic classifier is configured to output a mental health indication identifying an individual as healthy or as having a specific mental health issue (Beck: paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0061], “there are a myriad of mental illnesses, common illnesses include stress, anxiety, depression, dementia and schizophrenia”; Vaughan: paragraph [0179], “behavioral disorders such as autism spectrums disorders, attention deficits disorders, bipolar disorder, schizophrenia, epilepsy, cerebral palsy, and/or any other behavioral disorder”).
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 25, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein the diagnostic classifier is configured to output a mental health indication identifying an individual as having either a first specific mental health disorder or a second specific mental health disorder (Beck: paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0061], “there are a myriad of mental illnesses, common illnesses include stress, anxiety, depression, dementia and schizophrenia”; Vaughan: paragraph [0179], “behavioral disorders such as autism spectrums disorders, attention deficits disorders, bipolar disorder, schizophrenia, epilepsy, cerebral palsy, and/or any other behavioral disorder”. The systems can diagnose all of the conditions mentioned (i.e., a first and second)).
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 26, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein the diagnostic classifier is configured output a mental health indication identifying a risk of developing a mental health disorder for an 
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 27, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein the labeled training data further comprises: for each individual in the plurality of individuals, an indication of at least one of the following: whether the individual is healthy, whether the individual has a general mental health issue, whether the individual has one or more specific mental health disorders, whether the individual is at risk of developing a general mental health issue, or whether the individual is at risk of developing one or more specific mental health disorders (Vaughan: paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise a plurality of datasets from a plurality of subjects, each subject's dataset comprising an array of features and corresponding feature values, and a classification of the subject's developmental disorder or condition”).
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 28, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach wherein training the initial machine learning model further comprises using k-fold cross validation with logistic regression (Vaughan: paragraph [0114], “a validation module 615, configured to validate the trained assessment model using any appropriate validation algorithm (e.g., Stratified K-fold cross-validation”, paragraph [0120], “logistic regression”).


Regarding (Original) claim 29, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach
--wherein the labeled training data further comprises at least one of functional measurement data or physiological measurement data (Vaughan: paragraph [0077], “data collected and utilized by the system can include subject and caregiver video, audio, responses to questions or activities”, paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise… one or more of answers to the questions, observations of the subject such as characterizations based on video images”, paragraph [0182], “measured physiological parameters of the subject”. Also see, paragraphs [0063], [0118], and [0127]. The Examiner notes the datasets for the plurality of individuals comprise physiological data).
The motivation to combine is the same as in claim 19, incorporated herein.

Regarding (Original) claim 30, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 19, and further teach further comprising: using the features of the diagnostic classifier as a screening tool to assess at least one of intermediate or end-point outcomes in at least one clinical trial testing for treatment responses (Vaughan: paragraph [0018], “The therapeutic module may be configured to receive the updated diagnostic data to determine an updated amount and an updated timing for administering an updated dose of a therapeutic agent and output an updated personal treatment plan for the subject in response to the diagnostic data and the updated diagnostic data”. The Examiner interprets this is assessing 
The motivation to combine is the same as in claim 19, incorporated herein.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent App. No. 2019/0239791 (hereafter “Beck”), U.S. Patent App. No. 2019/0043610 (hereafter “Vaughan”), U.S. Patent App. No. 2019/0110754 (hereafter “Rao”) and U.S. Patent App. No. 2011/0082712 (hereafter “Eberhardt”) as applied to claim 1 above, and further in view of U.S. Patent App. No. 2015/0288924 (hereafter “Liu”).

Regarding (Original) claim 2, Beck, Vaughan, Rao and Eberhardt teach the limitations of claim 1, but may not explicitly teach wherein the indication to stop the test application comprises a determination, by the control system, that a user face is not within an image captured by the camera.
Liu teaches wherein the indication to stop the test application comprises a determination, by the control system, that a user face is not within an image captured by the camera (Liu: paragraph [0009], “collecting information of a current video communication process includes, when detecting, by using a face tracking technology, that a face of a first participant is absent”).
One of ordinary skill in the art before the effective filing date would have found it obvious to include detecting a face is missing as taught by Liu with the video capture as taught by Beck, Vaughan, Rao and Eberhardt with the motivation of improving the capture of video (Liu: paragraphs [0006] and [0009]).

Claims 10 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent App. No. 2019/0239791 (hereafter “Beck”) in view of U.S. Patent App. No. 2019/0043610 (hereafter “Vaughan”) in view of U.S. Patent App. No. 2019/0110754 (hereafter “Rao”) in view of U.S. Patent App. No. 2015/0288924 (hereafter “Liu”) in view of U.S. Patent App. No. 2011/0082712 (hereafter “Eberhardt”).

Regarding (Currently Amended) claim 10, Beck teaches a system for screening the mental health of patients (Beck: paragraph [0001], “a system and method to evaluate and predict the mental condition of a person”, paragraph [0070], “using the invention… to screen for mental health”), the system comprising:
--a display (Beck: paragraph [0136], “a display screen”);
--a microphone (Beck: paragraph [0136], “microphone”);
--a camera configured to capture an image in front of the display and to output video data (Beck: Figure 1, element 110, paragraph [0074], “the system can include a camera 110 and a microphone 120”. Also see, paragraphs [0060], [0076]-[0081]. The Examiner notes the camera captures the facial/micro-expressions of the user of the device);
--a user interface (Beck: paragraph [0135], “user interface”);
--a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method; and a control system coupled to the memory comprising one or more processors (Beck: paragraph [0136], “The methods described herein can be executed on a computer system, generally comprised of a central processing unit (CPU) that is operatively connected to a memory device… Computer code executed by the CPU”. Also see, paragraph [0140]), the control system configured to execute the machine executable code to cause the control system to:
--receive, through the user interface, an indication to initiate a test and executing a test application […] (Beck: paragraphs [0134]-[0135], “in response to the customer's actuation of the browser user interface”, paragraph [0141], “a user's computer can run an application”), the test application comprising: […];
--recording, by the camera, a set of test video data during the test (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0076], “The camera(s) can also detect and monitor facial expressions, including micro-expressions”. Also see, paragraph [0105]. The video data of the camera is recorded, and used to determine facial/micro-expressions); […];
--continually processing the set of test video data during the test to: identifying a face of the user (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0076], “The camera(s) can also detect and monitor facial expressions, including micro-expressions”. Also see, paragraph [0105]. Determining facial/micro-expressions reads on identifying a face); […];
--recording, by the microphone, a set of test audio data during the test (Beck: paragraphs [0073]-[0074], “Both autonomic and voluntary responses of the person can be recorded and analyzed… The camera can detect stress, anxiety or other emotions from expressions, micro-expressions and/or spontaneous expressions. The microphone can detect the same emotion from the person's voice”, paragraph [0088], “a microphone and/or speech sensors 120. Deviations from normal speech patterns can be identified. A patient's speech can be analyzed over time to detect aberrations and possible signs of anxiety and/or declining mental health”. Also see, paragraphs [0101], [0105]. The audio data of the microphone is recorded, and used to determine deviations); and
--processing the set of test audio data and test video data to identify audio and video features (Beck: Figures 1-2, paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0090], “Data (i.e. raw data) from the sensors can be extracted (130, 135, 140 and 145)… An algorithm can then be used to analyze the features/data compiled from the sensors. Machine learning and/or artificial intelligence (not shown) can be used to detect aberrations, deviations and/or patterns”, paragraph [0094], “Machine Learning 165 and/or artificial intelligence can be utilized to, for example, identify patterns, relationships and/or correlations among the data. The system can also predict the mental condition of a patient”. Also see, paragraph [0095].  The Examiner notes as shown in Figure 2, the recorded video, audio and answers are fused and input into a machine learning algorithm to make a prediction of mental health) and 
--storing the audio and video features in the memory (Beck: paragraph [0058], “data from an individual person can be recorded and stored”, paragraph [0105], “Both raw data and processed data can be stored”), […];
--processing, using a machine learning model, the set of video features, and the set of audio features to output a mental health indication of the user (Beck: Figures 1-2, paragraph [0018], “Machine learning and/or artificial intelligence can be used to correlate the aberrations, deviations and/or patterns with one or more mental health ailments”, paragraph [0090], “Data (i.e. raw data) from the sensors can be extracted (130, 135, 140 and 145)… An algorithm can then be used to analyze the features/data compiled from the sensors. Machine learning and/or artificial intelligence (not shown) can be used to detect aberrations, deviations and/or patterns”, paragraph [0094], “Machine Learning 165 and/or artificial intelligence can be utilized to, for example, identify patterns, relationships and/or correlations among the data. The system can also predict the mental condition of a patient”. Also see, paragraph [0095]. The Examiner notes as shown in Figure 2, the recorded video and audio are fused and input into a machine learning algorithm to make a prediction of mental health), wherein the machine learning model was generated by:
--receiving labeled training data for a plurality of individuals indicating whether each of the plurality of individuals has one or more mental health disorders (Vaughan: paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise a plurality of datasets from a plurality of subjects, each subject's dataset comprising an array of features and corresponding feature values, and a classification of the subject's developmental disorder or condition”), 
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals […] (Vaughan: paragraph [0077], “data collected and utilized by the system can include subject and caregiver video, audio, responses to questions or activities”, paragraph [0115], “The training data 650, used by the training module to construct the assessment model, can comprise… one or more of answers to the questions, observations of the subject such as characterizations based on video images”. Also see, paragraphs [0063], 
--determining a plurality of features from the labeled training data (Vaughan: paragraph [0088], “determining a plurality of features and corresponding feature values”); […].
Beck may not explicitly teach (underlined below for clarity):
--receive, through the user interface, an indication to initiate a test and executing a test application until receiving an indication to stop the test,
Vaughan teaches a system to assess mental health of a user (Vaughan: paragraph [0079], “a system diagram for a digital personalized medicine platform 100 for providing diagnosis and therapy related to behavioral, neurological or mental health disorders”), in which
--receive, through the user interface, an indication to initiate a test and executing a test application until receiving an indication to stop the test (Vaughan: Figure 21, paragraph [0214], “one or more button allowing a user to access an incomplete evaluation… and the ability to finish that evaluation”. The Examiner interprets the “Cancel” button in Figure 21 as an indication, received from the user by the control system, to stop the test),
One of ordinary skill in the art before the effective filing date would have found it obvious to include using an indication to stop the test as taught by Vaughan within the system for determining mental health as taught by Beck with the motivation of “improve both the accuracy and efficiency for diagnosis and treatment” (Vaughan: paragraph [0005]).
Beck and Vaughan may not explicitly teach (underlined below for clarity):
--displaying text on the display for the user to read aloud;
--displaying, on the display, a window displaying live video data recorded by the camera capturing the user while reading the statement aloud;
--wherein the video features comprise facial expressions of the user while reading the statement aloud and
--the audio features comprise sounds representing the voice of the user while the user read the statement aloud;
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals while reading the statement aloud;
Rao teaches a machine learning system for processing audio and video data to produce a diagnostic prediction for neurological disorders (Rao: Figures 2-4, paragraphs [0031], [0034]), in which
--displaying text on the display for the user to read aloud (Rao: paragraph [0089], “the patient may be prompted to read a specific statement aloud to provide a standardized audio sample”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”, paragraph [0153], “having the patient read from the list of words provided”. Also see, paragraph [0110]. The Examiner notes that “for the user to read aloud” is an intended use of the displayed text that is not required to occur. This feature has been fully considered by the Examiner; however, the limitation does not provide patentable distinction over the cited prior art because it is an intended use or result of the displayed text. The Examiner notes that in Rao a sentence is displayed for the user to read aloud and is captured by the microphone);
--displaying, on the display, a window displaying live video data recorded by the camera capturing the user while reading the statement aloud (Rao: paragraph [0076], “the device used to collect the data will prompt the user or patient to perform the preferable tests. Such prompts may be made, by way of example… by providing a frame or other outline on a live video feed displayed on the device to indicate where the camera should be centered”, paragraph [0110], “Record close-up video (with audio) of the patient's face while they say a prompted sentence”);
--wherein the video features comprise facial expressions of the user while reading the statement aloud (Rao: paragraph [0049], “wherein said raw patient data comprises a video recording, wherein said video recording comprises… a recording of the patient's face while reading a prepared statement”, paragraphs [0088]-[0090], “Video analysis of the patient may include analysis of the patient's face and facial movements, mouth specific movements… initial processing may be done to accurately localize the body part and its sub components (e.g., the face and parts of the face such as eye and mouth locations). The localization may be used to constrain the region over which further processing and feature extraction is performed… record a depth value for every pixel in the image”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate, facial expression”. Also see, paragraphs [0108]-[0110]) and 
--the audio features comprise sounds representing the voice of the user while the user read the statement aloud (Rao: Figures 2-4, paragraph [0089], “the processing may involve detection of speech and other sounds, statistical analysis of the audio data, filtering of the signal for feature extraction. The raw audio data and or any derived features could then be provided as input to a recurrent neural network to perform further feature extraction.”, paragraph [0100], “extract measurements associated with some action (e.g., jaw displacement in tremor, finger tapping rate, repetitive speech rate”, paragraph [0104], “focusing on speech patterns by having the user read a sentence displayed on the screen and recording the speech using the device's microphone”. Also see, paragraphs [0044], [0108]-[0110]);
--the labeled training data comprising audio and video data recorded for each of the plurality of individuals while reading the statement aloud (Rao: paragraphs [0101]-[0104], “train a plurality of machine learning systems to generate a number of classification models… to produce a predictive diagnostic model… trained by comparing the data from subjects which have been classified as possessing a certain neurological disorder to the data from subjects which have been classified as "healthy."”, paragraphs [0107]-[0110], “range of tests may be recorded using a video camera with a functional microphone. The procedure for recording these data should be consistent from one patient to the next. These video recordings will be used for training models to diagnose… Record close-up video (with audio) of the patient's face while they say a prompted sentence”. Also see, paragraph [0086]);
One of ordinary skill in the art before the effective filing date would have found it obvious to include capturing, analyzing and training a model using data while a user reads a statement aloud as taught by Rao within the video and audio analysis as taught by Beck and Vaughan with the motivation of “identify new clinical indicia of disease, or recognize previously unidentified combinations of symptoms that allow it to accurately diagnose a disorder” (Rao: paragraph [0031]).
Beck, Vaughan and Rao may not explicitly teach (underlined below for clarity):
--determining whether all of a plurality of pixels of the face are within a frame; and stopping the test if the face is outside the frame;
Liu teaches determining whether all of a plurality of pixels of the face are within a frame; and stopping the test if the face is outside the frame (Liu: paragraph [0009], “collecting information of a current video communication process includes, when detecting, by using a face tracking technology, that a face of a first participant is absent”);
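For illustration only, the limitation of determining whether all of a plurality of pixels of the face are within a frame, and stopping the test if the face is outside the frame, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Liu or of the claimed invention: the function names face_within_frame and run_test, the representation of the face as a list of (x, y) pixel coordinates, and the treatment of an undetected face as being outside the frame are all hypothetical.

```python
def face_within_frame(face_pixels, frame_width, frame_height):
    """Return True only if every face pixel lies inside the frame.

    Assumption: face_pixels is a list of (x, y) integer coordinates produced
    by some face tracker. An empty list (no face detected) counts as outside.
    """
    if not face_pixels:
        return False
    return all(0 <= x < frame_width and 0 <= y < frame_height
               for x, y in face_pixels)


def run_test(frames_of_face_pixels, frame_width, frame_height):
    """Process frames in order and stop at the first frame where the face
    is not fully inside the frame. Returns the number of frames processed."""
    processed = 0
    for face_pixels in frames_of_face_pixels:
        if not face_within_frame(face_pixels, frame_width, frame_height):
            break  # stop the test: face is outside the frame (or absent)
        processed += 1
    return processed
```

In this sketch the test stops at the first offending frame rather than pausing and resuming; either behavior would be a design choice that the claim language quoted above does not settle.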

Beck, Vaughan, Rao and Liu may not explicitly teach (underlined below for clarity):
--generating a plurality of subset machine learning models based on the plurality of features;
--evaluating a classification performance of the generated plurality of subset machine learning models; and
--selecting at least one of the subset machine learning models as the machine learning model.
Eberhardt teaches generating a plurality of subset machine learning models based on the plurality of features (Eberhardt: Figure 3, paragraphs [0039]-[0042], “training data is randomized and segmented in order to score feature selection… The training set is used to train a BBN classifier… a classifier is trained on each of the training sets created in the data preparation step”, claim 3, “building a list of a plurality of BBN model candidates based on the training set of claim data”. Also see, paragraph [0096]);
--evaluating a classification performance of the generated plurality of subset machine learning models (Eberhardt: Figure 3, paragraph [0034], “each of the BBN models is scored using one or more scoring methods, such as Minimum Description Length (MDL), also known as the Bayesian Information Criterion (BIC), as well as Bayesian Scoring (BDe)”, paragraph [0042], “a receiver operating characteristic (ROC) curve is plotted for each test exercise to calculate classification accuracy”. Also see, paragraph [0005], claim 3); and
--selecting at least one of the subset machine learning models as the machine learning model (Eberhardt: Figure 3, paragraph [0034], “a BBN model having the highest score among all is selected as the final candidate from the BBN candidates”. Also see, claim 3).
One of ordinary skill in the art before the effective filing date would have found it obvious to include training a plurality of models and scoring each to select one as taught by Eberhardt within the training and use of a machine learning model as taught by Beck, Vaughan, Rao and Liu with the motivation of “improve patient screening, risk stratification, and therapy selection to address current shortcomings” (Eberhardt: paragraph [0055]).
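For illustration only, the three limitations discussed above (generating a plurality of subset models based on the plurality of features, evaluating the classification performance of each, and selecting at least one) can be sketched as follows. This is a minimal sketch under stated assumptions: it substitutes a simple nearest-centroid classifier for the BBN classifiers of Eberhardt, it scores by plain accuracy on a held-out set rather than by MDL, BDe, or ROC analysis, and every function name is hypothetical. It is not the method of any cited reference or of the claimed invention.

```python
from itertools import combinations


def train_centroid_model(rows, labels, feature_idx):
    """Train a nearest-centroid classifier using only the given feature columns."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(feature_idx))
        for k, i in enumerate(feature_idx):
            acc[k] += row[i]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return feature_idx, centroids


def predict(model, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    feature_idx, centroids = model
    vec = [row[i] for i in feature_idx]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])))


def accuracy(model, rows, labels):
    """Fraction of rows classified correctly."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def select_best_subset_model(train_rows, train_labels, val_rows, val_labels, subset_size):
    """Generate one model per feature subset, score each, and select the best."""
    n_features = len(train_rows[0])
    candidates = [train_centroid_model(train_rows, train_labels, idx)
                  for idx in combinations(range(n_features), subset_size)]
    scored = [(accuracy(m, val_rows, val_labels), m) for m in candidates]
    best_score, best_model = max(scored, key=lambda pair: pair[0])
    return best_model, best_score, scored
```

A hypothetical call would pass rows of numeric feature values, for example combined audio and video features, together with class labels. The returned best_model records which feature subset it was built on, which is one possible reading of "subset machine learning model".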

Regarding (Previously Presented) claim 13, Beck, Vaughan, Rao, Liu and Eberhardt teach the limitations of claim 10, and further teach wherein processing the set of test audio data and set of test video data further comprises: preprocessing the set of test audio data and the test video data to identify overlapping time windows; and outputting a set of integrated audio and video segments based on the identified overlapping time windows (Beck: paragraph [0025], “a fusion algorithm that combines input from sensors (including a camera, pressure sensor and speech sensor) to evaluate and/or predict the mental condition of a person”, paragraph [0031], “an algorithm that synchronizes sensor input”. Also see, paragraph [0104]; Vaughan: paragraphs [0113]-[0114], [0148]. The Examiner notes fusion and synchronization of the sensor data (audio and video data) reads on identification of the overlapping time windows and integrating into a data set to be used for analysis (i.e., the integrated set)).
The motivation to combine is the same as in claim 10, incorporated herein.
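For illustration only, identifying overlapping time windows between audio data and video data and outputting integrated segments can be sketched as follows. This is a minimal sketch under stated assumptions and is not the fusion or synchronization algorithm of Beck or Vaughan: the function name overlapping_windows is hypothetical, each window is assumed to be a (start, end) pair in seconds, and each input list is assumed to be sorted and free of internal overlaps.

```python
def overlapping_windows(audio_windows, video_windows):
    """Return the (start, end) intervals covered by both an audio window
    and a video window, using a two-pointer sweep over the sorted lists."""
    i = j = 0
    integrated = []
    while i < len(audio_windows) and j < len(video_windows):
        a_start, a_end = audio_windows[i]
        v_start, v_end = video_windows[j]
        start, end = max(a_start, v_start), min(a_end, v_end)
        if start < end:
            # Both modalities were recording during [start, end).
            integrated.append((start, end))
        # Advance whichever window finishes first.
        if a_end <= v_end:
            i += 1
        else:
            j += 1
    return integrated
```

For example, audio windows (0.0, 2.0) and (3.0, 5.0) combined with a single video window (1.0, 4.0) yield the integrated segments (1.0, 2.0) and (3.0, 4.0).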

Regarding (Original) claim 14, Beck, Vaughan, Rao, Liu and Eberhardt teach the limitations of claim 13, and further teach wherein the machine learning model is at least one of: a generalized linear model, a regression model, a logistical regression model, and a supervised machine learning classification model (Beck: Figure 2, element 375. Also see, Vaughan: paragraph [0120]. The Examiner notes SVM and Neural networks are supervised learning classification models).
The motivation to combine is the same as in claim 10, incorporated herein.

Response to Arguments
Applicant’s arguments filed on 09 September 2021 have been fully considered but they are not persuasive. Applicant’s arguments will be addressed below in the order in which they appear in the response filed on 09 September 2021.

Rejection under 35 U.S.C. § 103
Regarding the rejections of claims 1-6, 8-10, 13-14, 16-17, 19 and 23-30, the Examiner has considered the applicant’s arguments; however, the arguments are not persuasive as addressed herein. The Examiner has attempted to address all of the arguments presented by the Applicant; however, any arguments inadvertently not addressed are not persuasive for at least the following reasons:
Applicant argues:
Applicant respectfully submits that the claim element "for the user to read aloud" is not an intended use, because the reading aloud that set of text is later used (e.g., recorded) by the camera and the microphone. Therefore, the Examiner must consider this feature.


	It is respectfully submitted the displaying of text “for the user to read aloud” is an intended use of the display of text that is not required to occur. Nevertheless, the Examiner notes Rao (see above, but at least paragraph [0089]), explicitly recites displaying text for the user to read aloud. Therefore, regardless of the requirement of the “for the user to read aloud” statement, Rao teaches what is required.

Applicant further argues:
Applicant respectfully submits that no prior art technology exists that can screen patients by inputting audio and video of a patient talking into a machine learning classifier… Rao at paragraph [0101] teaches away from the claimed technology, by stating that it is preferential to focus on one aspect of the data and train an initial machine learning diagnostic model that outputs a preliminary diagnosis before combining them… Specifically, Rao does not disclose taking both the audio and video data from reading the same statement aloud, and then combining those into a machine learning model. Rather, Rao trains classifiers for each separate modality, and then aggregates them. The independent claims specify that the plurality of features are extracted from both the audio data and the video data generated while reading the same statement aloud, and a plurality of subset machine learning models is then generated based on the plurality of features… Applicant notes that example in Rao cited by the Office Action at paragraph [0110] is offered as another alternative speech analysis. It does not teach that the audio data and video data are being labeled as training data together over the user reading the same statement aloud. Further, this claim element builds from the previous elements of processing the audio data and video data, then feeding them together to the training. Rao does not make any distinction on how the dataset are trained together.

The Examiner respectfully disagrees.
	It is respectfully submitted that both Beck and Rao teach processing both video and audio data (see above, but at least Beck: Figure 2; Rao: Figure 2). One of ordinary skill in the art would understand that paragraph [0110] of Rao teaches training a system to use video and audio data while the user is reading the statement aloud to produce a result. The claims as drafted are mapped above, and the mapping shows that each and every limitation is taught by the combination of Beck and Rao, which use audio and video data to produce a result, while Rao explicitly shows use of audio and video data while reading the statement aloud to produce a result. Therefore, under the broadest reasonable interpretation, the references teach what is required.

Applicant further argues:
Applicant submits that Eberhardt does not disclose "generating a plurality of subset machine learning models based on the plurality of features" and "evaluating a classification performance of the generated plurality of subset machine learning models."… In contrast with the interpretation of the claim language in view of the specification, Eberhardt does not generate a plurality of subset machine learning models based on the plurality of features. Rather, as the Office Action correctly quotes at page 11, Eberhardt merely discloses "training data is randomized and segmented in order to score feature selection" such that "a classifier is trained on each of the training sets created in the data preparation step." In other words, Eberhardt does not generate subsets of machine learning models based on the plurality of features, which includes features extracted from multimodal data collected over reading the same specific statement(s).

The Examiner respectfully disagrees.
	It is respectfully submitted that the claim only requires generation of a plurality of subset machine learning models based on the plurality of features; the claim does not limit how the features are used, only that the models are based on the features. Eberhardt teaches generation of models based on a plurality of features, and it would have been obvious to include using the features determined in Beck and Rao within Eberhardt with the motivation of “improve patient screening, risk stratification, and therapy selection to address current shortcomings” (Eberhardt: paragraph [0055]). The Examiner suggests explicitly claiming how the features are used, instead of using broad language, because as currently drafted the claim is taught by the use of the plurality of features in Eberhardt.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew E Lee whose telephone number is (571)272-8323.  The examiner can normally be reached on M-Th 9-5:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Morgan can be reached on (571) 272-6773.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/A.E.L./Examiner, Art Unit 3626      

/ROBERT W MORGAN/Supervisory Patent Examiner, Art Unit 3626