DETAILED ACTION

Examiner’s comments: Regarding the priority date, after reviewing the subject matter of this application, and particularly in view of the claims as amended on 8/21/2020, it is reasonable to claim the earliest priority date of provisional application 61/978,632, filed April 11, 2014. However, there is no support to claim the priority date of application 14/206,109, filed March 12, 2014, nor of any of provisional applications 61/813,295, filed April 18, 2013; 61/832,715, filed June 7, 2013; 61/836,560, filed June 18, 2013; and 61/872,494, filed August 30, 2013. Applicant needs to file a proper terminal disclaimer.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 57 and 71-85 are rejected under 35 U.S.C. 103 as being unpatentable over Patel (US 20150126888 A1, which claims the priority date of Provisional Application No. 61/645,176, filed May 10, 2012) in view of Higgins (US 20150248608 A1, which claims the priority date of Provisional Application No. 61/945,874, filed Feb. 28, 2014).


Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102.  A convolutional neural network includes multiple nodes organized in layers. --, in [0035]-[0037], and [0085]; also see: -- As shown in FIG. 4, the structure of the example convolutional neural network 400 includes a number of successive layers 406, 408, 410, 412, 414, 416.  In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061]);
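Patel and Higgins are combinable as they are in the same field of endeavor: using machine learning in the analysis of audio and acoustic features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Patel’s system using Higgins’s teachings by incorporating into Patel’s machine learning and audio feature data classification a deep learning feedforward neural network that processes audio feature data, in order to process the audio feature data (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]).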
Patel as modified by Higgins further discloses said deep learning feedforward neural network comprising plural alternating convolutional layers and max-pooling layers that produce plural first outputs (see Higgins: e.g., in [0048]-[0050]), said first outputs being coupled to a plurality of artificial neurons, each of said plurality of artificial neurons being coupled to plural of said first outputs and producing a second output that is a weighted sum of its inputs (see Higgins: e.g., in [0048]-[0050], and, --the convolution layer 216 is applied to the plurality of first vectors 220 using the input layer 214, which passes the first vectors 220 to the convolution layer 216 via a plurality of first connections.  Each of the first connections has an associated weight…..Such weights may be used in computing outputs (e.g., activations) of the hidden layer 218.  Examples of functions that may be used by the hidden layer 218 include maximum, sum, and average functions.  In an example, a function used by the hidden layer 218 may be any function that can compute a single value from multiple values. --, in [0052]), the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]).
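For illustration only, the architecture recited above can be sketched in plain Python as follows: alternating convolution and max-pooling layers produce the "first outputs"; each artificial neuron produces a "second output" that is a weighted sum of those first outputs; a sigmoidal layer step mirrors the forward pass described in Higgins [0061]; and one backpropagation update adjusts the neuron weights. This is not Higgins's or Applicant's implementation. The function names, the ReLU after each convolution, the squared-error loss, and the learning rate are assumptions, and for brevity the update does not propagate the error further back into the convolution kernels.

```python
import math


def conv1d(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as is usual in CNNs)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def max_pool(x, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def relu(x):
    return [max(0.0, v) for v in x]


def first_outputs(x, kernels):
    """Alternate convolution and max-pooling once per kernel."""
    for kernel in kernels:
        x = max_pool(relu(conv1d(x, kernel)))
    return x


def neurons(inputs, weights):
    """Each row of weights is one neuron; its output is a weighted sum of the inputs."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


def sigmoid_layer(activations, weights):
    """Forward step: multiply by a weight matrix, then apply a sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in neurons(activations, weights)]


def backprop_step(inputs, weights, targets, lr=0.01):
    """One gradient-descent update of the neuron weights for the loss
    0.5 * sum((y - t) ** 2), using dLoss/dw_ij = (y_i - t_i) * inputs_j."""
    outputs = neurons(inputs, weights)
    return [[w - lr * (y - t) * v for w, v in zip(row, inputs)]
            for row, y, t in zip(weights, outputs, targets)]
```

For example, `first_outputs([1, 2, 3, 4, 5, 6, 7, 8], [[1, 1]])` convolves to `[3, 5, 7, 9, 11, 13, 15]` and pools to `[5, 9, 13]`.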

Re Claim 71, Patel as modified by Higgins further discloses that said microphone-sensed audio and said reference audio training data comprise air flow noises from the group: crackles, rales, rhonchi, wheezes, coughs, and snoring (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).
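The fold-based training subsets quoted from Patel ([0099]-[0100]) can be illustrated with the sketch below: one function divides participants randomly into ten near-equal folds, and the other keeps the share of each class (e.g., "normal" vs. "abnormal" curves) balanced across folds. The function names, the round-robin dealing scheme, and the assumption that participant IDs and class labels are sortable are illustrative choices, not taken from Patel.

```python
import random


def random_folds(participants, k=10, seed=0):
    """Shuffle participants and deal them round-robin into k near-equal folds."""
    ids = list(participants)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def stratified_folds(labels, k=10, seed=0):
    """labels maps participant -> class (e.g., 'normal' / 'abnormal').

    Each class is dealt round-robin separately, continuing from where the
    previous class stopped, so every fold receives a near-equal share of
    each class and the fold sizes stay balanced.
    """
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    offset = 0
    for cls in sorted(set(labels.values())):
        members = sorted(p for p, c in labels.items() if c == cls)
        rng.shuffle(members)
        for i, p in enumerate(members):
            folds[(i + offset) % k].append(p)
        offset += len(members)
    return folds
```

Reading "participants with wheezes together" as a stratified split on a wheeze / no-wheeze label is one possible interpretation of the quoted passage, not Patel's stated procedure.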

Re Claim 72, Patel as modified by Higgins further discloses that said microphone-sensed audio and said reference audio training data comprise words spoken by individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).
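The linear-prediction step quoted from Patel [0096] (a signal modeled as a source plus a shaping filter, with estimated source power and filter coefficients) is commonly computed with the autocorrelation method and the Levinson-Durbin recursion. The quoted text does not name a solver, so the following is a sketch of one standard approach, not Patel's implementation; it assumes a non-silent analysis frame (r[0] > 0).

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations for the shaping filter.

    Returns (a, err): a[0] == 1.0 and the prediction is
    x[n] ~= -sum(a[k] * x[n - k] for k in 1..order); err is the residual
    (source) power left after prediction. Requires r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For an order-1 model the recursion reduces to a[1] = -r[1] / r[0], and the returned coefficients satisfy the normal equations sum(a[j] * r[|i - j|]) = 0 for i = 1..order.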

Re Claim 73, Patel as modified by Higgins further discloses that said reference audio training data was derived from audio previously sensed from said first user (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 74, Patel as modified by Higgins further discloses that said reference audio training data was derived from audio previously sensed from plural other individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427).--, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).
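Patel states that each training subset yields a different regression model and that the ensemble forms one decision ([0099]-[0100]); the rejection of claim 82 also cites Patel's mention of bagged decision tree regression ([0106]). The sketch below illustrates bootstrap aggregation over labeled reference features pooled from plural individuals. For brevity it substitutes a one-feature least-squares line for the decision trees, and it combines the models by simple averaging; both substitutions, and all names, are assumptions rather than Patel's method.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares y = a + b * x on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def bagged_ensemble(samples, n_models=10, seed=0):
    """samples: list of (feature, label) pairs from plural individuals.
    Each model is fit on a bootstrap resample of the same size."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]
        xs, ys = zip(*boot)
        models.append(fit_line(xs, ys))
    return models


def predict(models, x):
    """Combine the ensemble into one decision by averaging (an assumption)."""
    return mean(a + b * x for a, b in models)
```

Averaging is the conventional combination rule for bagged regressors; the quoted phrase "clustered together" does not specify a rule, so other combinations (e.g., a median) would fit the passage equally well.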

Re Claim 75, Patel as modified by Higgins further discloses that said reference audio training data was derived from audio previously sensed from plural other individuals who are of the same gender as said first user and who have body weights within a predefined threshold weight of said first user (see Patel: e.g., -- Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]).
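The regression features described in the quoted Patel passage ([0104]) can be sketched as follows, assuming the input is a flow-rate estimate sampled at the quoted 15 ms step. The units (e.g., L/s), the rectangle-rule integration, and the function names are assumptions; the bin edges passed to `quantize` are hypothetical values chosen only for illustration.

```python
import bisect

DT = 0.015  # 15 ms time step, as in the quoted passage


def lung_measures(flow, dt=DT):
    """Approximate PEF, FVC, and FEV1 from flow-rate samples taken every dt seconds.

    PEF  = maximum flow reached in the effort
    FVC  = flow integrated over the whole effort (exhaled volume)
    FEV1 = flow integrated over the first second
    Rectangle rule: sample i covers the interval [i*dt, (i+1)*dt).
    """
    pef = max(flow)
    fvc = sum(flow) * dt
    first_second = [f for i, f in enumerate(flow) if i * dt < 1.0]
    fev1 = sum(first_second) * dt
    return pef, fvc, fev1


def quantize(value, edges):
    """Map a demographic value (e.g., height) to the index of its discrete range."""
    return bisect.bisect_right(edges, value)
```

With a 15 ms step, the samples starting before 1 s are indices 0 through 66, so FEV1 covers 67 samples (1.005 s); a finer step or trapezoidal integration would tighten that boundary.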

Re Claim 76, Patel as modified by Higgins further discloses that said reference audio training data was derived from audio previously sensed from plural other individuals who are of the same gender as said first user and who have heights within a predefined height of said first user (see Patel: e.g., -- Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]).

Re Claim 77, Patel discloses a system comprising a processor, a memory, a display, and one or more microphones, said system including a portion adapted for wearing or carrying by a first user and being characterized by instructions in the memory to implement a machine learning and neural network to process audio feature data derived from first user speech captured by the microphone(s) to determine health information about the first user (see Patel: e.g., Fig. 10, and Fig. 12, and, --Devices, system, and methods generate expiratory flow-based pulmonary function data by processing a digital audio file of sound of a subject's forced expiratory maneuver.  A mobile device configured to generate expiratory flow-based pulmonary function data includes a microphone, a processor, and a data storage device.  The microphone is operable to convert sound of the subject's forced expiratory maneuver into a digital data file.  The processor is operatively coupled with the microphone.  The data storage device is operatively coupled with the processor and stores instructions that, when executed by the processor, cause the processor to process the digital data file to generate expiratory flow-based pulmonary function data for assessing pulmonary function of the subject.  The sound of the subject's forced expiratory maneuver can be converted into the digital data file--, in the abstract, and [0025], [0038], and [0065]; and see Patel: e.g., Fig. 8, and -- past features of different patients can be saved so that new curves can be compared to their empirical average for a given feature.  Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning).--, in [0022]-[0023], [0085], and [0101]-[0106]; and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096]).
Patel, however, does not explicitly disclose that the above machine learning and neural network is a deep learning feedforward neural network.
Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102.  A convolutional neural network includes multiple nodes organized in layers. --, in [0035]-[0037], and [0085]; also see: -- As shown in FIG. 4, the structure of the example convolutional neural network 400 includes a number of successive layers 406, 408, 410, 412, 414, 416.  In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061]);
Patel and Higgins are combinable as they are in the same field of endeavor: using machine learning in the analysis of audio and acoustic features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Patel’s system using Higgins’s teachings by incorporating into Patel’s machine learning and audio feature data classification a deep learning feedforward neural network that processes audio feature data, in order to process the audio feature data (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]).
Patel as modified by Higgins further discloses said deep learning feedforward neural network comprising plural alternating convolutional layers and max-pooling layers that produce plural first outputs (see Higgins: e.g., in [0048]-[0050]), said first outputs being coupled to a plurality of artificial neurons, each of said plurality of artificial neurons being coupled to plural of said first outputs and producing a second output that is a weighted sum of its inputs (see Higgins: e.g., in [0048]-[0050], and, --the convolution layer 216 is applied to the plurality of first vectors 220 using the input layer 214, which passes the first vectors 220 to the convolution layer 216 via a plurality of first connections.  Each of the first connections has an associated weight…..Such weights may be used in computing outputs (e.g., activations) of the hidden layer 218.  Examples of functions that may be used by the hidden layer 218 include maximum, sum, and average functions.  In an example, a function used by the hidden layer 218 may be any function that can compute a single value from multiple values. --, in [0052]), the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio feature training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]).
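The "personalized" features in the Patel passage quoted for claim 77 (past features saved so that a new curve can be compared with, and divided by, the participant's empirical average for a given feature) can be sketched as below. The class and method names are hypothetical, and keying the stored history by (participant, feature name) is an assumption about how the averages would be tracked.

```python
from collections import defaultdict


class FeatureHistory:
    """Hypothetical store of past feature values per (participant, feature)."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def add(self, participant, feature, value):
        """Record one past value of a feature for a participant."""
        key = (participant, feature)
        self._sums[key] += value
        self._counts[key] += 1

    def personalized(self, participant, feature, value):
        """Divide a new value by the participant's empirical average for that
        feature; returns None when there is no usable history."""
        key = (participant, feature)
        count = self._counts.get(key, 0)
        total = self._sums.get(key, 0.0)
        if count == 0 or total == 0.0:
            return None
        return value / (total / count)
```

Because the scaling uses only the participant's own earlier recordings, no spirometer reference curve is needed for that participant, which matches the passage's "semi-supervised" framing.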

Re Claim 78, Patel as modified by Higgins further discloses that said reference audio feature training data was derived from audio previously sensed from said first user (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 79, Patel as modified by Higgins further discloses that said reference audio feature training data was derived from audio previously sensed from plural other individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427).--, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 80, Patel as modified by Higgins further discloses that the deep learning feedforward neural network was previously trained to recognize possible depression of the first user from said reference audio feature training data (see Patel: e.g., --A number of researchers have also evaluated how multiple sensors could be connected to a smartphone via an external board to collect physiological information (see, e.g., Brunette, W., Sodt, R., Chaudhri, R., et al. The Open Data Kit Sensors Framework: Application-Level Sensor Drivers for Android.  MobiSys, (2012); also see, e.g., Majchrzak, T. and Chakravorty, A. Improving the Compliance of Transplantation Medicine Patients with an Integrated Mobile System.--, in [0019]; and, -- Respiration monitoring during sleeping.  ISABEL '11, (2011)).  A few systems have leveraged simple, low-cost microphones to analyze signals, such as heart rate and cough.  Many systems exist that extract heart rate using a mobile phone--, in [0022]).

Re Claim 81, Patel as modified by Higgins further discloses that the deep learning feedforward neural network was previously trained to recognize a possible stroke of the first user from said reference audio feature training data (see Patel: e.g., --A number of researchers have also evaluated how multiple sensors could be connected to a smartphone via an external board to collect physiological information (see, e.g., Brunette, W., Sodt, R., Chaudhri, R., et al. The Open Data Kit Sensors Framework: Application-Level Sensor Drivers for Android.  MobiSys, (2012); also see, e.g., Majchrzak, T. and Chakravorty, A. Improving the Compliance of Transplantation Medicine Patients with an Integrated Mobile System.--, in [0019]; and, -- Respiration monitoring during sleeping.  ISABEL '11, (2011)).  A few systems have leveraged simple, low-cost microphones to analyze signals, such as heart rate and cough.  Many systems exist that extract heart rate using a mobile phone--, in [0022]).

Re Claim 82, Patel discloses a method involving a machine learning and neural network (see Patel: e.g., Fig. 10, and Fig. 12, and, --Devices, system, and methods generate expiratory flow-based pulmonary function data by processing a digital audio file of sound of a subject's forced expiratory maneuver.  A mobile device configured to generate expiratory flow-based pulmonary function data includes a microphone, a processor, and a data storage device.  The microphone is operable to convert sound of the subject's forced expiratory maneuver into a digital data file.  The processor is operatively coupled with the microphone.  The data storage device is operatively coupled with the processor and stores instructions that, when executed by the processor, cause the processor to process the digital data file to generate expiratory flow-based pulmonary function data for assessing pulmonary function of the subject.  The sound of the subject's forced expiratory maneuver can be converted into the digital data file--, in abstract, and [0025], [0038], and [0065]; and see Patel: e.g., Fig. 8, and -- past features of different patients can be saved so that new curves can be compared to their empirical average for a given feature.  Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning).--, in [0022]-[0023], [0085], and [0101]-[0106]);
Patel, however, does not explicitly disclose that the above machine learning and neural network is a deep learning feedforward neural network.
Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102.  A convolutional neural network includes multiple nodes organized in layers. --, in [0035]-[0037], and [0085]; also see: -- As shown in FIG. 4, the structure of the example convolutional neural network 400 includes a number of successive layers 406, 408, 410, 412, 414, 416.  In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061]);
Patel and Higgins are combinable as they are in the same field of endeavor: using machine learning in the analysis of audio and acoustic features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Patel’s system using Higgins’s teachings by incorporating into Patel’s machine learning and audio feature data classification a deep learning feedforward neural network that processes audio feature data, in order to process the audio feature data (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]).
Patel as modified by Higgins further discloses said deep learning feedforward neural network comprising plural alternating convolutional layers and max-pooling layers that produce plural first outputs, said first outputs being coupled to a plurality of artificial neurons, each of said plurality of artificial neurons being coupled to plural of said first outputs and producing a second output that is a weighted sum of its inputs (see Higgins: e.g., in [0040]-[0045], and [0048]-[0052]), the method comprising the acts:
deriving reference features from speech captured from one or more individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]);
associating labels or tags with said reference features, said labels indicating physiologic interpretations of said speech captured from said one or more individuals (see Patel: e.g., -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]; and, -- Probabilistic Models for Segmenting and Labeling Sequence Data.  Proc.  Int.  Conf on Machine Learning, (2001)) and a bagged decision tree regression can be employed.--, in [0106]; also see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; and, -- Specifically, applying the network 400 to predict a score for a constructed response does not involve the extraction of human-engineered features from the constructed response.  Instead, during the supervised training step, the convolutional neural network 400 itself identifies important characteristics of human-scored reference responses that are related to the classifications or scores assigned to the reference responses by human graders.--, in [0076]); and
using said reference features and associated labels to train said deep learning feedforward neural network (see Higgins: e.g., -- Specifically, applying the network 400 to predict a score for a constructed response does not involve the extraction of human-engineered features from the constructed response.  Instead, during the supervised training step, the convolutional neural network 400 itself identifies important characteristics of human-scored reference responses that are related to the classifications or scores assigned to the reference responses by human graders.--, in [0076]);
wherein the network is thereby trained to provide a physiologic interpretation of speech from a user (see Patel: e.g., Fig. 10, and Fig. 12, and, --Devices, system, and methods generate expiratory flow-based pulmonary function data by processing a digital audio file of sound of a subject's forced expiratory maneuver.  A mobile device configured to generate expiratory flow-based pulmonary function data includes a microphone, a processor, and a data storage device.  The microphone is operable to convert sound of the subject's forced expiratory maneuver into a digital data file.  The processor is operatively coupled with the microphone.  The data storage device is operatively coupled with the processor and stores instructions that, when executed by the processor, cause the processor to process the digital data file to generate expiratory flow-based pulmonary function data for assessing pulmonary function of the subject.  The sound of the subject's forced expiratory maneuver can be converted into the digital data file--, in the Abstract, and [0025], [0038], and [0065]; and see Patel: e.g., Fig. 8, and -- past features of different patients can be saved so that new curves can be compared to their empirical average for a given feature.  Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning).--, in [0022]-[0023], [0085], and [0101]-[0106]; --A number of researchers have also evaluated how multiple sensors could be connected to a smartphone via an external board to collect physiological information (see, e.g., Brunette, W., Sodt, R., Chaudhri, R., et al. The Open Data Kit Sensors Framework: Application-Level Sensor Drivers for Android.  MobiSys, (2012); also see, e.g., Majchrzak, T. and Chakravorty, A. Improving the Compliance of Transplantation Medicine Patients with an Integrated Mobile System.--, in [0019]; and, -- Respiration monitoring during sleeping.  ISABEL '11, (2011)).  A few systems have leveraged simple, low-cost microphones to analyze signals, such as heart rate and cough.  Many systems exist that extract heart rate using a mobile phone--, in [0022]).
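For illustration only, the network structure recited above (alternating convolutional and max-pooling layers producing first outputs, and artificial neurons each producing a second output that is a weighted sum of plural first outputs), together with the sigmoidal transform quoted from Higgins at [0061], can be sketched in Python as follows. This is a minimal one-dimensional sketch, not code from Patel, Higgins, or the application; the function names, the ReLU nonlinearity after each convolution, the non-overlapping pooling windows, and the toy values are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def conv1d_relu(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation form) followed by ReLU (an assumption)."""
    k = len(kernel)
    return [max(0.0, bias + sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def max_pool1d(x: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def weighted_sum(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """One artificial neuron: its second output is a weighted sum of its inputs."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per connected first output")
    return bias + sum(w * v for w, v in zip(weights, inputs))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(frames: Sequence[float], kernels: Sequence[Sequence[float]], pool_size: int,
            neuron_weights: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], List[float]]:
    x = list(frames)
    for kernel in kernels:  # alternating convolutional and max-pooling layers
        x = max_pool1d(conv1d_relu(x, kernel), pool_size)
    first_outputs = x
    second_outputs = [weighted_sum(first_outputs, w) for w in neuron_weights]
    activations = [sigmoid(z) for z in second_outputs]  # sigmoidal transform, cf. Higgins [0061]
    return first_outputs, second_outputs, activations


# Hypothetical toy feature sequence standing in for per-frame audio features.
frames = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0]
first, second, act = forward(frames, kernels=[[1.0, 1.0]], pool_size=2,
                             neuron_weights=[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
print(first, second)  # [3.0, 5.0, 3.0] [3.0, 5.5]
```

Passing more than one kernel stacks further convolution/max-pooling pairs, which corresponds to the "plural alternating" layers; the neuron weights must then match the length of the final pooled output.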

Re Claim 83, Patel as modified by Higgins further discloses that said network is trained with reference features derived from speech captured from plural individuals having a gender matching that of said user, and having weights within a threshold weight of said user (see Higgins: e.g., in [0048]-[0050], and, --the convolution layer 216 is applied to the plurality of first vectors 220 using the input layer 214, which passes the first vectors 220 to the convolution layer 216 via a plurality of first connections.  Each of the first connections has an associated weight…..Such weights may be used in computing outputs (e.g., activations) of the hidden layer 218.  Examples of functions that may be used by the hidden layer 218 include maximum, sum, and average functions.  In an example, a function used by the hidden layer 218 may be any function that can compute a single value from multiple values. --, in [0052]), the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]).
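As an illustrative sketch only, the weighted summation and activation function quoted from Higgins, with weights established by backpropagation training on labeled reference data, can be shown in the simplest one-layer case: a single sigmoid neuron whose weights are learned from labeled reference feature vectors by stochastic gradient descent, the error gradient being obtained by the chain rule. The function names, the cross-entropy loss, the learning rate, the epoch count, and the toy data are assumptions and are not taken from Higgins or the application; a multi-layer network applies the same chain rule layer by layer.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data: Sequence[Tuple[Sequence[float], float]],
                 lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Learn one neuron's weights (weighted sum + sigmoid) from labeled reference features.

    Stochastic gradient descent on the cross-entropy loss; the gradient follows
    from the chain rule, i.e. backpropagation reduced to a single layer.
    """
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            z = b + sum(wi * xi for wi, xi in zip(w, x))  # weighted summation of inputs
            a = sigmoid(z)                                 # activation function
            delta = a - y  # dLoss/dz for cross-entropy loss with a sigmoid output
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]  # dLoss/dwi = delta * xi
            b -= lr * delta
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical labeled reference features: label 1 if either feature is active.
reference = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w, b = train_neuron(reference)
print([predict(w, b, x) > 0.5 for x, _ in reference])
```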

Re Claim 84, Patel as modified by Higgins further discloses that said network is trained with reference features derived from speech captured from plural individuals having a gender matching that of said user, and having heights within a threshold height of said user (see Patel: e.g., -- Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]).
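The quoted Patel passage at [0104] treats each extracted feature as an approximate flow rate sampled every 15 ms, taking its maximum as the input for regressing PEF, its integral over the effort for FVC, and its integral over the first second for FEV1. A minimal sketch of that computation follows; the function name, the rectangle-rule integration, and the assumption that the curve starts at the onset of exhalation are illustrative choices, not taken from Patel.

```python
from typing import Dict, List

STEP_S = 0.015  # 15 ms time step, per the quoted passage of Patel [0104]


def flow_feature_summaries(flow: List[float], step_s: float = STEP_S) -> Dict[str, float]:
    """Summarize one flow-approximating feature curve (hypothetical helper).

    flow[i] approximates the flow rate at time i * step_s, with i = 0 assumed to be
    the onset of exhalation. Integration uses the rectangle rule.
    """
    if not flow:
        raise ValueError("flow curve must not be empty")
    pef_input = max(flow)           # PEF: maximum flow reached in the effort
    fvc_input = sum(flow) * step_s  # FVC: flow integrated over the whole curve
    steps_in_first_second = int(1.0 / step_s + 1e-9)  # whole time steps within 1 s
    fev1_input = sum(flow[:steps_in_first_second]) * step_s  # FEV1: first second only
    return {"PEF": pef_input, "FEV1": fev1_input, "FVC": fvc_input}


# Usage with a toy curve sampled every 0.5 s:
print(flow_feature_summaries([1.0, 3.0, 2.0, 0.0], step_s=0.5))
# {'PEF': 3.0, 'FEV1': 2.0, 'FVC': 3.0}
```

Per the quoted passage, these per-feature values serve as regression inputs that may be combined with demographic features; they are not themselves the clinical PEF, FEV1, and FVC.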

Re Claim 85, Patel as modified by Higgins further discloses that said one or more individuals comprise said first individual, wherein the network is thereby trained to provide a physiologic assessment of said user from reference features earlier derived from speech of said user (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; and, -- Specifically, applying the network 400 to predict a score for a constructed response does not involve the extraction of human-engineered features from the constructed response.  Instead, during the supervised training step, the convolutional neural network 400 itself identifies important characteristics of human-scored reference responses that are related to the classifications or scores assigned to the reference responses by human graders.--, in [0076]).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached from 8:00 am to 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/WEI WEN YANG/Primary Examiner, Art Unit 2667