DETAILED ACTION 
Response to Arguments
The amendments filed 6/7/2021 have been entered and made of record. 

Applicant's amendments and arguments filed 6/7/2021 have been fully considered but they are not persuasive:
In Applicant Arguments/Remarks, page 5, Applicant states that Higgins cannot be relied on for prior art teachings that were not taught in Provisional Application No. 61/945874, Feb 28, 2014.
However, the Examiner disagrees because:
In the previous Office Action (Non-Final OA of 3/10/2021), Higgins’s disclosures were relied on as follows:
Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. {the exact same sentences/disclosures can be found in lines 14-16, page 1, of the Specification of Provisional Application No. 61/945874, Feb 28, 2014}, in FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102. …. In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061] {the exact same sentences/disclosures can be found in the diagram and last paragraph, page 2, of the Specification of Provisional Application No. 61/945874, Feb 28, 2014}).

The above teachings are clearly disclosed in the Specification of Provisional Application No. 61/945874, Feb 28, 2014 (a copy of 61/945874 is provided with this Office Action for Applicant to review). It is proper to apply Higgins’s teachings from Provisional Application No. 61/945874 as prior art teachings in the Office Action, because the date of 61/945874, Feb 28, 2014, is before April 11, 2014.

In Applicant Arguments/Remarks, page 6, Applicant states that the cited references, Patel as modified by Higgins, do not teach “...the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data”.
However, the Examiner disagrees because:
The limitation “...the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data” would render this Application 16/289,167 unable to claim priority as a division of application 15/362,446 filed on November 28, 2016, because the subject matter of this limitation was first disclosed in the Specification of 8/21/2020 and was not taught or disclosed in any corresponding previous document in the case before 8/21/2020. Therefore, if Applicant considers that this particular limitation might be inventive, the priority claim of this Application must be changed or removed. This would allow the Examiner to search and consider possible references dated after April 11, 2014 and before 8/21/2020, with 8/21/2020 as the effective filing date, for the prior art teachings.

During examination, and in the previous Office Action (Non-Final OA of 3/10/2021), the Examiner has considered the above limitation of “...the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data” as a general description of the training process using reference audio training data and of the weights used in producing said second outputs, and
Patel as modified by Higgins clearly teaches the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]; also see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; --The plurality of numerical vectors for the un-scored response are used to train a denoising auto-encoder or restricted Boltzmann machine (RBM) in order to estimate values of the weights for the connections between the input layer 406 and the first hidden layer 408.  Such weights should be suitable for encoding the regularities in the co-occurrence of directly-observable elements in the un-scored response.  The weights between the input layer 406 and the first hidden layer 408 are shared across all sets of nodes (i.e., weights between the sets of nodes 407A and 409A of the input and first hidden layers 406, 408, respectively, are set to be the same as the weights between the sets of nodes 407B and 409B, and so on).--, in [0072] {the relevant disclosures can be found in the last paragraph of page 3 through page 4 of the Specification of Provisional Application No. 61/945874, Feb 28, 2014}; and, -- Specifically, applying the network 400 to predict a score for a constructed response does not involve the extraction of human-engineered features from the constructed response.  Instead, during the supervised training step, the convolutional neural network 400 itself identifies important characteristics of human-scored reference responses that are related to the classifications or scores assigned to the reference responses by human graders.--, in [0076]).
Therefore, amended claims 57, 71-81, and 86-87 are still not patentably distinguishable over the prior art reference(s). Further discussions are addressed in the prior art rejection section below.


Examiner’s comments: regarding the priority date, after further review of the subject matter of this application, its filing history, and the corresponding previous documents in the case, the limitation “...the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data” in independent claims 57 and 77 of the version of 8/21/2020, which is the same in the version of 6/7/2021, would render this Application 16/289,167 unable to claim priority as a division of application 15/362,446 filed on November 28, 2016, because the subject matter of this limitation was first disclosed in the Specification of 8/21/2020 and was not taught or disclosed in any corresponding previous document in the case before 8/21/2020. Therefore, if Applicant considers that this particular limitation might be inventive, the priority claim of this Application must be changed or removed. This would allow the Examiner to search and consider references dated after April 11, 2014 but before 8/21/2020, with 8/21/2020 as the effective filing date, for the prior art teachings.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 57, 71-81, and 86-87 are rejected under 35 U.S.C. 103 as being unpatentable over Patel (US 20150126888 A1, which claims the priority date of Provisional Application No. 61/645,176, May 10, 2012), in view of Higgins (US 20150248608 A1, which claims the priority date of Provisional Application No. 61/945874, Feb 28, 2014).
Re Claim 57, Patel discloses a system comprising a processor, a memory, a display, and one or more microphones, said system including a portion adapted for wearing or carrying by a first user and being characterized by instructions in the memory to implement a machine learning and neural network to process audio feature data derived from the microphone(s) to determine health information about the first user (see Patel: e.g., Fig. 10, and Fig. 12, and, --Devices, system, and methods generate expiratory flow-based pulmonary function data by processing a digital audio file of sound of a subject's forced expiratory maneuver.  A mobile device configured to generate expiratory flow-based pulmonary function data includes a microphone, a processor, and a data storage device.  The microphone is operable to convert sound of the subject's forced expiratory maneuver into a digital data file.  The processor is operatively coupled with the microphone.  The data storage device is operatively coupled with the processor and stores instructions that, when executed by the processor, cause the processor to process the digital data file to generate expiratory flow-based pulmonary function data for assessing pulmonary function of the subject.  The sound of the subject's forced expiratory maneuver can be converted into the digital data file--, in abstract, and [0025], [0038], and [0065]; and see Patel: e.g., Fig. 8, and -- past features of different patients can be saved so that new curves can be compared to their empirical average for a given feature.  Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning).--, in [0022]-[0023], [0085], and [0101]-[0106]),  
Patel, however, does not explicitly disclose that the above machine learning and neural network is a deep learning feedforward neural network;
Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102.  A convolutional neural network includes multiple nodes organized in layers. --, in [0035]-[0037], and [0085]; also see: -- As shown in FIG. 4, the structure of the example convolutional neural network 400 includes a number of successive layers 406, 408, 410, 412, 414, 416.  In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061]);
Patel and Higgins are combinable as they are in the same field of endeavor: using machine learning in the analysis of audio and acoustic features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Patel’s system using Higgins’s teachings by adding a deep learning feedforward neural network to Patel’s machine learning and audio feature data classification in order to process the audio feature data (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]);
Patel as modified by Higgins further disclose said deep learning feedforward neural network comprising plural alternating convolutional layers and max-pooling layers that produce plural first outputs (see Higgins: e.g., in [0048]-[0050]), said first outputs being coupled to a plurality of artificial neurons, each of said plurality of artificial neurons being coupled to plural of said first outputs and producing a second output that is a weighted sum of its inputs (see Higgins: e.g., in [0048]-[0050], and, --the convolution layer 216 is applied to the plurality of first vectors 220 using the input layer 214, which passes the first vectors 220 to the convolution layer 216 via a plurality of first connections.  Each of the first connections has an associated weight…..Such weights may be used in computing outputs (e.g., activations) of the hidden layer 218.  Examples of functions that may be used by the hidden layer 218 include maximum, sum, and average functions.  In an example, a function used by the hidden layer 218 may be any function that can compute a single value from multiple values. --, in [0052]), 
the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]; also see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; --The plurality of numerical vectors for the un-scored response are used to train a denoising auto-encoder or restricted Boltzmann machine (RBM) in order to estimate values of the weights for the connections between the input layer 406 and the first hidden layer 408.  Such weights should be suitable for encoding the regularities in the co-occurrence of directly-observable elements in the un-scored response.  The weights between the input layer 406 and the first hidden layer 408 are shared across all sets of nodes (i.e., weights between the sets of nodes 407A and 409A of the input and first hidden layers 406, 408, respectively, are set to be the same as the weights between the sets of nodes 407B and 409B, and so on).--, in [0072] {the relevant disclosures can be found in the last paragraph of page 3 through page 4 of the Specification of Provisional Application No. 61/945874, Feb 28, 2014}).
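
For illustration only, and not as part of the grounds of rejection, the feed-forward step quoted from Higgins [0061] (multiplying a layer's activation values by a weight matrix and transforming the result by a sigmoidal function) may be sketched as below. The function names, layer sizes, weight values, and the choice of the logistic function as the sigmoidal function are assumptions made for this sketch and are not taken from Higgins or Patel. The sketch shows only the forward pass with weights that are already established; it does not show training.

```python
import math

def sigmoid(x):
    """Logistic function, assumed here as the sigmoidal function."""
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(activations, weight_matrix):
    """Feed one layer's activations forward to the next layer.

    weight_matrix holds one row of weights per node of the next layer
    (a hypothetical layout chosen for this sketch).
    """
    next_layer = []
    for row in weight_matrix:
        # (i) weighted sum of the inputs to this node
        weighted_sum = sum(w * a for w, a in zip(row, activations))
        # (ii) transform the result by the sigmoidal function
        next_layer.append(sigmoid(weighted_sum))
    return next_layer

# Hypothetical 3-node layer n feeding a 2-node layer n+1.
layer_n = [0.2, 0.5, 0.1]
weights = [[0.4, -0.6, 0.9],
           [0.0, 0.0, 0.0]]
layer_n_plus_1 = feed_forward(layer_n, weights)
```

Each entry of `layer_n_plus_1` is a weighted sum of the previous layer's outputs passed through the activation function, which is the sense in which the cited passages refer to weighted connections between layers.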

Re Claim 71, Patel as modified by Higgins further disclose said microphone-sensed audio, and said reference audio training data, comprise air flow noises from the group: crackles, rales, rhonchi, wheezes, coughs, and snoring (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 72, Patel as modified by Higgins further disclose said microphone-sensed audio, and said reference audio training data, comprise words spoken by individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 73, Patel as modified by Higgins further disclose said reference audio training data was derived from audio previously sensed from said first user (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 74, Patel as modified by Higgins further disclose said reference audio training data was derived from audio previously sensed from plural other individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427).--, in [0096], -- The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

	Re Claim 75, Patel as modified by Higgins further disclose said reference audio training data was derived from audio previously sensed from plural other individuals who are of the said gender as said first user and who have body weights within a predefined threshold weight as said first user (see Patel: e.g., -- Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV.sub.1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV.sub.1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]).

Re Claim 76, Patel as modified by Higgins further disclose said reference audio training data was derived from audio previously sensed from plural other individuals who are of the said gender as said first user and who have heights within a predefined height of said first user (see Patel: e.g., -- Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV.sub.1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV.sub.1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]).
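
For illustration only, and not as part of the grounds of rejection, the per-feature values described in the quoted passage of Patel [0104] (the maximum of a flow-like feature for PEF, its integral over time for FVC, and its integral over the first second for FEV1) may be sketched as below. The function name, the sample values, and the rectangle-rule integration are assumptions made for this sketch; the quoted text gives only the 15 ms time step. The returned values are inputs to a regression, not the lung function measures themselves.

```python
def summarize_flow_feature(flow, dt=0.015):
    """Summarize one flow-like feature sampled every dt seconds.

    Returns (peak, total_volume, first_second_volume), the values that
    would be regressed to PEF, FVC and FEV1 respectively.
    """
    peak = max(flow)                          # maximum flow reached
    total_volume = sum(flow) * dt             # integral over the whole effort
    steps_in_first_second = int(1.0 / dt)     # whole steps within 1 s (assumed)
    first_second_volume = sum(flow[:steps_in_first_second]) * dt
    return peak, total_volume, first_second_volume

# Hypothetical feature: constant flow of 2.0 units for 100 steps (1.5 s).
peak, total_volume, first_second_volume = summarize_flow_feature([2.0] * 100)
```

Under these assumptions the demographic features mentioned in the same passage (gender, age, weight, height range) would be supplied to the regression alongside these three values.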

Re Claim 77, Patel discloses a system comprising a processor, a memory, a display, and one or more microphones, said system including a portion adapted for wearing or carrying by a first user and being characterized by instructions in the memory to implement a machine learning and neural network to process audio feature data derived from first user speech captured by the microphone(s) to determine health information about the first user (see Patel: e.g., Fig. 10, and Fig. 12, and, --Devices, system, and methods generate expiratory flow-based pulmonary function data by processing a digital audio file of sound of a subject's forced expiratory maneuver.  A mobile device configured to generate expiratory flow-based pulmonary function data includes a microphone, a processor, and a data storage device.  The microphone is operable to convert sound of the subject's forced expiratory maneuver into a digital data file.  The processor is operatively coupled with the microphone.  The data storage device is operatively coupled with the processor and stores instructions that, when executed by the processor, cause the processor to process the digital data file to generate expiratory flow-based pulmonary function data for assessing pulmonary function of the subject.  The sound of the subject's forced expiratory maneuver can be converted into the digital data file--, in abstract, and [0025], [0038], and [0065]; and see Patel: e.g., Fig. 8, and -- past features of different patients can be saved so that new curves can be compared to their empirical average for a given feature.  Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning).--, in [0022]-[0023], [0085], and [0101]-[0106]; and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427). --, in [0096]),
Patel, however, does not explicitly disclose that the above machine learning and neural network is a deep learning feedforward neural network;
Higgins teaches a deep learning feedforward neural network to process audio feature data (see Higgins: e.g., -- any features used in scoring are engineered (i.e., chosen, designed) by the computer-based system (i.e., and not by a human), based on directly-observable elements of constructed responses (e.g., words, characters, sequences of characters, acoustic frames, etc.)…. utilize a "deep learning" technique that obviates the need for heuristic encoding of an input feature space and allows more appropriate and general features to be induced by the computer-based system. …. FIG. 1, a convolutional neural network model utilized by the scoring engine 118 may include a convolutional neural network that is configured to receive the numerical vectors 108A, 108B, 108C and to determine the score 118 for the constructed response 102.  A convolutional neural network includes multiple nodes organized in layers. --, in [0035]-[0037], and [0085]; also see: -- As shown in FIG. 4, the structure of the example convolutional neural network 400 includes a number of successive layers 406, 408, 410, 412, 414, 416.  In an example, a layer feeds activation forward to successive layers by (i) multiplying activation values of a vector at a layer n of the network 400 by values of a weight matrix, and (ii) transforming the resultant values by a sigmoidal function.--, in [0061]);
Patel and Higgins are combinable as they are in the same field of endeavor: using machine learning in the analysis of audio and acoustic features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Patel’s system using Higgins’s teachings by adding a deep learning feedforward neural network to Patel’s machine learning and audio feature data classification in order to process the audio feature data (see Higgins: e.g., in [0035]-[0037], [0061] and [0085]);
Patel as modified by Higgins further disclose said deep learning feedforward neural network comprising plural alternating convolutional layers and max-pooling layers that produce plural first outputs (see Higgins: e.g., in [0048]-[0050]), said first outputs being coupled to a plurality of artificial neurons, each of said plurality of artificial neurons being coupled to plural of said first outputs and producing a second output that is a weighted sum of its inputs (see Higgins: e.g., in [0048]-[0050], and, --the convolution layer 216 is applied to the plurality of first vectors 220 using the input layer 214, which passes the first vectors 220 to the convolution layer 216 via a plurality of first connections.  Each of the first connections has an associated weight…..Such weights may be used in computing outputs (e.g., activations) of the hidden layer 218.  Examples of functions that may be used by the hidden layer 218 include maximum, sum, and average functions.  In an example, a function used by the hidden layer 218 may be any function that can compute a single value from multiple values. --, in [0052]), 
the weights used in producing said second outputs having previously been established by a backpropagation training process using reference audio feature training data (see Higgins: e.g., -- Each of the second plurality of connections that connect the hidden layer to the convolution layer has an associated second weight.  In an example, at least a subset of the second plurality of connections have a same second weight…. Each node of the hidden layer generates an output based on (i) a weighted summation of received inputs, and (ii) an activation function.  Specifically, a node of the hidden layer receives inputs from nodes of the convolution layer of the model via one or more weighted connections.--, in [0040]-[0045]; also see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; --The plurality of numerical vectors for the un-scored response are used to train a denoising auto-encoder or restricted Boltzmann machine (RBM) in order to estimate values of the weights for the connections between the input layer 406 and the first hidden layer 408.  Such weights should be suitable for encoding the regularities in the co-occurrence of directly-observable elements in the un-scored response.  The weights between the input layer 406 and the first hidden layer 408 are shared across all sets of nodes (i.e., weights between the sets of nodes 407A and 409A of the input and first hidden layers 406, 408, respectively, are set to be the same as the weights between the sets of nodes 407B and 409B, and so on).--, in [0072] {the relevant disclosures can be found in the last paragraph of page 3 through page 4 of the Specification of Provisional Application No. 61/945874, Feb 28, 2014}).

Re Claim 78, Patel as modified by Higgins further discloses that said reference audio feature training data was derived from audio previously sensed from said first user (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427).--, in [0096], --The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 79, Patel as modified by Higgins further discloses that said reference audio feature training data was derived from audio previously sensed from plural other individuals (see Patel: e.g., -- devices, and systems disclosed herein for performing spirometry relative to a clinical spirometer endorsed by the American Thoracic Society (ATS) was conducted.  Through a study that included 52 subjects, it was shown that a smartphone spirometer has a mean error of 5.1% for the most common measures of lung function--, in [0066]-[0068], and, --Spectrogram processing is used to extract resonances.  In the frequency domain, resonances are assumed to be amplitudes excited by reflections in the vocal tract and mouth opening and therefore should be proportional to the flow rate that causes them.  Linear prediction is then used as a flow approximation.  Linear prediction assumes that a signal can be divided into a source and a shaping filter and it estimates the source power and shaping filter coefficients.  The "filter" in this case is an approximation of the vocal tract (see, e.g., Wakita, H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. Audio and Electroacoustics, IEEE Transactions on 21, 5 (1973), 417-427).--, in [0096], --The participants in the dataset can be folded into several training subsets, providing a number of diverse models that can be combined to create a global model.  For example, one subset randomly divides the participants into ten folds equally.  Another subset divides participants with wheezes together into ten folds.  Another subset divides the dataset into ten folds, but ensures there are equal numbers of abnormal and normal curves to train on.  Still other subsets can be created based on if subjects have a throat clear or based on their level of obstruction.  Each subset is used to create a different regression model and the ensemble can be clustered together to form one decision.--, in [0099]-[0100], and [0103]).

Re Claim 80, Patel as modified by Higgins further discloses that the deep learning feedforward neural network was previously trained to recognize possible depression of the first user from said reference audio feature training data (see Patel: e.g., --A number of researchers have also evaluated how multiple sensors could be connected to a smartphone via an external board to collect physiological information (see, e.g., Brunette, W., Sodt, R., Chaudhri, R., et al. The Open Data Kit Sensors Framework: Application-Level Sensor Drivers for Android.  MobiSys, (2012); also see, e.g., Majchrzak, T. and Chakravorty, A. Improving the Compliance of Transplantation Medicine Patients with an Integrated Mobile System.--, in [0019]; and, --Respiration monitoring during sleeping.  ISABEL '11, (2011)).  A few systems have leveraged simple, low-cost microphones to analyze signals, such as heart rate and cough.  Many systems exist that extract heart rate using a mobile phone--, in [0022]).

Re Claim 81, Patel as modified by Higgins further discloses that the deep learning feedforward neural network was previously trained to recognize a possible stroke of the first user from said reference audio feature training data (see Patel: e.g., --A number of researchers have also evaluated how multiple sensors could be connected to a smartphone via an external board to collect physiological information (see, e.g., Brunette, W., Sodt, R., Chaudhri, R., et al. The Open Data Kit Sensors Framework: Application-Level Sensor Drivers for Android.  MobiSys, (2012); also see, e.g., Majchrzak, T. and Chakravorty, A. Improving the Compliance of Transplantation Medicine Patients with an Integrated Mobile System.--, in [0019]; and, --Respiration monitoring during sleeping.  ISABEL '11, (2011)).  A few systems have leveraged simple, low-cost microphones to analyze signals, such as heart rate and cough.  Many systems exist that extract heart rate using a mobile phone--, in [0022]).

Re Claims 86 and 87, Patel as modified by Higgins further discloses that the weights used in producing said outputs have been previously established by a backpropagation supervised training process (see Patel: e.g., --Features can then be scaled by dividing them by empirical averages for a given participant.  This adds new "personalized" features to the model without the need for curves from a spirometer for a participant (i.e., semi-supervised learning). [0104] Lung Function Regression: The feature extraction creates a number of features at 15 ms time steps that approximate flow rate over time.  Referring to FIG. 8, by treating each feature as a flow rate, regression can be used to directly yield PEF, FEV.sub.1, and FVC.  For example, PEF is defined as the maximum flow reached in a single effort.  Thus, for a curve in a given fold, the maximum of each feature can be used to regress to the PEF.  Integrating flow with respect to time gives the volume of air exhaled.  Thus for FVC, the integration of each feature is accomplished.  For FEV.sub.1, the integration of the features during the first second is used.  These new features can be combined with other demographic features such as gender and age, weight, and height range.  These features can be quantized to discrete ranges to prevent overfitting.--, in [0103]-[0104]; also see Higgins: e.g., in [0035]-[0037], [0061] and [0085]; and, --The plurality of numerical vectors for the un-scored response are used to train a denoising auto-encoder or restricted Boltzmann machine (RBM) in order to estimate values of the weights for the connections between the input layer 406 and the first hidden layer 408.  Such weights should be suitable for encoding the regularities in the co-occurrence of directly-observable elements in the un-scored response.  The weights between the input layer 406 and the first hidden layer 408 are shared across all sets of nodes (i.e., weights between the sets of nodes 407A and 409A of the input and first hidden layers 406, 408, respectively, are set to be the same as the weights between the sets of nodes 407B and 409B, and so on)…. Specifically, applying the network 400 to predict a score for a constructed response does not involve the extraction of human-engineered features from the constructed response.  Instead, during the supervised training step, the convolutional neural network 400 itself identifies important characteristics of human-scored reference responses that are related to the classifications or scores assigned to the reference responses by human graders.--, in [0072]-[0076]).

Conclusion
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached on 8:00 - 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WEI WEN YANG/Primary Examiner, Art Unit 2667