DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 11/21/2022 have been fully considered but they are not persuasive. Regarding arguments on pages 10-11 of the Remarks, Examiner notes that even if the techniques described in the application differ from those of the applied reference, the reference can still teach the limitations of the claims. For instance, the claims do not appear to recite the trajectories as inputs to a model or the trained inversion that are discussed in the arguments.
Regarding arguments on pages 11-13 of the Remarks, Examiner notes that while Applicant explains the workings of the DIVA models, the cited portion of Quatieri665 has not been addressed. Examiner cited para [0050] of Quatieri665, which teaches “positions and velocities of articulators” included in the articulatory states, thus teaching the limitation. Regarding the training of the model, Examiner notes that the claim recites that the system is trained on data including measurements, while Quatieri665 teaches that the model performs an iterative learning process using a behavioral measurement of speech, and that speech-related signals include articulator positions. Therefore, the references teach the present claim limitations.
Regarding arguments on pages 13-14 of the Remarks, Examiner notes that all three of the references currently applied to claim 1 teach vocal tract variables, while the new Wang reference also teaches lip movements in para [0019]. Taken in combination, the references teach the claimed limitations.
Regarding arguments on page 14 of the Remarks, Examiner notes that the third reference is provided to teach the use of the Wisconsin X-Ray Microbeam database. However, achieving speaker independent representation is another benefit of combining the references. Therefore, the combination is proper, even if the reason for combination does not include the use of the x-ray measurements.
Regarding arguments on page 15 of the Remarks, Examiner notes that the claims recite “determining … time varying acoustic features from … speech in the audio recording” and “determining … from the acoustic features, one or more time varying vocal tract variables”, as well as “determining … a measurement of a disorder”. These limitations are performed by using mathematical calculations, such as conversion of speech to features, conversion of features to variables, and using a degree of correlation to determine a measurement. Additionally, the limitations could be considered mental processes in addition to mathematical calculations, as the mathematical calculations could potentially be performed in the human mind.
Regarding arguments on page 16 of the Remarks, Examiner is persuaded that claim 9 includes a practical application, as the animation of the vocal tract along with the audio recording is practically useful. However, Examiner maintains the rejection of claim 8, as simply displaying an image of a vocal tract does not have the same practical application or usefulness.

Claim Objections
Claim 30 is objected to because of the following informalities: line 4 reads “transformations the cinematographic” which should read “transformations to the cinematographic” or a similar amendment, while line 6 is unclear as to whether the machine learning system is trained using the data, or whether the data itself is being trained. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-8, 10-28 and 31-32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Using the subject matter eligibility test from page 74621 of the Federal Register Notice titled “2014 Interim Guidance on Patent Subject Matter Eligibility,” a two-step process is performed. Under step 1, the claims are analyzed to determine if the claim is directed to a process, machine, article of manufacture, or composition of matter. In this case, claims 1-15 and 30-32 are directed to a method, which is a process, and claims 16-28 are directed to a system, which is a machine or article of manufacture. Step 2A (part 1 of the Mayo test), using the guidance from pages 50-57 of the Federal Register Vol. 84 No. 4 from Monday, January 7, 2019, requires applying a two-prong inquiry. In Prong One, examiners evaluate whether the claim recites a judicial exception, determining if the claim is directed to a law of nature, a natural phenomenon, or an abstract idea. In this case, claim 1 recites computing, which is a mathematical calculation. In Prong Two, examiners evaluate whether the judicial exception is integrated into a practical application that imposes a meaningful limit on the judicial exception. In this case, no additional limitations to the abstract idea are provided.
Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.  

Regarding claims 1 and 16, receiving audio is a mental process, and computing coefficients and determining a measurement are mathematical calculations, which are abstract ideas, without integration into a practical application and without significantly more.

Regarding claims 2-3, 5, 7, 15, 17-18, 20, and 31-32, the limitations are further clarifications of the above abstract ideas, without integration into a practical application and without significantly more.

Regarding claims 4 and 19, computation using a neural network is a mathematical calculation, which is an abstract idea without integration into a practical application and without significantly more.

Regarding claims 6 and 21, estimating a glottal state is a mathematical calculation, which is an abstract idea without integration into a practical application and without significantly more.

Regarding claims 8 and 22-23, displaying an image and playing an audio recording are considered extra-solution activity that does not qualify as a practical application, nor as significantly more.

Regarding claims 10 and 24, associating variables with an utterance is a mental process, which is an abstract idea without integration into a practical application and without significantly more.

Regarding claims 11-13 and 25-27, computing functions, generating a matrix, and generating an eigenspectrum and determining a measurement are mathematical calculations, which are abstract ideas without integration into a practical application and without significantly more.

Regarding claims 14 and 28, computing changes is a mathematical calculation, which is an abstract idea without integration into a practical application and without significantly more.

The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56, Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014), Benson, 409 U.S. at 63.

See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).

	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10-13, 15-19, 24-27, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Quatieri et al. (US 2015/0112232 A1), hereinafter referred to as Quatieri232, in view of Quatieri et al. (US 2017/0053665 A1), hereinafter referred to as Quatieri665, and further in view of Wang et al. (US 2012/0280974 A1), hereinafter referred to as Wang.

Regarding claim 1, Quatieri232 teaches:
A method for measuring neuromotor coordination from speech: 
receiving an audio recording that includes spoken speech (Fig. 8A element 102, para [0065], where input speech is received); 
determining using a computing device time varying acoustic features from at least a portion of the speech in the audio recording, the acoustic features representing at least one characteristic of the at least a portion of the speech in the audio recording (para [0020], where features from the speech including MFCCs, formant frequencies, prosodic characteristics, etc. are measured); 
determining using a computing device, from the acoustic features, one or more time varying vocal tract variables (para [0064], [0070], where the feature domains or channels of formant frequencies and Delta MFCC are the variables computed from the features, which capture vocal tract shape and dynamics); and 
determining using a computing device a measurement of a disorder based at least in part on a degree of correlation between at least two of the vocal tract variables (para [0064], [0070], where auto and cross correlations among channels of measurement domains are the basis for key depression features).  
Quatieri232 does not teach:
determining using a computing device, from the acoustic features, one or more time varying vocal tract variables, each of said vocal tract variables being an output of a machine learning system estimating a position or movement of a corresponding  articulator or a shape of an anatomical structure of a vocal tract, said machine learning system being trained on data including cinematographic measurements of vocal tracts during natural spoken utterances showing anatomical structure including positions or movement of articulators of vocal tracts, wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate;
Quatieri665 teaches:
determining using a computing device, from the acoustic features, one or more time varying vocal tract variables (para [0032], where the DIVA model takes as inputs speech formants and computes parameters including articulatory commands), each of said vocal tract variables being an output of a machine learning system estimating a position or movement of a corresponding  articulator or a shape of an anatomical structure of a vocal tract (para [0050], where articulatory states include positions and velocities of articulators, which are output from the model), said machine learning system being trained on data including measurements of positions or movement of articulators of vocal tracts (para [0024], where articulator positions are measured, and para [0021], where the model learns using the measured signals), wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate (para [0050], where the variables are positions and velocities of articulators such as tongue, jaw, lips, and larynx);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 by using the DIVA model of Quatieri665 (Quatieri665 para [0032]) in the determination of the vocal tract changes of Quatieri232 (Quatieri232 para [0070]), as it would allow building a model to match a specific speaker instead of a generic speaker, while also allowing specificity across disorders (Quatieri665 para [0069]).
Wang teaches:
said machine learning system being trained on data including cinematographic measurements of vocal tracts during natural spoken utterances showing anatomical structure including positions or movement of articulators of vocal tracts (para [0023-24], where a model is trained using audio and video of an individual speaking a known script or text including lip movements, where the lips are part of the vocal tract), wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate (para [0019], [0024], where lip movements are determined).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 by using the video data of Wang (Wang para [0023-24]) in the training of Quatieri232 in view of Quatieri665 (Quatieri665 para [0021], [0024]), in order to generate visual feature vectors corresponding to input audio feature vectors (Wang para [0006]).

Regarding claim 2, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1 wherein the acoustic features represent characteristics of an audio power spectrum of the portion of the speech (Quatieri232 para [0043], where the MFCCs are coefficients that represent the short-term power spectrum).  

Regarding claim 3, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 2 wherein the acoustic features comprise cepstral coefficients (Quatieri232 para [0043], where the MFCCs are mel-frequency cepstral coefficients, which represent the short-term power spectrum).

Regarding claim 4, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1
wherein determining the vocal tract variables comprises providing the acoustic features as inputs to a neural network, and using the neural network to compute the vocal tract variables from the acoustic features (Quatieri665 para [0021], [0032], where DIVA is a neural network model, which receives feature inputs and computes the vocal tract variables).

Regarding claim 10, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1 further comprising associating the vocal tract variables with an utterance within the audio recording (Quatieri232 para [0033], where the speech-related variables are related to the user's speech).  

Regarding claim 11, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1 wherein determining the measurement of the disorder comprises computing time correlation dependent functions of the at least one vocal tract variables (Quatieri232 para [0032], [0041], where a channel-delay correlation matrix of the channels is used, or alternatively, using the variables of para [0034-39]).  

Regarding claim 12, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 11 wherein computing the time correlation dependent functions comprises generating a channel-delay correlation matrix of the vocal tract variables and/or acoustic features (Quatieri232 para [0032], [0041], where a channel-delay correlation matrix of the channels is used, or alternatively, using the variables of para [0034-39]).  

Regarding claim 13, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 12 further comprising generating an eigenspectrum of the channel-delay correlation matrix, and determining the measurement of the disorder comprises identifying eigenvalues within the eigenspectrum that have magnitudes indicating depressed speech (Quatieri232 para [0041], where the eigenvalues are constructed from the matrix, and para [0084], where the matrix eigenspectra are the eigenvalues, and where the eigenvalues are used to differentiate healthy and depressed subjects and estimate depression levels).  

Regarding claim 15, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1 wherein determining the measurement of the disorder further includes a time delay correlation of the vocal tract variables (Quatieri232 Fig. 7, para [0031], where a time delay correlation of the channels is performed).  

Regarding claim 16, Quatieri232 teaches:
A machine-implemented system for measuring neuromotor coordination from speech, the system comprising: 
a receiver to receive an audio recording that includes speech (Fig. 8A element 102, para [0065], where input speech is received); 
a feature extractor configured to compute acoustic features from at least a portion of the spoken speech in the audio recording, the acoustic features representing at least one characteristic of the at least a portion of the speech in the audio recording (para [0020], where features from the speech including MFCCs, formant frequencies, prosodic characteristics, etc. are measured);  
a vocal tract variable generator configured to estimate from the acoustic features one or more time varying vocal tract variables (para [0064], [0070], where the feature domains or channels of formant frequencies and Delta MFCC are the variables computed from the features, which capture vocal tract shape and dynamics); and 
a disorder identification module configured to determine a measurement of a disorder based at least in part on a degree of correlation between at least two of the vocal tract variables (para [0064], [0070], where auto and cross correlations among channels of measurement domains are the basis for key depression features).  
Quatieri232 does not teach:
a vocal tract variable generator comprising a machine learning system trained on data including cinematographic measurements of vocal tracts during natural spoken utterances showing anatomical structure including positions or movement of articulators of vocal tracts and configured to estimate from the acoustic features one or more time varying vocal tract variables, each of said vocal tract variables estimating a position or movement of a corresponding articulator of a vocal tract, wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate;
Quatieri665 teaches:
a vocal tract variable generator comprising a machine learning system trained on data including measurements of positions or movement of articulators of vocal tracts (para [0024], where articulator positions are measured, and para [0021], where the model learns using the measured signals) and configured to estimate from the acoustic features one or more time varying vocal tract variables (para [0032], where the DIVA model takes as inputs speech formants and computes parameters including articulatory commands), each of said vocal tract variables estimating a position or movement of a corresponding articulator of a vocal tract (para [0050], where articulatory states include positions and velocities of articulators, which are output from the model), wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate (para [0050], where the variables are positions and velocities of articulators such as tongue, jaw, lips, and larynx);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 by using the DIVA model of Quatieri665 (Quatieri665 para [0032]) in the determination of the vocal tract changes of Quatieri232 (Quatieri232 para [0070]), as it would allow building a model to match a specific speaker instead of a generic speaker, while also allowing specificity across disorders (Quatieri665 para [0069]).
Wang teaches:
a machine learning system trained on data including cinematographic measurements of vocal tracts during natural spoken utterances showing anatomical structure including positions or movement of articulators of vocal tracts (para [0023-24], where a model is trained using audio and video of an individual speaking a known script or text including lip movements, where the lips are part of the vocal tract), wherein the vocal tract variables are selected from a group consisting of: a place of articulation; a manner of articulation; constriction and/or location of lips, tongue tip, tongue body, velum, or glottis; and features and/or position of nasal cavity, buccal cavity, nostrils, epiglottis, trachea, and hard palate (para [0019], [0024], where lip movements are determined).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 by using the video data of Wang (Wang para [0023-24]) in the training of Quatieri232 in view of Quatieri665 (Quatieri665 para [0021], [0024]), in order to generate visual feature vectors corresponding to input audio feature vectors (Wang para [0006]).

Regarding claim 17, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16 wherein the acoustic features represent an audio power spectrum of the portion of the spoken speech (Quatieri232 para [0043], where the MFCCs are coefficients that represent the short-term power spectrum).  

Regarding claim 18, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 17 wherein the acoustic features are cepstral coefficients (Quatieri232 para [0043], where the MFCCs are mel-frequency cepstral coefficients, which represent the short-term power spectrum).

Regarding claim 19, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16, wherein the machine learning system comprises a neural network that computes the vocal tract variables from the acoustic features (Quatieri665 para [0021], [0032], where DIVA is a neural network model, which receives feature inputs and computes the vocal tract variables).  

Regarding claim 24, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16 further comprising a time delay correlation module that associates the vocal tract variables with an utterance within the audio recording (Quatieri232 para [0033], where the speech-related variables are related to the user's speech).  

Regarding claim 25, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16 further comprises a time delay correlation module that computes time correlation dependent functions of the at least one vocal trace variables (Quatieri232 para [0032], [0041], where a channel-delay correlation matrix of the channels is used, or alternatively, using the variables of para [0034-39]).  

Regarding claim 26, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 25 wherein the time delay correlation module computes the time correlation dependent functions by generating a channel-delay correlation matrix of the vocal tract variables and/or acoustic features (Quatieri232 para [0032], [0041], where a channel-delay correlation matrix of the channels is used, or alternatively, using the variables of para [0034-39]).  

Regarding claim 27, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 26 further the time delay correlation module generates an eigenspectrum of the channel-delay correlation matrix (Quatieri232 para [0041], where the eigenvalues are constructed from the matrix, and para [0084], where the matrix eigenspectra are the eigenvalues); and 
the disorder identification module determining the measurement of the disorder comprises identifying eigenvalues within the eigenspectrum that have magnitudes indicating depressed speech (Quatieri232 para [0041], where the eigenvalues are constructed from the matrix, and para [0084], where the matrix eigenspectra are the eigenvalues, and where the eigenvalues are used to differentiate healthy and depressed subjects and estimate depression levels).  

Regarding claim 30, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1, further comprising: 
collecting the data including cinematographic measurements of vocal tracts during natural spoken utterances showing positions or movement of articulators of vocal tracts (Wang para [0032-33], where the audiovisual content of a person reading is captured, including images of the articulators); 
applying one or more transformations the cinematographic measurements to determine the positions or movement of articulators of vocal tracts (Wang para [0034], where eigenvectors of each lip image are determined by applying PCA); and 
training the machine learning system data associating the positions or movement of articulators of vocal tracts and acoustic features of the natural spoken utterances in the audio recording (Wang para [0023-25], [0036], where a model is trained using audio and video of an individual speaking a known script or text including lip movements, where the lips are part of the vocal tract, to associate visual and audio features).

Regarding claim 31, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1, wherein the acoustic features comprise a waveform of the portion of the speech (Quatieri665 para [0024], where a speech waveform is used).

Claims 5 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Quatieri232, in view of Quatieri665, and Wang, and further in view of Seneviratne et al. (Seneviratne, N., Sivaraman, G., & Espy-Wilson, C. Y. (2019). Multi-Corpus Acoustic-to-Articulatory Speech Inversion. In Interspeech (pp. 859-863).), hereinafter referred to as Seneviratne.

Regarding claim 5, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 4 
Quatieri232 in view of Quatieri665 and Wang does not teach:
wherein the neural network comprises stored parameters determined using the Wisconsin X-Ray Microbeam database representing vocal tract variables associated with audio data
Seneviratne teaches:
wherein the neural network comprises stored parameters determined using the Wisconsin X-Ray Microbeam database representing vocal tract variables associated with audio data (Seneviratne pages 860-861 Section 3.2, where a FF-DNN receives MFCCs as input and outputs Tract Variables using XRMB data for training).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the neural network of Seneviratne (Seneviratne pages 860-861, section 3.2) in the estimation of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0057], [0062]) in order to achieve a relatively speaker independent representation of speech articulation and characterize salient features of the vocal tract area function (Seneviratne page 860 section 2.1).

Regarding claim 20, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 19 
Quatieri232 in view of Quatieri665 and Wang does not teach:
wherein the neural network comprises stored parameters using the Wisconsin X-Ray Microbeam database representing vocal tract variables associated with audio data
Seneviratne teaches:
wherein the neural network comprises stored parameters using the Wisconsin X-Ray Microbeam database representing vocal tract variables associated with audio data (Seneviratne pages 860-861 Section 3.2, where a FF-DNN receives MFCCs as input and outputs Tract Variables using XRMB data for training).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the neural network of Seneviratne (Seneviratne pages 860-861, section 3.2) in the estimation of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0057], [0062]) in order to achieve a relatively speaker independent representation of speech articulation and characterize salient features of the vocal tract area function (Seneviratne page 860 section 2.1).

Claims 6-7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Quatieri232, in view of Quatieri665, and Wang, and further in view of Lester et al. (US 2019/0371354 A1), hereinafter referred to as Lester.

Regarding claim 6, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1
Quatieri232 in view of Quatieri665 and Wang does not teach:
wherein determining the vocal tract variables includes estimating a glottal state.
Lester teaches:
wherein determining the vocal tract variables includes estimating a glottal state (para [0054], where glottal closure features are determined from the determined fundamental period corresponding to the input audio signal).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the glottal state of Lester (Lester para [0054]) as one of the vocal tract variables of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0064], [0070]) in order to break up the signal into segments and create a reduced audio signal without altering the pitch (Lester para [0055]).

Regarding claim 7, Quatieri232 in view of Quatieri665, Wang, and Lester teaches:
The method of claim 6 wherein estimating the glottal state comprises calculating the glottal state from acoustic measurements of the audio signal (Lester para [0054], where glottal closure features are determined from the determined fundamental period corresponding to the input audio signal).  

Regarding claim 21, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16
Quatieri232 in view of Quatieri665 and Wang does not teach:
further comprising a glottal estimator configured to generate vocal tract variables includes estimating a glottal state.
Lester teaches:
further comprising a glottal estimator configured to generate vocal tract variables includes estimating a glottal state (para [0054], where glottal closure features are determined from the determined fundamental period corresponding to the input audio signal).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the glottal state of Lester (Lester para [0054]) as one of the vocal tract variables of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0064], [0070]) in order to break up the signal into segments and create a reduced audio signal without altering the pitch (Lester para [0055]).

Claims 8-9, 22-23, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Quatieri232 in view of Quatieri665 and Wang, and further in view of Norsworthy (US 2015/0127352 A1).

Regarding claim 8, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1
Quatieri232 in view of Quatieri665 and Wang does not teach:
further comprising displaying an image of a vocal tract on a display device.
Norsworthy teaches:
further comprising displaying an image of a vocal tract on a display device (Fig. 8, para [0042], where an animated vocal tract is shown on the device).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the animation of Norsworthy (Norsworthy para [0042]) to display the vocal tract of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0064], [0070]) in order to promote literacy (Norsworthy para [0042]).

Regarding claim 9, Quatieri232 in view of Quatieri665, Wang, and Norsworthy teaches:
The method of claim 8 further comprising playing the audio recording and simultaneously animating the image of the vocal tract to display at least one of a constriction location, a degree of articulators along the vocal tract, and an anatomical shape of the vocal tract of the speaker represented by the time varying vocal tract variables (Norsworthy Fig. 8, para [0042], where the vocal tract is animated while synchronized with the audio, which displays the constriction location and degree of articulators).  

Regarding claim 22, Quatieri232 in view of Quatieri665 and Wang teaches:
The system of claim 16
Quatieri232 in view of Quatieri665 and Wang does not teach:
further comprising a display interface configured to display an image of a vocal tract.
Norsworthy teaches:
further comprising a display interface configured to display an image of a vocal tract (Fig. 8, para [0042], where an animated vocal tract is shown on the device).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the animation of Norsworthy (Norsworthy para [0042]) to display the vocal tract of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0064], [0070]) in order to promote literacy (Norsworthy para [0042]).

Regarding claim 23, Quatieri232 in view of Quatieri665, Wang, and Norsworthy teaches:
The system of claim 22 wherein the display interface is configured to play the audio recording and simultaneously animate the image of the vocal tract to display the constriction location and degree of articulators along the vocal tract of the speaker (Norsworthy Fig. 8, para [0042], where the vocal tract is animated while synchronized with the audio, which displays the constriction location and degree of articulators).  

Regarding claim 32, Quatieri232 in view of Quatieri665 and Wang teaches:
The method of claim 1
Quatieri232 in view of Quatieri665 and Wang does not teach:
wherein the vocal tract variables comprise a variable representing position or movement of an anatomical cavity of the vocal tract.
Norsworthy teaches:
wherein the vocal tract variables comprise a variable representing position or movement of an anatomical cavity of the vocal tract (para [0024], where the diacritics are considered the variables corresponding to the anatomical cavities of the vocal tract).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Quatieri232 in view of Quatieri665 and Wang by using the animation of Norsworthy (Norsworthy para [0042]) to display the vocal tract of Quatieri232 in view of Quatieri665 and Wang (Quatieri232 para [0064], [0070]) in order to promote literacy (Norsworthy para [0042]).

Allowable Subject Matter
Claims 14 and 28 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The closest prior art of Quatieri, Wang, Seneviratne, Lester, and Norsworthy does not teach the limitations of the claims. Specifically, none of the cited prior art teaches determining the measurement of the disorder by computing changes in articulator kinematics as determined through phasing of coupled oscillatory models of articulatory gestures derived from vocal tract variables. Hence, none of the cited prior art, either alone or in combination, teaches the combination of limitations of the claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2017/0040017 A1 para [0016] teaches video capture of visible articulators of an actor, to provide a feature vector for each video frame.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658