DETAILED ACTION
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on August 23rd, 2021 has been entered.
 This action is in response to the amendments filed on August 23rd, 2021. A summary of this action:
Claims 23-40 have been presented for examination.
Claims 1-22 have been cancelled.
Claim 32 is objected to.
Claim 35 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 33, 36-37, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013, and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015.
Claims 23-24, 26-27, 29-32, 34, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013, and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015, and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005.
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Mittal et al., “Classification of breast tissue for cancer diagnosis: Application of FT-IR imaging and random forests”, 2013
Claim 38 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011
This action is non-final.

	Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Arguments
Regarding the § 103 Rejection
	The applicant’s arguments are not persuasive with respect to the independent claims; however, the argument regarding Zhang (newly added claim 35) is persuasive.
	As the applicant has cancelled the previous claims, the previous rejections of those claims are moot. However, as the Examiner relies upon the same art as previously relied upon for the § 103 rejection below, the Examiner has responded to the present arguments below.

	The applicant submits (Remarks, page 8):
...Hence, Cohen does not use any of the models having a score less than the maximum score, but rather, by providing a mechanism for removing the local optimum, Cohen explicitly teaches away from using such model instances....

	This argument is not persuasive.
	As per MPEP § 2143, Example 1: “[w]hen the prior art teaches away from combining certain known elements, discovery of successful means of combining them is more likely to be nonobvious.” KSR, 550 U.S. at 416, 82 USPQ2d at 1395.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

	See Emmott, as relied upon for the argued limitation, specifically § 4.5, and see the motivation to combine on page 31 of the rejection: “The motivation to combine would have been that the technique in Emmott provides a more "robust" solution than "a single Gaussian", especially in cases where "data points of low density are declared to be anomalies" (Emmott, section 4.5), e.g. cancer detection.” See also § 4.5 of Emmott: “However, a single GMM is not very robust”.
	The applicant’s arguments are a piecemeal attack against Cohen alone without addressing the fact that Emmott clearly teaches a motivation to use more than a single model.
Cohen does NOT “teach away” from such a combination; Cohen merely does not teach using an ensemble GMM.
The prior art as previously relied upon still teaches the presently claimed invention.

	The applicant submits (Remarks, page 8):
...Emmott generates "a diverse set of models". Each different value of k represents an entirely different model, since a k value of four equates to a GMM with four components and a k value of six represents a GMM with six components....In contrast to the Examiner's assertion, Applicant does not argue that each configuration cannot be a GMM, but rather, that each model in Emmott cannot be considered a configuration in the context of claim 23 since each model in Emmott has a different number of model parameters.

	This argument is not persuasive.
	This argument again is a piecemeal attack against Emmott alone, and furthermore it is a piecemeal attack on Emmott as relied upon.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

	The argument requires that this statement in Emmott § 4.5, “To improve robustness, we generate a diverse set of models by varying the number of clusters k, the EM initializations, and training on 15 bootstrap replicates of the data [29]”, terminates after the “k”, i.e. that this sentence simply ends at the “k”. It clearly does no such thing.
	Emmott creates the “diverse set” by varying not only “k” but also by varying the “EM initializations” and the “bootstrap replicates”, i.e. for each value of “k”, there is also a “diverse set” of GMMs generated by, for example, “varying...the EM initializations”.
	To be clear: the applicant’s arguments require a piecemeal analysis of a single sentence in Emmott. This is wholly unpersuasive.
	The applicant’s argument then focuses on the “k” for the score without addressing the remaining portions of that sentence. Emmott teaches retaining all ensemble members for each retained value of “k”, wherein “the values of k whose average is less than 85% of the best observed value are discarded". In other words, a score is determined for each value of k, and this score is the “average” of the “diverse set of models” for that value of “k”. For each value of “k” there is a “diverse set of models by varying....the EM initializations...”, an “average” of these models is taken for “each value of k”, and any values of k “whose average is less than 85% of the best....are discarded”; i.e., all ensemble members are retained for the retained values of k.
	In regard to the newly added claims, specifically the limitation “a first number of model parameters in the first set equals a second number of model parameters in the second set;”, Emmott still teaches this. For each value of “k” there is a diverse set generated by varying at least the EM initializations, i.e. there is a plurality of GMMs, each GMM from a different “EM initialization”, and each of the GMMs for a given value of k has an “equal” number of model parameters, because for each of the “k” components there is a set of model parameters (this is simply what a GMM is; other prior art has also been cited for this numerous times). As to this “ensemble” for a value of k having configurations/GMMs with scores less than the maximum, this would have been known from Emmott’s use of “varying...EM initializations”: by varying the EM initializations, the “ensemble” for each value of k would have had multiple GMMs with varying scores, including a maximum and one lower than that maximum. This is merely the result of “varying...EM initializations”. Furthermore, this is a technique similar to that disclosed in ¶ 30-31 of the instant specification, wherein the “seed” of the “EM...algorithm” is varied, i.e. “250 random seeds are employed” [the seed is part of the EM initialization].
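	The retention rule described above, as the Examiner reads Emmott § 4.5, may be illustrated with the following minimal sketch. It is not Emmott’s code; the function name retain_ensemble and the toy scores are hypothetical, and the scores are assumed to be positive, larger-is-better values (with log-likelihoods, which are often negative, the 85% comparison would need a different normalization).

```python
from statistics import mean


def retain_ensemble(scores_by_k, fraction=0.85):
    """Return {k: member scores} for every k whose average score is at least
    `fraction` times the best per-k average; other values of k are discarded.

    scores_by_k maps each number of components k to a list of scores, one per
    GMM fitted with a different EM initialization (or bootstrap replicate).
    Scores are assumed positive, with larger meaning a better fit.
    """
    averages = {k: mean(scores) for k, scores in scores_by_k.items()}
    best = max(averages.values())
    return {
        k: list(scores)
        for k, scores in scores_by_k.items()
        if averages[k] >= fraction * best  # keep ALL members of a retained k
    }


# Hypothetical scores for three values of k, three EM initializations each.
scores = {2: [0.90, 0.80, 0.70], 4: [0.95, 0.85, 0.90], 6: [0.50, 0.40, 0.60]}
kept = retain_ensemble(scores)
print(sorted(kept))  # k = 6 falls below 85% of the best average and is dropped
print(kept[4])       # every k = 4 member is kept, including those below its maximum
```

	In this sketch the retained k = 4 ensemble contains a member scoring 0.95 and members scoring less, which is the situation the argument above describes.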
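	The point that every GMM fitted for the same value of k has the same number of model parameters can be checked with a simple count. The sketch below assumes the conventional free-parameter count for a Gaussian mixture (means, covariance entries, and k - 1 independent mixing weights); the function name is hypothetical. The random seed does not enter the count at all, which is why varying the seed (e.g. the 250 seeds of ¶ 30-31) changes parameter values but not their number.

```python
def gmm_parameter_count(k, d, covariance="full"):
    """Free parameters of a k-component Gaussian mixture in d dimensions.

    Counts k*d mean entries, the covariance entries, and k - 1 mixing
    weights (the k weights must sum to one). The EM initialization / random
    seed does not appear: it changes parameter values, not their number.
    """
    if covariance == "full":
        cov_entries = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    elif covariance == "diag":
        cov_entries = k * d                 # one variance per dimension per component
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return k * d + cov_entries + (k - 1)


print(gmm_parameter_count(4, d=1))  # 4 means + 4 variances + 3 weights = 11
print(gmm_parameter_count(6, d=1))  # 6 means + 6 variances + 5 weights = 17
```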
	The prior art as previously relied upon still teaches the presently claimed invention.

	In addition, the applicant’s arguments state: “The Examiner also misconstrues the teachings of Fig. 5 of the present application”. This is wholly incorrect. See the rejection, page 13, which clarifies what the claims “as claimed would reasonably encompass”. To clarify further: an ensemble mixture is, as per Emmott § 4.5, a “diverse set of models” instead of “a single GMM”, i.e. there is a plurality of GMMs in an ensemble, hence the name “ensemble Gaussian Mixture Model”. The present claims encompass subject matter such as this.
	Figure 5 of the specification would reasonably convey embodiments such as an ensemble GMM, as an ensemble GMM comprises a plurality of GMMs, i.e. the “configurations” recited in the claims. The plurality of GMMs shown in figure 5 would reasonably be called an ensemble of GMMs, as figure 5 shows, using Emmott’s phrasing, “a diverse set of models”; the claimed invention reasonably encompasses this. It is not limited to this, as the claims contain no recitations that would limit the invention to this; instead, the claims encompass it.
	
	To further demonstrate that the claims and specification convey embodiments such as ensembles, see Perrone et al., “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks”, 1992, § 3 and figure 2. Figure 2 shows an ensemble estimate, wherein figure 2(b) shows the “True estimate” as a solid black line [e.g., see figure 5 of the instant specification, which uses the same solid black line for the data] and shows the “estimates” as dashed lines [the grey lines in figure 5 of the instant specification]. The only distinction between these figures is that figure 5 is produced with a GMM, whereas Perrone in 1992 used only a single Gaussian; however, one of ordinary skill would readily infer that a GMM with k=1, e.g. the GMM as disclosed in the specification, would have produced a figure substantially similar to Perrone figure 2(b). To further clarify this “Example” in Perrone, see the abstract: this example demonstrates the effect of “local minima” on such a fitting, i.e. it shows why ensemble techniques are “much better than either of the individual estimates” (Perrone, § 3, the description of this figure).
	In other words, Perrone provides a visual depiction of what an ensemble of Gaussian distributions is, i.e. for an ensemble of GMMs when k=1, and visually shows that an “ensemble” would have resulted in figure 5 in the instant specification. 

[Image: media_image1.png]


	The applicant submits (Remarks, page 9):
Bigdeli fails to remedy the defects in Emmott. Bigdeli fails to teach defining a classification cluster in a parameter domain defined by the plurality of model parameters and determining a distance between each member of the set of configurations the classification cluster. In contrast, Bigdeli defines a cluster of data points and generates a single GMM to model the cluster. Bigdeli does not define a cluster in the parameter domain. The cluster of data is the input used to generate the GMM model, so it is in the data domain.

This argument is not persuasive.
This argument is a piecemeal attack against Bigdeli alone and fails to address the relied-upon combination of references and the articulated rationale underpinning that combination in the rejection.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
	Specifically, the applicant’s arguments focus solely on Bigdeli’s use of a “single GMM”; see Emmott for this part of the combination.
	To clarify, see the rejection, page 32: “...Bigdeli's input is the GMM ensemble of Cohen/Emmott”, i.e. the combination of the prior art uses the ensemble GMM of Cohen/Emmott as input into the technique of Bigdeli.
	
	See the remaining parts of the final rejection for more details on the teachings of Cohen, as taken in view of Emmott and Bigdeli.

The applicant further submits (Remarks, page 10):
The combination of the prior art with the addition of Zhang fails to teach these limitations. Zhang teaches screening non-cell pixels from cell pixels. Zhang only analyzes the cell data to identify cancerous cells. Zhang excludes the non-cell pixels, so no configurations would be generated for these cells. There are no configurations generated during the screening phase of Zhang. Zhang teaches only generating modeling signals for the pixels that passed the screening criteria (i.e., only cell pixels). The general teaching of Zhang to screen before classifying does not render obvious...

As claim 35 is a newly added claim, there is no rejection to be withdrawn.

Allowable Subject Matter
Claim 35 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  

Claim 35 recites, in part:
The method of claim 33, further comprising:
	defining a screening cluster in the parameter domain;
	generating a screening set of configurations for defining modeling signals to model the energy absorption spectrum signal for the selected pixel using a first number of random seeds;
	and responsive to determining that at least one of the configurations in the screening set of configurations is within the screening cluster, generating a diagnostic set of configurations including the first configuration and the second configuration using a second number of random seeds greater than the first number of random seeds. 
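	Purely to illustrate the control flow recited in claim 35 (a screening set generated with a first number of random seeds, followed, only if a screening configuration falls within the screening cluster, by a diagnostic set generated with a greater number of seeds), a minimal sketch is given below. It is not the applicant’s implementation: toy_fit is a hypothetical stand-in for one seeded EM fit, and the screening-cluster test is supplied by the caller as an assumption.

```python
import math
import random


def toy_fit(rng):
    """Hypothetical stand-in for one seeded EM fit; returns a 2-parameter configuration."""
    return (rng.gauss(0.0, 1.0), rng.gauss(1.0, 1.0))


def generate_configurations(fit, num_seeds, first_seed=0):
    """Run `fit` once per random seed, each with its own seeded generator."""
    return [fit(random.Random(first_seed + s)) for s in range(num_seeds)]


def screen_then_diagnose(fit, in_screening_cluster, first_num_seeds, second_num_seeds):
    """Two-stage flow: a small screening set, then a larger diagnostic set.

    Returns the diagnostic set, or None if no screening configuration falls
    within the screening cluster.
    """
    if second_num_seeds <= first_num_seeds:
        raise ValueError("second_num_seeds must exceed first_num_seeds")
    screening_set = generate_configurations(fit, first_num_seeds)
    if not any(in_screening_cluster(c) for c in screening_set):
        return None  # screened out: no diagnostic set is generated
    return generate_configurations(fit, second_num_seeds, first_seed=first_num_seeds)


diagnostic = screen_then_diagnose(
    toy_fit, lambda c: math.dist(c, (0.0, 1.0)) <= 1.0, first_num_seeds=5, second_num_seeds=50
)
print(None if diagnostic is None else len(diagnostic))
```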

None of the prior art of record, either taken alone or in combination, teaches the claimed invention when taken as a whole.

The closest combination of prior art of record is the previous combination of prior art relied upon of Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013, and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015, and in further view of Zhang et al., “Classification of Fourier Transform Infrared Microscopic Imaging Data of Human Breast Cells by Cluster Analysis and Artificial Neural Networks”, 2003.

The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with newly cited Zhang et al., “Cascade of Classifier Ensembles for Reliable Medical Image Classification”, PhD Dissertation, University of Liverpool, March 2014; see § 2.3.2, figure 2.5, page 52 ¶ 2, and page 15 ¶ 3.

The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with newly cited Rafiee et al., “Region-of-interest extraction in low depth of field images using ensemble of clustering and difference of Gaussian approaches”, 2013. See the abstract: this is a “two-stage unsupervised segmentation approach based on ensemble clustering”, first using a “mixture-based model” with an “ensemble EM clustering algorithm”; see § 2.2, figure 4, page 2688 col. 2, last paragraph, and § 3.

The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with newly cited Zhuang et al., “Acoustic Fall Detection Using Gaussian Mixture Models and GMM Supervectors” – see the abstract, see figure 2 and see §3 and §4, also see figure 3.


The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with newly cited Yang et al., “Neural network ensembles: combining multiple models for enhanced performance using a multistage approach”, 2004 – see figure 4 and §2.3.2.

The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with newly cited Quan et al., “Hybrid Generative-Discriminative Models for Speech and Speaker Recognition”, March 2002, IDIAP Research Project, Dalle Molle Institute for Perceptual Artificial Intelligence, Switzerland – see § 2.3 in full, including § 2.3.2, see § 3.1, §3.2, including page 9 ¶ 3, see § 4.1, see § 4.3. 

The next closest combination of prior art of record is the combination of prior art previously relied upon, as cited above, taken in further combination with previously cited 

While these combinations of prior art teach some portions of the claimed invention, they fail to teach the entire claimed invention when read in combination.
None of the combinations stated above, nor any other combination of the prior art of record fairly teaches:
...
defining a screening cluster in the parameter domain;
	generating a screening set of configurations for defining modeling signals to model the energy absorption spectrum signal for the selected pixel using a first number of random seeds;
	and responsive to determining that at least one of the configurations in the screening set of configurations is within the screening cluster, generating a diagnostic set of configurations including the first configuration and the second configuration using a second number of random seeds greater than the first number of random seeds. 
...
when taken in combination with the remaining limitations and elements of the claimed invention.

Claim Objections
Claim 32 is objected to because of the following informalities:
Claim 32 is objected to as the preamble of claim 32 recites “The method of claim 1” whereas claim 1 was cancelled – in light of the nearby dependent claims, e.g. claim 31, claim 32 should recite “The method of claim 23”
Appropriate correction is required.

	Claim Rejections - 35 USC § 103	
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 33, 36-37, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013, and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015.

Regarding Claim 33.
Cohen teaches: 
	A method for detecting ... in a ... sample, comprising: (Cohen, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”)
	acquiring a set of Fourier Transform Infrared (FTIR) spectroscopy data for the tissue sample, the FTIR data including an energy absorption spectrum signal for each of a plurality of pixels;(Cohen, as cited above, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1)
	classifying a selected pixel by:(Cohen, page 246, col. 1, ¶ 1 teaches that there is a “spectrum for each pixel” [a trace signal for each pixel] wherein page 246, col. 2, ¶ 2 teaches that the technique is to determine the class/label of each “pixel”, i.e. the process is repeated for each pixel – to clarify see section 1, ¶ 1 which teaches “Our goal is to assign each pixel a class …to which the spectrum is supposed to belong.”, i.e. each pixel is classified)
	generating a first configuration comprising a first set of model parameters for defining a first modeling signal to model a portion of the energy absorption spectrum signal;(Cohen, as cited above, teaches applying a GMM to FTIR data for a sample; see page 250, ¶ 2-3, which teaches that the measured data originally includes “spectra” with “1577 samples”, and further teaches “a maximum likelihood technique that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method in which the observed spectra are modelized as a realization of a Gaussian Mixture Model (GMM)”)
...
	and determining that the selected pixel is associated with [a class] ...(Cohen, page 246, col. 1, ¶ 1 teaches that there is a “spectrum for each pixel” [a trace signal for each pixel] wherein page 246, col. 2, ¶ 2 teaches that the technique is to determine the class/label of each “pixel”, i.e. the process is repeated for each pixel – to clarify see section 1, ¶ 1 which teaches “Our goal is to assign each pixel a class …to which the spectrum is supposed to belong.”, i.e. each pixel is classified)
and repeating the classifying of the selected pixel for each of the plurality of pixels, wherein:(Cohen, page 246, col. 1, ¶ 1 teaches that there is a “spectrum for each pixel” [a trace signal for each pixel] wherein page 246, col. 2, ¶ 2 teaches that the technique is to determine the class/label of each “pixel”, i.e. the process is repeated for each pixel – to clarify see section 1, ¶ 1 which teaches “Our goal is to assign each pixel a class …to which the spectrum is supposed to belong.”, i.e. each pixel is classified)
	the first configuration has a first score for fitting the portion of the trace signal; (Cohen, as cited above, e.g. the abstract, teaches that the GMM technique uses “maximum likelihood” as the score for the GMM fitting the signal; see § 1 on page 247 for clarification, e.g. the last paragraph: “For a given model...we will use the maximum likelihood estimate”, i.e. page 249 col. 1, ¶ 2: “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood” [the best score])
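	The likelihood score referred to above can be illustrated for a simple one-dimensional Gaussian mixture. This is a minimal sketch under stated assumptions: it treats values as independent 1-D samples and uses a plain GMM, not Cohen’s spatialized model over full spectra; the function name and the two example configurations are hypothetical. Both configurations have the same k (and hence the same number of parameters) but different scores.

```python
import math


def gmm_log_likelihood(samples, weights, means, variances):
    """Log-likelihood of 1-D samples under a Gaussian mixture (higher = better fit)."""
    total = 0.0
    for x in samples:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


# Two hypothetical configurations with the same k = 2 (same number of parameters).
data = [-1.1, -0.9, -1.0, 2.0, 2.2, 1.8]
config_a = ([0.5, 0.5], [-1.0, 2.0], [0.05, 0.05])
config_b = ([0.5, 0.5], [-0.5, 1.5], [0.5, 0.5])
score_a = gmm_log_likelihood(data, *config_a)
score_b = gmm_log_likelihood(data, *config_b)
print(score_a > score_b)  # config_a fits these samples better
```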

Cohen does not explicitly teach:
for detecting malignancy in a tissue sample
	defining a diagnostic cluster in a parameter domain;
	generating a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the energy absorption spectrum signal;
	determining a first distance between the first configuration and the diagnostic cluster;
determining a second distance between the second configuration and the diagnostic cluster;
	... malignant tissue based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the diagnostic cluster;
	a first number of model parameters in the first set equals a second number of model parameters in the second set;
	the second configuration has a second score for fitting the portion of the trace signal less than the first score. 

Emmott teaches:
...detecting malignancy in a tissue sample (Emmott, section 2, ¶ 1 teaches using anomaly detection to detect “the emergence of cancer cells in normal tissue” [the cancer cells are an example of malignant tissue])
	generating a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the energy absorption spectrum signal;(Emmott, abstract and section 2, ¶ 1, teaches a system/technique for “anomaly detection”, such as to detect “the emergence of cancer cells in normal tissue”; see page 4, section 4.5, which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, are created by varying the “number of clusters”, the “EM initializations”, and the like, to create an ensemble of GMMs [set of configurations] in which GMMs whose “likelihood” is “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective 
	... malignant tissue ...(Emmott, section 2, ¶ 1 teaches using anomaly detection to detect “the emergence of cancer cells in normal tissue” [the cancer cells are an example of malignant tissue])
	a first number of model parameters in the first set equals a second number of model parameters in the second set;(Emmott, abstract and section 2, ¶ 1, teaches a system/technique for “anomaly detection”, such as to detect “the emergence of cancer cells in normal tissue”; see page 4, section 4.5, which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, are created by varying the “number of clusters”, the “EM initializations”, and the like, to create an ensemble of GMMs [set of configurations] in which GMMs whose “likelihood” is “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is generated wherein only GMMs with a score higher than 85% of the “best”, i.e. maximum, are retained. This ensemble includes both the highest-scoring GMM and GMMs with scores less than the highest. To clarify, Emmott teaches that for each value of k there is also a “diverse set of models by varying...EM initializations...”, wherein the “average...likelihood” is calculated for “each value of k”, i.e. the average is taken over the GMMs with the varied “EM initializations” for each value of k. In other words, for each value of k there is a set number of model parameters [e.g., see Cohen page 247 col. 1, ¶ 2, which clarifies, e.g., that there is a “mean of the kth […]”], and for each value of k there is an ensemble of a “diverse set of models” generated based on “varying...the EM initializations”; each member in this ensemble [each configuration] for a value of k has the same number of parameters, as each has the same number of k components.)
	the second configuration has a second score for fitting the portion of the trace signal less than the first score. (Emmott, § 4.5 teaches that there is a “likelihood” score for fitting the data [the portion of the trace signal]; also see Cohen as cited above, page 249 col. 1, ¶ 2: “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood”, i.e. both use the same “likelihood” score, wherein Emmott teaches that there is an “ensemble” of GMMs that are “diverse”, with a best score and a score less than the best; hence Emmott takes the “average...likelihood for each value of k”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm to analyze FTIR data with the teachings from Emmott on using an ensemble GMM algorithm for anomaly detection such as for detecting cancer cells. The motivation to combine would have been that the technique in Emmott provides a more “robust” solution than “a single Gaussian”, especially in cases where “data points of low density are declared to be anomalies” (Emmott, section 4.5), e.g. cancer detection.

Cohen, as modified by Emmott, does not explicitly teach:
defining a diagnostic cluster in a parameter domain;
	determining a first distance between the first configuration and the diagnostic cluster;
	determining a second distance between the second configuration and the diagnostic cluster;
... based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the diagnostic cluster;

Bigdeli teaches: 
	defining a diagnostic cluster in a parameter domain;(Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster”, in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is likewise clustered into separate GMMs; for the modification, Bigdeli’s input is the GMM ensemble of Cohen as modified by Emmott]. The training clusters are “labelled”, i.e. each training cluster has an associated class type; the purpose of Bigdeli is to model the “group behavior” of new data, to mitigate the “negative impact of noise” on the input data.
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behavior rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [in light of the applicant’s numerous arguments for this – this clearly uses the plural form of GMM]. If the similarity between a test GMM [in the combination, each member of the ensemble] and one of the training GMMs [in the combination, each classification cluster, wherein there is at least one] is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster – see section IV for details on the GMM, and section 5 for the distance metric)
	determining a first distance between the first configuration and the diagnostic cluster;(Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”)
determining a second distance between the second configuration and the diagnostic cluster;(Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”)
	...based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the diagnostic cluster;(Bigdeli, as cited above, teaches determining the “label” for each input GMM [example of determining the class type of the sample, i.e. an example of a class type is a label]; page 340, col. 1, ¶ 1 teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM” wherein this is based on “distance” – then see section 5 which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [example of distance between each configuration and the training cluster(s)/classification cluster(s)], then see page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label, thus labelling the “incoming samples” by the label of the GMMs wherein the labels are determined by “distance”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Emmott, on a system which generates an ensemble of GMMs from FTIR data for anomaly detection such as for cancer detection with the teachings from Bigdeli on a technique for “anomaly detection” which detects anomalies based on “group behavior”. The motivation to combine would have been that the technique of Bigdeli minimizes the impact of “noise” in the dataset on the classification result (Bigdeli, abstract and section I), i.e. “By clustering test samples before detection process, we remove the effect of noise on test samples, and consequently improve the accuracy of anomaly detection algorithm.”

Regarding Claim 36.
Emmott teaches: 
	The method of claim 33, wherein each set of the plurality of the first set of model parameter defines a first Gaussian mixture;(Emmott, § 4.5 as cited above teaches using an ensemble of GMMs – each GMM is defined by its model parameters, e.g. see Cohen § 1 ¶ 2 for an example of this, see Bigdeli § IV ¶ 2 which provides a similar example, e.g. for each kth component there is a “mean” and a “variance” (see both Bigdeli and Cohen) which defines the GMM, and Emmott teaches using an “ensemble GMM” which would have included similar such sets of parameters for each member in the ensemble)
and the second set of model parameter defines a second Gaussian mixture. (Emmott, § 4.5 as cited above teaches using an ensemble of GMMs – each GMM is defined by its model parameters, e.g. see Cohen § 1 ¶ 2 for an example of this, see Bigdeli § IV ¶ 2 which provides a similar example, e.g. for each kth component there is a “mean” and a “variance” (see both Bigdeli and Cohen) which defines the GMM, and Emmott teaches using an “ensemble GMM” which would have included similar such sets of parameters for each member in the ensemble)

Regarding Claim 37.
Bigdeli teaches: 
	The method of claim 33, wherein defining the diagnostic cluster comprises defining an ellipsoid in the parameter domain. (Bigdeli, fig. 3 shows that the “clusters” defined by the GMMs include ellipsoids; this includes both the classification cluster and the set of configurations – these are both in the parameter domain, as cited above)

Regarding Claim 39.
Bigdeli, as taken in combination above, teaches: 
	The method of claim 33, further comprising:
	defining a plurality of diagnostic clusters in the parameter domain; (Bigdeli, as cited above teaches this – i.e. Bigdeli teaches that there are a plurality of classification clusters [e.g., figure 2’s “Training Clusters”] 
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behavior rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [in light of the applicant’s numerous arguments for this – this clearly uses the plural form of GMM]. If the similarity between a test GMM [in the combination, each member of the ensemble] and one of the training GMMs [in the combination, each classification cluster, wherein there is at least one] is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster – see section IV for details on the GMM, and section 5 for the distance metric)
	and determining that the selected pixel is associated with malignant tissue responsive to determining that the first configuration or the second configuration has a distance within the distance threshold from any of the plurality of diagnostic clusters. (Cohen, as taken in combination with Bigdeli and Emmott above teaches this – Cohen teaches classifying “each pixel” (e.g., page 246, col. 2, ¶ 2, and § 1 ¶ 1) wherein Emmott teaches using an “ensemble” 

Claim 23-24, 26-27, 29-32, 34, and 40 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 


Regarding Claim 23.
Cohen teaches:
	A method for characterizing a sample (Cohen, abstract, teaches using a GMM for a spectral image – see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”), comprising:
acquiring a trace signal for the sample;(Cohen, as cited above, abstract, teaches using a GMM for a spectral image – see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1)
generating a first configuration comprising a first set of model parameters for defining a first modeling signal to model a portion of the trace signal;(Cohen, as cited above, teaches applying GMM to FTIR data for a sample – see page 250, ¶ 2-3 which teaches that the measured data originally includes a “spectra” with “1577 samples” and then a “range of wavenumbers…was removed” which reduces this to “1528” samples, i.e. a portion of the signal is used for the GMM – the GMM is the modeling signal to model a portion of the signal – for more clarification, see section 3.2 on page 251, ¶ 1 which teaches that “the set of spectra of each image was submitted to both regular GMM, using the EM algorithm, as well as the spatially aware model proposed…cGMM” [a variant of a GMM], i.e. a configuration/GMM is generated for the trace signal; then see the caption of fig. 5, which teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM; for more clarification, see page 246, col. 2, ¶ 2, which teaches “Our proposed contribution is based on conditional density estimation by the penalized maximum likelihood technique that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method 
the first configuration has a first score for fitting the portion of the trace signal; (Cohen, as cited above, e.g. the abstract, teaches that the GMM technique is using “maximum likelihood” as the score for the GMM fitting the signal, see § 1 on page 247 for clarification, e.g. last paragraph “For a given model...we will use the maximum likelihood estimate”, i.e. page 249 col. 1, ¶ 2 “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood” [the best score])
Cohen does not explicitly teach: 
defining a first classification cluster in a parameter domain, the first classification cluster having a class type;
	generating a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the trace signal;
	determining a first distance between the first configuration and the first classification cluster:
	determining a second distance between the second configuration and the first classification cluster:
	and determining that the sample has the class type based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein:
a first number of model parameters in the first set equals a second number of model parameters in the second set;
		the second configuration has a second score for fitting the portion of the trace signal less than the first score. 

Emmott teaches:
generating a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the trace signal;(Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue” – then see page 4, section 4.5 which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, is created by varying the “number of clusters”, the “EM initializations”, and the like to create an ensemble of GMMs [set of configurations] in which GMMs whose “likelihood” is “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is generated wherein only GMMs with a score of at least 85% of the “best”, i.e. maximum, are retained – this ensemble includes both the highest-scoring GMM and GMMs with scores less than the highest)
a first number of model parameters in the first set equals a second number of model parameters in the second set;(Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue” – then see page 4, section 4.5 which teaches creating an “ensemble Gaussian Mixture Model”; to clarify, Emmott teaches that for each value of k there is also a “diverse set of models by varying...EM initializations...” wherein the “average...likelihood” is calculated for “each value of k”, i.e. the average likelihood for “each value of k” includes the GMMs with the varied “EM initializations” – in other words, for each value of k there is a set number of model parameters [e.g., see Cohen page 247 col. 1, ¶ 2 which clarifies, e.g., there is a “mean of the kth component”]; therefore for a set value of k, e.g. “6”, there is an ensemble of a “diverse set of models” generated based on “varying...the EM initializations” – each member in this ensemble [each configuration] for a value of k has the same number of parameters, as they have the same number, k, of components)
		the second configuration has a second score for fitting the portion of the trace signal less than the first score. (Emmott, § 4.5 teaches that there is a “likelihood” score for fitting the data [the portion of the trace signal]; also see Cohen as cited above, page 249 col. 1, ¶ 2 “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood”, i.e. both references use the same “likelihood” score, wherein Emmott teaches an “ensemble” of “diverse” GMMs in which there is a best score and a score less than the best, which is why Emmott takes the “average...likelihood for each value of k”)
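For illustration only, the ensemble construction relied upon from Emmott § 4.5 may be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Emmott's implementation: the data are one-dimensional, EM is written out by hand, the function names are hypothetical, and the 85% cut-off is assumed to be applied to the geometric-mean per-point likelihood (the exponential of the mean log-likelihood) so that the ratio is well defined. Every retained member shares the same k, and therefore the same number (3k) of parameters, while its score may fall below the best score.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k, seed, iters=50, min_var=1e-6):
    """Fit a k-component 1-D GMM by EM from a seeded random initialization.

    Returns (weights, means, variances, mean_log_likelihood)."""
    rng = random.Random(seed)
    n = len(data)
    means = rng.sample(data, k)  # assumption: start the means at k random data points
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data)
    return weights, means, variances, log_lik / n

def ensemble_gmm(data, k, seeds, keep_ratio=0.85):
    """Fit one GMM per EM initialization (seed); keep members scoring at least
    keep_ratio times the best observed score."""
    fits = [fit_gmm_1d(data, k, s) for s in seeds]
    # Assumption: score = geometric-mean per-point likelihood (always positive).
    scores = [math.exp(f[3]) for f in fits]
    best = max(scores)
    return [(f, s) for f, s in zip(fits, scores) if s >= keep_ratio * best]

# Usage: two well-separated groups of values, k = 2, five EM initializations.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
members = ensemble_gmm(data, k=2, seeds=range(5))
for (weights, means, variances, _), score in members:
    print(3 * len(weights), round(score, 4))  # parameter count and score of each member
```

The best-scoring fit is always retained, so the ensemble holds that fit plus any others within the cut-off, all with equal parameter counts.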

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm to analyze FTIR data with the teachings from Emmott on using an ensemble GMM algorithm for anomaly detection such as for detecting cancer cells. The motivation to combine would have been that the technique in Emmott provides a more “robust” solution than “a single Gaussian”, especially in cases where “data points of low density are declared to be anomalies” (Emmott, section 4.5), e.g. cancer detection. 

Cohen, as modified by Emmott, does not explicitly teach:
defining a first classification cluster in a parameter domain, the first classification cluster having a class type; 
determining a first distance between the first configuration and the first classification cluster:
	determining a second distance between the second configuration and the first classification cluster:
and determining that the sample has the class type based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein:

Bigdeli teaches: 
defining a first classification cluster in a parameter domain, the first classification cluster having a class type; (Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster” in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is also clustered into separate GMMs – for the modification, Bigdeli’s input is the GMM ensemble of Cohen as modified by Emmott] – the training clusters are “labelled”, i.e. each training cluster has an associated class type; the purpose of Bigdeli is to model the “group behavior” of new data, to mitigate the “negative impact of noise” on the input data
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behavior rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [in light of the applicant’s numerous arguments for this – this clearly uses the plural form of GMM]. If the similarity between a test GMM [in the combination, each member of the ensemble] and one of the training GMMs [in the combination, each classification cluster, wherein there is at least one] is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster – see section IV for details on the GMM, and section 5 for the distance metric)


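determining a first distance between the first configuration and the first classification cluster:(Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”)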
	determining a second distance between the second configuration and the first classification cluster:(Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”)
	...based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein:(Bigdeli, as cited above, teaches determining the “label” for each input GMM [example of determining the class type of the sample, i.e. an example of a class type is a label]; page 340, col. 1, ¶ 1 teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM” wherein this is based on “distance” – then see section 5 which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [example of distance between each configuration and the training cluster(s)/classification cluster(s)], then see page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label, thus labelling the “incoming samples” by the label of the GMMs wherein the labels are determined by “distance”)
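For illustration only, the distance-and-threshold labelling mapped above may be sketched as follows. Bigdeli proposes its own modified Kullback-Leibler-based measure (§ 5), which is not reproduced here; this sketch swaps in a simpler matched-component distance in which each 1-D Gaussian component is paired with its closest counterpart in the other mixture by closed-form KL divergence, the results are weighted by the mixture weights, and the two directions are averaged. Mixtures are hypothetical lists of (weight, mean, variance) tuples, and the threshold values are assumptions. The sample is given the class type if at least one configuration lies within the threshold of at least one classification cluster, mirroring the “at least one of the first distance or the second distance” language; passing several clusters also illustrates the plural clusters of claims 29 and 39.

```python
import math

def kl_gauss_1d(m0, v0, m1, v1):
    """Closed-form KL(N(m0, v0) || N(m1, v1)) for 1-D Gaussians (v = variance)."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

def matched_kl(f, g):
    """For each component of f, take the KL to the closest component of g,
    weighted by the component's mixture weight in f."""
    return sum(w * min(kl_gauss_1d(m, v, mg, vg) for _, mg, vg in g) for w, m, v in f)

def gmm_distance(f, g):
    """Symmetrized matched-component distance between two mixtures (not Bigdeli's measure)."""
    return 0.5 * (matched_kl(f, g) + matched_kl(g, f))

def has_class_type(configurations, clusters, thresholds):
    """True if any configuration is within the paired threshold of any cluster."""
    return any(
        gmm_distance(config, cluster) <= threshold
        for cluster, threshold in zip(clusters, thresholds)
        for config in configurations)

# Usage with hypothetical values: both configurations have k = 2 components.
malignant = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]   # classification cluster
first = [(0.5, 0.1, 1.1), (0.5, 5.9, 0.9)]       # first configuration
second = [(0.5, 3.0, 10.0), (0.5, 3.5, 10.0)]    # second configuration
print(gmm_distance(first, malignant), gmm_distance(second, malignant))
print(has_class_type([first, second], [malignant], [0.5]))
```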

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Emmott, on a system which generates an ensemble of GMMs from FTIR data for anomaly detection such as for cancer detection with the teachings from Bigdeli on a technique for “anomaly detection” which detects anomalies based on “group behavior”. The motivation to combine would have been that the technique of Bigdeli minimizes the impact of “noise” in the dataset on the classification result (Bigdeli, abstract and section I), i.e. “By clustering test samples before detection process, we remove the effect of noise on test samples, and consequently improve the accuracy of anomaly detection algorithm.”

Cohen, as taken in combination above with Emmott and Bigdeli, does not explicitly teach:
 determining that the sample has the class type... 

Fernandez teaches: 
 determining that the sample has the class type... (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig. 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells. 

Regarding Claim 24.
Emmott teaches: 
	The method of claim 23, wherein the sample comprises a tissue sample, and the class type comprises malignant tissue. (Emmott, section 2, ¶ 1 teaches using anomaly detection to detect “the emergence of cancer cells in normal tissue” [the cancer cells are an example of malignant tissue])

Regarding Claim 26.
Emmott teaches:
	The method of claim 24, wherein:
	the first set of model parameter defines a first Gaussian mixture; (Emmott, § 4.5 as cited above teaches using an ensemble of GMMs – each GMM is defined by its model parameters, e.g. see Cohen § 1 ¶ 2 for an example of this, see Bigdeli § IV ¶ 2 which provides a similar example, e.g. for each kth component there is a “mean” and a “variance” (see both Bigdeli and Cohen) which defines the GMM, and Emmott teaches using an “ensemble GMM” which would have included similar such sets of parameters for each member in the ensemble)
	and the second set of model parameter defines a second Gaussian mixture. (Emmott, § 4.5 as cited above teaches using an ensemble of GMMs – each GMM is defined by its model parameters, e.g. see Cohen § 1 ¶ 2 for an example of this, see Bigdeli § IV ¶ 2 which provides a similar example, e.g. for each kth component there is a “mean” and a “variance” (see both Bigdeli and Cohen) which defines the GMM, and Emmott teaches using an “ensemble GMM” which would have included similar such sets of parameters for each member in the ensemble)

Regarding Claim 27.
Bigdeli teaches: 
	The method of claim 23, wherein defining the first classification cluster comprises defining an ellipsoid in the parameter domain. (Bigdeli, fig. 3 shows that the “clusters” defined by the GMMs include ellipsoids; this includes both the classification cluster and the set of configurations – these are both in the parameter domain, as cited above)
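As a hedged illustration of an ellipsoidal cluster in the parameter domain (claims 27 and 37), a configuration, viewed as a point of model parameters p, can be tested for membership in the ellipsoid {p : (p − c)ᵀ A⁻¹ (p − c) ≤ r²}. The center c, the inverse shape matrix A⁻¹ and the radius r below are hypothetical values chosen for illustration; neither Bigdeli nor the claims specify them.

```python
def in_ellipsoid(point, center, shape_inv, radius=1.0):
    """Return True if (p - c)^T A^-1 (p - c) <= radius^2, where shape_inv = A^-1."""
    d = [p - c for p, c in zip(point, center)]
    n = len(d)
    q = sum(d[i] * shape_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return q <= radius ** 2

# Hypothetical 2-D parameter domain (e.g. one component's mean and variance):
# an axis-aligned ellipsoid with semi-axes 0.5 and 0.2 centred at (0.0, 1.0).
center = [0.0, 1.0]
shape_inv = [[1 / 0.5 ** 2, 0.0], [0.0, 1 / 0.2 ** 2]]
print(in_ellipsoid([0.3, 1.1], center, shape_inv))  # True: inside the ellipsoid
print(in_ellipsoid([0.6, 1.0], center, shape_inv))  # False: outside along the first axis
```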


Regarding Claim 29.
Bigdeli, as taken in combination above, teaches: 
	The method of claim 23, further comprising:
	defining a second classification cluster in the parameter domain having the class type: (Bigdeli, as cited above teaches this – i.e. Bigdeli teaches that there are a plurality of classification clusters [e.g., figure 2’s “Training Clusters”] – for claim interpretation, see at least ¶ 38 in the instant specification which recites “at least one classification cluster”, i.e. under the BRI in light of the specification this claim is merely stating that there are two classification clusters defined and used in the same manner, which is taught by Bigdeli
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behavior rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [in light of the applicant’s numerous arguments for this – this clearly uses the plural form of GMM]. If the similarity between a test GMM [in the combination, each member of the ensemble] and one of the training GMMs [in the combination, each classification cluster, wherein there is at least one] is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster – see section IV for details on the GMM, and section 5 for the distance metric)
	determining a third distance between the first configuration and the second classification cluster; (Bigdeli, as cited above teaches this, e.g. § B the system finds “the distances between the test GMMs and training GMMs” [the classification clusters])
	determining a fourth distance between the second configuration and the second classification cluster;(Bigdeli, as cited above teaches this, e.g. § B the system finds “the distances between the test GMMs and training GMMs” [the classification clusters])
and determining that the sample has the class type based on a determination that at least one of the third distance or the fourth distance is within a second distance threshold from the second classification cluster. (Bigdeli, as cited above teaches this, e.g. § B the system finds “the distances between the test GMMs and training GMMs” [the classification clusters], i.e. this is finding the classification cluster/training GMM that is most “similar” by “finding the distances between the test GMMs and training GMMs” (page 339, col. 2, last paragraph) – and for determining the class type from this see the combination of prior art relied upon above, i.e. Cohen teaches repeating the classification “for each pixel” (page 246, col. 1, ¶ 1 and elsewhere as cited above) wherein Fernandez teaches determining the class type based on a “fraction of pixels” (page 471, col. 2, ¶ 2 and figure 1, as cited above). Regarding claim interpretation of the second distance threshold, see at least ¶ 54 of the instant specification, which recites only “a classification threshold” [singular], i.e. as interpreted in light of the specification, the claimed second distance threshold encompasses subject matter such as Bigdeli’s “threshold” that is applied between each “test GMM” and each “training GMM” (page 339, col. 2, last paragraph))

Regarding Claim 30.
Cohen teaches: 
	The method of claim 23, wherein the trace signal comprises a Fourier Transform Infrared energy absorption spectrum signal.  (Cohen, section 3, ¶ 1 teaches that the signal is “observed using Fourier Transform Infrared microscopy (FTIR), in regards to this including the 

Regarding Claim 31.
Cohen, as taken in combination above teaches: 
	The method of claim 23, wherein the trace signal is associated with one of a plurality of pixels generated for the sample, and the method comprises:
	repeating the determining that the sample has the class type for each of the plurality of pixels; (Cohen, page 246, col. 1, ¶ 1 teaches that there is a “spectrum for each pixel” [a trace signal for each pixel] wherein page 246, col. 2, ¶ 2 teaches that the technique is to determine the class/label of each “pixel”, i.e. the process is repeated for each pixel – to clarify see section 1, ¶ 1 which teaches “Our goal is to assign each pixel a class …to which the spectrum is supposed to belong.”, i.e. each pixel is classified)

Fernandez teaches:
	and determining a count of pixels having the class type.  (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig. 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)
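The per-pixel repetition of claim 31 and the pixel count and fraction relied upon from Fernandez (page 471, col. 2, ¶ 2) can be illustrated with the short sketch below. The per-pixel classifier and the toy signals are hypothetical stand-ins (here a simple cut-off on one scalar value per pixel); in the combination, the per-pixel decision would instead be the distance-threshold test discussed above.

```python
def count_class_pixels(pixel_signals, classify_pixel, class_type):
    """Classify every pixel and return (count, fraction) of pixels with class_type."""
    labels = [classify_pixel(signal) for signal in pixel_signals]  # repeated per pixel
    count = sum(1 for label in labels if label == class_type)
    fraction = count / len(labels) if labels else 0.0
    return count, fraction

# Hypothetical classifier: a cut-off on one scalar value per pixel.
def classify_pixel(signal):
    return "malignant" if signal > 0.5 else "benign"

signals = [0.2, 0.9, 0.7, 0.1]
print(count_class_pixels(signals, classify_pixel, "malignant"))  # (2, 0.5)
```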

Regarding Claim 32.
Emmott, as taken in combination above, teaches: 
The method of claim 1, comprising:
	generating a third configuration comprising a third set of model parameters for defining a third modeling signal to model the portion of the trace signal; (Emmott, as taken in combination above, teaches this; see the citations above, i.e. Emmott teaches that there is a “diverse set of models by varying...EM initializations...for each value of k” as detailed above. This claim limitation is merely part of generating this “diverse set of models” [as there is a plurality of models – the plural form of “models” is clearly used]. In other words, this claim merely conveys embodiments such as the one taught in the combination of prior art as relied upon above, wherein the “diverse set of models” includes at least 3 models – Emmott teaches this, as there is a plurality of models.)
	determining a third distance between the third configuration and the first classification cluster; (Emmott, as taken in combination above, teaches this for the same reasons given for the generating limitation above, i.e. the “diverse set of models” generated by “varying...EM initializations...for each value of k” includes at least 3 models.)
	determining that the sample has the class type based on a determination that at least one of the first distance, the second distance, or the third distance is within a first distance threshold from the first classification cluster, wherein: (Emmott, as taken in combination above, teaches this for the same reasons given for the generating limitation above, i.e. the “diverse set of models” generated by “varying...EM initializations...for each value of k” includes at least 3 models.)
	a third number of model parameters in the third set equals the first number of model parameters in the first set; (Emmott, as taken in combination above, teaches this for the same reasons given for the generating limitation above, i.e. the “diverse set of models” generated by “varying...EM initializations...for each value of k” includes at least 3 models.)
	the third configuration has a third score for fitting the portion of the trace signal less than the first score. (Emmott, as taken in combination above, teaches this for the same reasons given for the generating limitation above, i.e. the “diverse set of models” generated by “varying...EM initializations...for each value of k” includes at least 3 models.)


Regarding Claim 34.
Cohen, as modified above by Emmott and Bigdeli, does not explicitly teach:
	The method of claim 33, further comprising classifying the tissue sample as being malignant based on a count of the plurality of pixels associated with malignant tissue. 

Fernandez teaches: 
The method of claim 33, further comprising classifying the tissue sample as being malignant based on a count of the plurality of pixels associated with malignant tissue. (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells.

Regarding Claim 40.
Cohen teaches:
	A system, comprising: (Cohen, abstract, teaches using a GMM for a spectral image; see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”. With regard to the use of a computer, Cohen’s technique uses a computer, e.g. § 3 – this is an “algorithm” for a computer.)
	a memory to store a plurality of instructions; (Cohen, as cited above, this is part of a computer)
	and a processor to execute the instructions to:(Cohen, as cited above, this is part of a computer)
	acquire a trace signal for the sample; (Cohen, as cited above, abstract, teaches using a GMM for a spectral image; see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1.)
	generate a first configuration comprising a first set of model parameters for defining a first modeling signal to model a portion of the trace signal; (Cohen, as cited above, teaches applying a GMM to FTIR data for a sample; see page 250, ¶ 2-3, which teaches a “maximum likelihood technique that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method in which the observed spectra are modelized as a realization of a Gaussian Mixture Model (GMM)”.)
	the first configuration has a first score for fitting the portion of the trace signal; (Cohen, as cited above, e.g. the abstract, teaches that the GMM technique is using “maximum likelihood” as the score for the GMM fitting the signal, see § 1 on page 247 for clarification, e.g. last paragraph “For a given model...we will use the maximum likelihood estimate”, i.e. page 249 col. 1, ¶ 2 “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood” [the best score])
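For illustration only, the relationship relied upon in this rejection (several EM runs for the same number of components k, each yielding a configuration with the same number of parameters but a possibly different likelihood score, with the largest-likelihood run treated as the first configuration) can be sketched as below. This is a minimal Python sketch under stated assumptions: it fits a one-dimensional GMM rather than multidimensional FTIR spectra, initializes the means at randomly chosen data points, runs a fixed number of EM iterations, and uses hypothetical synthetic data and function names; it is not Cohen’s or Emmott’s implementation.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total

def fit_gmm_1d(data, k, seed, n_iter=30, min_var=1e-6):
    """One EM run for a k-component 1-D GMM from a seeded random initialization.

    Returns (configuration, score): the configuration holds 3k parameters
    (k weights, k means, k variances) and the score is the log-likelihood.
    """
    rng = random.Random(seed)
    n = len(data)
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    weights = [1.0 / k] * k
    means = rng.sample(data, k)  # the initialization is what varies between runs
    variances = [var0] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 1e-12:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                variances[j] = max(
                    sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                    min_var,
                )
    config = (tuple(weights), tuple(means), tuple(variances))
    return config, log_likelihood(data, weights, means, variances)

def fit_with_restarts(data, k, seeds):
    """Fit one configuration per seed and sort them from best to worst score."""
    fits = [fit_gmm_1d(data, k, seed) for seed in seeds]
    return sorted(fits, key=lambda fit: fit[1], reverse=True)

# Hypothetical data: two well-separated groups of values.
gen = random.Random(0)
data = [gen.gauss(0.0, 1.0) for _ in range(100)] + [gen.gauss(5.0, 1.0) for _ in range(100)]
fits = fit_with_restarts(data, k=2, seeds=range(5))
best_config, best_score = fits[0]
```

Because k is fixed, every configuration returned here has the same number of parameters (3k); only the initialization differs, so the runs can converge to different local optima with lower scores than the best run.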

	Cohen does not explicitly teach: 
define a first classification cluster in a parameter domain, the first classification cluster having a class type;
	generate a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the trace signal;
	determine a first distance between the first configuration and the first classification cluster;
	determine a second distance between the second configuration and the first classification cluster;
	and determine that the sample has the class type based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein:
	a first number of model parameters in the first set equals a second number of model parameters in the second set;
	the second configuration has a second score for fitting the portion of the trace signal less than the first score.

Emmott teaches:
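	generate a second configuration comprising a second set of model parameters for defining a second modeling signal to model the portion of the trace signal; (Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue”; see page 4, section 4.5, which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, are created by varying the “number of clusters”, the “EM initializations”, and the like, as detailed for the following limitation.)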
	a first number of model parameters in the first set equals a second number of model parameters in the second set; (Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue”; see page 4, section 4.5, which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, are created by varying the “number of clusters”, the “EM initializations”, and the like to create an ensemble of GMMs [set of configurations] in which any GMM whose “likelihood” is “less than 85% of the best observed value” is “discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is generated wherein only GMMs with a score of at least 85% of the “best”, i.e. maximum, are retained. This ensemble includes both the highest-scoring GMM and GMMs with scores less than the highest. To clarify, Emmott teaches that for each value of k there is also a “diverse set of models by varying...EM initializations...” wherein the “average...likelihood” is calculated for “each value of k”, i.e. this is taking the average likelihood for “each value of k”, including the average for the GMMs with the varied “EM initializations”. In other words, for each value of k there is a set number of model parameters [e.g., see Cohen page 247 col. 1, ¶ 2, which clarifies that there is a “mean of the kth component”]; therefore, for a set value of k, e.g. “6”, there is an ensemble of a “diverse set of models” generated based on “varying...the EM initializations”, and each member of this ensemble [each configuration] for a value of k has the same number of parameters because each has the same number of k components.)
	the second configuration has a second score for fitting the portion of the trace signal less than the first score. (Emmott, § 4.5 teaches that there is a “likelihood” score for fitting the data [the portion of the trace signal]; also see Cohen as cited above, page 249 col. 1, ¶ 2: “This first estimate is itself obtained by the classical EM algorithm, whose initialization is obtained by selecting the parameter set yielding the largest likelihood”, i.e. both use the same score of “likelihood”, wherein Emmott teaches that there is an “ensemble” of GMMs that are “diverse” wherein there is a best score and a score less than the best; hence Emmott takes the “average...likelihood for each value of k”.)
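For illustration only, the ensemble pruning and per-k averaging attributed to Emmott above can be sketched as follows. This is a minimal Python sketch under stated assumptions: each model is a hypothetical `(k, model_id, score)` tuple, the scores are assumed to be positive likelihood values so that the “85% of the best observed value” rule is a simple ratio test, and whether Emmott averages before or after pruning is not reproduced here (the sketch averages whatever list it is given).

```python
from collections import defaultdict

def prune_ensemble(models, keep_ratio=0.85):
    """Keep models whose score is at least keep_ratio times the best score.

    `models` is a list of (k, model_id, score) tuples with positive scores.
    """
    best = max(score for _, _, score in models)
    return [m for m in models if m[2] >= keep_ratio * best]

def average_score_per_k(models):
    """Average score for each number of components k."""
    by_k = defaultdict(list)
    for k, _, score in models:
        by_k[k].append(score)
    return {k: sum(scores) / len(scores) for k, scores in by_k.items()}

# Hypothetical ensemble: several EM initializations for k = 6 and one for k = 5.
models = [
    (6, "init-0", 0.90),
    (6, "init-1", 0.80),
    (6, "init-2", 0.70),
    (5, "init-0", 0.85),
]
kept = prune_ensemble(models)
averages = average_score_per_k(kept)
```

With these values the cutoff is 0.85 × 0.90 = 0.765, so the k = 6 model scoring 0.70 is discarded while the retained ensemble still contains both the best model (0.90) and a same-k model with a lower score (0.80).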

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm to analyze FTIR data with the teachings from Emmott on using an ensemble GMM algorithm for anomaly detection such as for detecting cancer cells. The motivation to combine would have been that the technique in Emmott provides a more “robust” solution than “a single Gaussian”, …

Cohen, as modified by Emmott, does not explicitly teach:
define a first classification cluster in a parameter domain, the first classification cluster having a class type;
	determine a first distance between the first configuration and the first classification cluster;
	and determine that the sample has the class type based on a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein:

Bigdeli teaches: 
define a first classification cluster in a parameter domain, the first classification cluster having a class type; (Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster” in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is clustered as well into separate GMMs; for the modification, Bigdeli’s input is the GMM ensemble of Cohen as modified by Emmott]. The training clusters are “labelled”, i.e. each training cluster has an associated class type. The purpose of Bigdeli is to model the “group behavior” of new data, to mitigate the “negative impact of noise” on the input data, e.g.:
“In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behavior rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.” Then see pages 339-340, section B and figs. 2 and 3, which teach “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters; both the training and test clusters are in the same domain, as the distance between them is measured].
Section B further clarifies: “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [in light of the applicant’s numerous arguments for this, this clearly uses the plural form of GMM]. If the similarity between a test GMM [in the combination, each member of the ensemble] and one of the training GMMs [in the combination, each classification cluster, wherein there is at least one] is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster.” See section IV for details on the GMM, and section 5 for the distance metric.)
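	determine a first distance between the first configuration and the first classification cluster; (Bigdeli, as cited above, teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters]; taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”.)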
	determine a second distance between the second configuration and the first classification cluster;(Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s), e.g. see page 340, col. 1 ¶ 2 and figure 2, and then see § 5 for the “distance” measure between each of the GMMs in the ensemble and the “training clusters”)
	and ... a determination that at least one of the first distance or the second distance is within a first distance threshold from the first classification cluster, wherein: (Bigdeli, as cited above, teaches determining the “label” for each input GMM [an example of determining the class type of the sample, i.e. an example of a class type is a label]; see page 340, col. 1, ¶ 1, which teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM”, wherein this is based on “distance”. Then see section 5, which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [an example of the distance between each configuration and the training cluster(s)/classification cluster(s)], and then, as cited above, “we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.” In other words, Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine whether the GMMs are similar enough to have the same label, thus labelling the “incoming samples” by the label of the GMMs, wherein the labels are determined by “distance”.)
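For illustration only, the labelling step described above (a distance between each fitted configuration and a training GMM, and a determination that at least one distance is within a threshold) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the GMMs are one-dimensional lists of hypothetical `(weight, mean, variance)` tuples, and the distance is a simple symmetrized “matched” Kullback-Leibler approximation in which each component is compared with its closest component of the other mixture. That approximation is a swapped-in technique chosen for brevity; it is not Bigdeli’s modified distance measure, and the threshold value is hypothetical.

```python
import math

def kl_gauss_1d(m0, v0, m1, v1):
    """Closed-form KL(N(m0, v0) || N(m1, v1)) for univariate normals (v = variance)."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

def matched_kl(gmm_f, gmm_g):
    """Weight-averaged KL from each component of f to its closest component of g.

    A GMM is a list of (weight, mean, variance) tuples.
    """
    return sum(
        w * min(kl_gauss_1d(m, v, m2, v2) for _, m2, v2 in gmm_g)
        for w, m, v in gmm_f
    )

def gmm_distance(gmm_f, gmm_g):
    """Symmetrized matched-KL distance between two 1-D GMMs."""
    return 0.5 * (matched_kl(gmm_f, gmm_g) + matched_kl(gmm_g, gmm_f))

def has_class_type(configurations, training_gmm, threshold):
    """True if at least one configuration is within `threshold` of the training GMM."""
    return any(gmm_distance(cfg, training_gmm) <= threshold for cfg in configurations)

# Hypothetical training GMM (classification cluster) and two fitted configurations.
training_gmm = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
first_config = [(0.5, 0.1, 1.0), (0.5, 5.1, 1.0)]
second_config = [(0.5, 2.0, 1.0), (0.5, 3.0, 1.0)]
print(has_class_type([first_config, second_config], training_gmm, threshold=0.1))  # True
print(has_class_type([second_config], training_gmm, threshold=0.1))  # False
```

Here the first configuration is about 0.005 from the training GMM and the second is 2.0 away, so the “at least one of the first distance or the second distance” test succeeds only when the first configuration is included.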

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Emmott, on a system which generates an ensemble of GMMs from FTIR data for anomaly detection such as for cancer detection with the teachings from Bigdeli on a technique for “anomaly detection” which detects anomalies based on “group behavior”. The motivation to combine would have been that the technique of Bigdeli minimizes the impact of “noise” in the dataset on the classification result (Bigdeli, abstract and section I), i.e. “By clustering test samples before detection process, we remove the effect of noise on test samples, and consequently improve the accuracy of anomaly detection algorithm.”

Cohen, as taken in combination above with Emmott and Bigdeli, does not explicitly teach:
determine that the sample has the class type based on a...


Fernandez teaches:
determine that the sample has the class type based on a... (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells. 


Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Mittal et al., “Classification of breast tissue for cancer diagnosis: Application of FT-IR imaging and random forests”, 2013

Regarding Claim 25.
Emmott, as taken in combination above, teaches:
	The method of claim 24, wherein the class type comprises ... carcinoma. (Emmott, section 2, ¶ 1 teaches detecting “cancer cells in normal tissue”)

Mittal teaches: ductal carcinoma (Mittal, abstract teaches “Fourier Transform Infrared (FTIR) imaging has gained wide acceptance for determining the chemical composition of biomedical samples. In particular, there is significant potential for performing quantitative, label-free imaging in tumor biopsies. In this paper, we demonstrate the advantages offered by FTIR imaging for the analysis of breast tumor biopsies from hundreds of patients. We then demonstrate the software that we have developed for tissue classification.”, i.e. using FTIR for determining a class type of breast cancer [ductal carcinoma]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on a system for analyzing FTIR data such as for cancer detection with the teachings from Mittal on using FTIR trace signals for diagnosing breast cancer. The motivation to combine would have …

Claim 38 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011

Regarding Claim 38.
Cohen, as modified by Bigdeli and Emmott above, does not explicitly teach:
	The method of claim 37, wherein defining the ellipsoid comprises defining a singular value decomposition matrix. 

Dobry teaches:
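The method of claim 37, wherein defining the ellipsoid comprises defining a singular value decomposition matrix. (Dobry, abstract, teaches applying “dimension reduction” to “GMM” “supervectors” which are used for classification with an “SVM”, wherein the supervectors are projected into a “reduced space” for “training” a classifier; then see section A on page 1976, which clarifies that a “GMM” is fit to a signal, e.g. a “speech utterance”, wherein “Each GMM model is represented by GMM supervector, formed by concatenating all the Gaussians’ means” [parameters of each GMM], and then a “dimension reduction approach” is applied to the supervector, before classification, to “reduce the dimension size”; then see section III.A on page 1977, which teaches that “PCA” is used, which uses “SVD” to perform the “dimensional reduction”; and then see page 1980, col. 1, ¶ 3, which teaches that the classification using SVM “involves a distance calculation” between the input and training vectors. In other words, Dobry classifies the GMMs by projecting the parameters of the GMMs into a “space” created by SVD – SVD is applied to both the input data and the training data [training data used for the classification cluster].)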

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, modified above, on a system which classifies GMMs by determining the distance between the GMMs from test/training data with the teachings from Dobry on performing the classification in a “lower dimension space” such as formed by SVD. The motivation to combine would have been that dimensional reduction before the classification would have resulted in a “faster and better separability” between the classes (Dobry, page 1975, col. 2, ¶ 1), i.e. performing a dimension reduction before classification would have made the system faster at classifying, as the classification analysis would have been performed on smaller, i.e. reduced, data sets. 
Dobry is considered analogous art as Dobry is reasonably pertinent to the problem faced by the inventor of classifying the GMMs using SVD. 
In addition, Dobry is also analogous as Dobry is reasonably pertinent to the problem of classifying GMMs in a parameter domain – specifically, Dobry provides evidence that the applicant’s use of a parameter domain for classification is substantially similar to classifying in a supervector space, which is typically found in speech analysis. One of ordinary skill, when faced with the problem of classifying spectral data from FTIR, would have reasonably turned towards audio/speech analysis techniques as both deal with spectral data, just at different frequency ranges, and audio/speech analysis has numerous examples of applying GMMs to spectral signals and then classifying in the supervector/parameter domain (see pertinent prior art of record below). 


Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011

Regarding Claim 28.
Cohen, as taken in combination above, does not explicitly teach:
	The method of claim 27, wherein defining the ellipsoid comprises defining a singular value decomposition matrix. 

Dobry teaches:
	The method of claim 27, wherein defining the ellipsoid comprises defining a singular value decomposition matrix. (Dobry, abstract, teaches applying “dimension reduction” to “GMM” “supervectors” which are used for classification with an “SVM” wherein the supervectors are projected into a “reduced space” for “training” a classifier, then see section A on page 1976 which clarifies that a “GMM” is fit to a signal, e.g. a “speech utterance” wherein “Each GMM model is represented by GMM supervector , formed by concatenating all the Gaussians’ means” [parameters of each GMM] and then a “dimension reduction approach” is applied to the supervector, before classification, to “reduce the dimension size”, then see section III.A on page 1977 which teaches “PCA” is used which uses “SVD” to perform the “dimensional reduction” and then see page 1980, col. 1, ¶ 3 which teaches that the classification using SVM “involves a distance calculation” between the input and training vectors, in other words Dobry classifies the GMMs by projecting the parameters of the GMMs into a “space” created by SVD – SVD is applied to both the input data and the training data [training data used for classification cluster])
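For illustration only, one way a singular value decomposition can define an ellipsoidal classification region is sketched below: the covariance of a cluster’s training configurations is decomposed into principal axes, and a new configuration is inside the region when its Mahalanobis-style distance along those axes is at most a radius. This is a minimal Python sketch under stated assumptions: it is limited to two parameters so the decomposition can be written in closed form (in higher dimensions a library routine such as NumPy’s `numpy.linalg.svd` would typically be used), and the function names, points and radius are hypothetical. It is not Dobry’s PCA pipeline or the applicant’s claimed ellipsoid definition.

```python
import math

def covariance_2d(points):
    """Sample mean and covariance entries (a, b, c) for [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    return (mx, my), (a, b, c)

def principal_axes(cov):
    """Closed-form decomposition of a symmetric 2x2 matrix [[a, b], [b, c]].

    For a symmetric positive semidefinite matrix this eigendecomposition is also
    its SVD: the singular values equal the eigenvalues and U = V.
    Returns [(lambda1, u1), (lambda2, u2)] with lambda1 >= lambda2.
    """
    a, b, c = cov
    half_trace = (a + c) / 2.0
    d = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    l1, l2 = half_trace + d, half_trace - d
    if b != 0.0:
        vx, vy = l1 - c, b
        norm = math.hypot(vx, vy)
        u1 = (vx / norm, vy / norm)
    else:
        u1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    u2 = (-u1[1], u1[0])  # orthogonal to u1
    return [(l1, u1), (l2, u2)]

def inside_ellipse(point, center, cov, radius=1.0):
    """True if `point` is within `radius` of `center` in the scaled principal-axis metric."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    total = 0.0
    for lam, (ux, uy) in principal_axes(cov):
        if lam <= 0.0:
            raise ValueError("covariance must be positive definite")
        proj = dx * ux + dy * uy  # coordinate along this principal axis
        total += proj * proj / lam
    return total <= radius ** 2

# Hypothetical training configurations (two parameters each) for one class cluster.
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
center, cov = covariance_2d(train)
print(inside_ellipse((2.0, 0.5), center, cov))  # True
print(inside_ellipse((1.0, 1.2), center, cov))  # False
```

The ellipse is elongated along the parameter with the larger spread, so a point one unit away along the wide axis is inside while a point 0.7 units away along the narrow axis is outside.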

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, modified above, on a system which classifies GMMs by determining the distance between the GMMs from test/training data with the teachings from Dobry on performing the classification in a “lower dimension space” such as formed by SVD. The motivation to combine would have been that dimensional reduction before the classification would have resulted in a “faster and better separability” between the classes (Dobry, page 1975, col. 2, ¶ 1), i.e. performing a dimension reduction before classification would have made the system faster at classifying as the classification analysis would have been performed on smaller, i.e. reduced, data sets. 
Dobry is considered analogous art as Dobry is reasonably pertinent to the problem faced by the inventor of classifying the GMMs using SVD. 
In addition, Dobry is also analogous as Dobry is reasonably pertinent to the problem of classifying GMMs in a parameter domain – specifically, Dobry provides evidence that the applicant’s use of a parameter domain for classification is substantially similar to classifying in a supervector space, which is typically found in speech analysis. One of ordinary skill, when faced with the problem of classifying spectral data from FTIR, would have reasonably turned towards audio/speech analysis techniques as both deal with spectral data, just at different frequency ranges, and audio/speech analysis has numerous examples of applying GMMs to spectral signals and then classifying in the supervector/parameter domain (see pertinent prior art of record below). 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chi et al., “CLUSTER-BASED ENSEMBLE CLASSIFICATION FOR HYPERSPECTRAL REMOTE SENSING IMAGES”, 2008 – see § 2 including § 2.2 and §2.3
Ormoneit et al., “Averaging, Maximum Penalized Likelihood and Bayesian Estimation for Improving Gaussian Mixture Probability Density Estimates”, 1998 – this paper cites Perrone et al., as cited above, wherein this applies the “idea” from Perrone of an “ensemble” which converges to “different local minima” to GMMs (see § 1 and the abstract), see § II and § III for more clarity, including page 641 col. 1, last paragraph and see § 6 for various examples of this technique
Perrone et al., “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks”, 1992 – see § 3, see figure 2 – this shows an ensemble estimate figure wherein figure 2(b) shows the “True estimate” as a solid black line [e.g., see figure 5 of the instant specification, which uses the same solid black line for the data] and then shows the “estimates” as dashed lines, corresponding to the grey lines in figure 5 of the instant specification. The only distinction between these figures is that figure 5 of the instant specification is produced with a GMM, whereas Perrone in 1992 used only a single Gaussian; however, one of ordinary skill would readily infer that when k=1 for a GMM, e.g. for the GMM as disclosed in the specification, this would have produced a figure substantially similar to Perrone figure 2(b). To further clarify this “Example” in Perrone, see the abstract: this is an example to demonstrate the effect of “local minima” on such a fitting, i.e. this is to show why ensemble techniques are “much better than either of the individual estimates” (Perrone, § 3, the description of this figure). 
Quan et al., “Hybrid Generative-Discriminative Models for Speech and Speaker Recognition”, March 2002, IDIAP Research Project, Dalle Molle Institute for Perceptual Artificial Intelligence, Switzerland – see § 2.3 in full, including § 2.3.2, see § 3.1, §3.2, including page 9 ¶ 3, see § 4.1, see § 4.3
Rafiee et al., “Region-of-interest extraction in low depth of field images using ensemble of clustering and difference of Gaussian approaches”, 2013 – see the abstract: this is a “two-stage unsupervised segmentation approach based on ensemble clustering” first using a “mixture-based model” with an “ensemble EM clustering algorithm” – see § 2.2, see figure 4, see page 2688 col. 2, last paragraph, and also see § 3 
Shinozaki et al., “GMM AND HMM TRAINING BY AGGREGATED EM ALGORITHM WITH INCREASED ENSEMBLE SIZES FOR ROBUST PARAMETER ESTIMATION” – see the abstract, § 1, § 2 and figures 1-3 – this teaches various techniques for creating multiple GMM models using the “EM” algorithm to create an “ensemble” of “size N” (§ 3.2, ¶ 1-2)
Smith et al., “Cluster ensemble Kalman filter”, 2007 – see the abstract, see § 2.1 which teaches using GMMs for this, see figure 5 which shows the “ensemble” compared to signal data
Verma et al., “Cluster-Oriented Ensemble Classifier: Impact of Multicluster Characterization on Ensemble Classifier Learning”, 2012 – see the abstract and § 1 along with § 3 and figure 2  
Yang et al., “Neural network ensembles: combining multiple models for enhanced performance using a multistage approach”, 2004 – see figure 4 and §2.3.2
Zhang et al., “Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles”, 2011 – see the abstract and § 1 and see figure 3 – this is another publication from Zhang’s PhD dissertation as cited above 
Zhang et al., “Cascade of Classifier Ensembles for Reliable Medical Image Classification”, PhD Dissertation, University of Liverpool, March 2014 – see § 2.3.2, see figure 2.5, see page 52 ¶ 2, and see page 15 ¶ 3
Zhuang et al., “Acoustic Fall Detection Using Gaussian Mixture Models and GMM Supervectors” – see the abstract, see figure 2 and see §3 and §4, also see figure 3
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID A. HOPKINS whose telephone number is (571)272-0537.  The examiner can normally be reached on Monday to Friday, 8:30AM to 5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/D.A.H./Examiner, Art Unit 2128                                                                                                                                                                                                        
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128