DETAILED ACTION
This action is in response to the amendments filed on December 2nd, 2020. A summary of this action:
Claims 1-9, 11-17, 19-22 have been presented for examination.
Claims 1, 4, 11, 14, and 19 have been amended
Claims 20-22 are newly added
Claims 20-22 are objected to for informalities
Claims 11, 14-15, 17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015
Claims 1-2, 4-5, 7-9, 12, 19, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Mittal et al., “Classification of breast tissue for cancer diagnosis: Application of FT-IR imaging and random forests”, 2013
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015, and in further view of Zhang et al., “Classification of Fourier Transform Infrared Microscopic Imaging Data of Human Breast Cells by Cluster Analysis and Artificial Neural Networks”, 2003
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011
This action is made Final

	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment/Arguments
Regarding the claim objections
	In light of the applicant’s amendments and supporting arguments, the objections are WITHDRAWN.
	However, newly added dependent claims are objected to below, necessitated by amendment. 

Regarding the § 112(a) Rejection
	In light of the applicant’s amendments and supporting arguments, the rejection under § 112(a) is WITHDRAWN.

Regarding the § 103 Rejection
	As an initial matter, the applicant uses claim 1 as representative, i.e. “Claims 11 and 19 include similar limitations...” (Remarks, page 13).
	Although the claims are not parallel, the Examiner will use claim 1 as representative. However if the scope of the other parallel claims fails to recite the same/similar subject matter that is being argued for claim 1, the argument is considered moot for claims 11 and 19. Limitations are not read in from the specification, nor from other claims. 

The applicant submits (Remarks, page 12):
The Examiner alleges that Cohen teaches generating a set of configurations for defining a modeling signal to model a portion of the trace signal, where at least a first one of the configurations in the set has a maximum score over the set of configurations, and a subset of the set of configurations includes multiple configurations with scores less than the maximum score...

	The argument is moot. The Examiner already admitted on the record that Cohen does not explicitly teach “generating a diagnostic set of configurations and a subset of the set of configurations includes multiple configurations with scores less than the maximum score;” (final rejection, page 10)
To clarify, one of ordinary skill would have known from both the prior art and their own knowledge that the optimization algorithm used to optimize a GMM [a configuration] iteratively generates a set of GMMs and then uses the most optimal GMM from the optimization.
	Cohen, as relied upon, does not explicitly teach the “subset of the set of configurations includes multiple configurations with scores less than the maximum score” (non-final rejection, page 10) – these other configurations are discarded from the system of Cohen as they are not optimal. 
	To further clarify – while one of ordinary skill may reasonably infer that the optimization algorithms for the GMM as taught by Cohen do in fact create/generate a plurality of configurations, e.g. a plurality of GMMs, and then select the most optimal, wherein some of the configurations have less than a maximum score, Cohen, as relied upon, would then have discarded these suboptimal configurations. In other words, Cohen generates the most optimal configuration, but as Cohen does not actually retain the “set of configurations”, one of ordinary skill would find that this would not teach that specific portion of the claimed invention, as has already been admitted on the record.

Instead, one of ordinary skill would turn to Emmott, which was relied upon for the applicant’s argued limitation. 
	See the non-final rejection, page 11. 
	Emmott teaches now using an “ensemble” GMM, i.e. a “set of models”, i.e. an “ensemble” comprises a set of GMMs [example of configurations] – each member of an “ensemble” is a GMM, and the “ensemble” is an entire set. And, as per the motivation to combine, Emmott’s “ensemble” of GMMs provides a more “robust” solution than using a “single Gaussian”, i.e. a single GMM. See Emmott, § 4.5 for more details; this is part of the relied upon portions of Emmott.
	To be clear – the system of Emmott is being relied upon to modify the teachings of Cohen to fit an ensemble of GMMs [set of configurations] which define a modeling signal to model a portion of a trace signal. Cohen uses a single GMM, and, as per Emmott, this is not “robust” and a more “robust” solution is to use an ensemble of GMMs. 

The applicant further submits (Remarks, page 12):
Emmott generates "a diverse set of models". Each different value of k represents an entirely different model, since a k value of four equates to a GMM with four components and a k value of six represents a GMM with six components. Emmott generates a score for "each value of R' and "the values of k whose average is less than 85% of the best observed value are discarded". Hence, Emmott compares the score for each different model and discards those models with scores less than 85%...Claim 1 recites that each configuration represents a set of the plurality of model parameters for an instance of the modeling signal. In contrast, the set of data generated by Emmott represents sets of parameters for different models, not sets of the plurality of model parameters for instances of the modeling signal.


	An ensemble of GMMs is a set of configurations, each GMM in the ensemble is a configuration. The ensemble of GMMs is defined by a plurality of model parameters for each GMM, and each of the GMMs represents a set of the model parameters.
	Emmott, as relied upon and taken in combination, further teaches that the ensemble only contains configurations with scores more than 85%, i.e. all of the configurations in the ensemble have a score of at least 85%. Each of these GMMs has a set of associated parameters, and the ensemble therefore obviously has a plurality of model parameters comprising each of the sets, wherein this plurality defines the ensemble.

	The applicant’s argument is, in essence, that each configuration cannot be a GMM. The Examiner first refers the applicant to dependent claim 4, “The method of claim 1, wherein each set of the plurality of model parameters defines a Gaussian mixture.”
	Clearly, from the dependent claims each set of the model parameters, along with the associated configuration, is a “Gaussian mixture”, i.e. a Gaussian Mixture Model, the same as Emmott. 
	By the applicant’s arguments, if the claims were to actually reflect that each configuration cannot be a GMM, then at least claim 4 would necessitate a rejection under § 112(d). However, the applicant’s interpretation is unreasonable; it is clear from both the claims and the specification that claim 4 does in fact further limit the claimed invention, and that it more clearly conveys the exemplary embodiment of the claimed invention.

The applicant’s argument is rooted in the specification, not the claims. Limitations are not read into the claims from the specification.
	To summarize – the applicant’s argument is that each GMM is a “different model” and therefore this cannot read on the present claims. This argument is not persuasive. 
	The “ensemble” of the GMMs would reasonably teach the “set of configurations for defining a modeling signal” as recited in the present claims, i.e. the “ensemble” is the model, each GMM in the ensemble is merely a component of the model. 

	As to how the applicant’s argument relies upon the specification for unclaimed features – see the instant specification ¶ 23 which recites “In the present illustration, the model signal is a GM with four components” – this is the claim interpretation that is being implicitly relied upon by the applicant’s arguments. There is no recitation of this in the present claims. Nor is there even a brief recitation of using GMMs, despite it being a core feature of the exemplary embodiment. Nor is there anything in the claims that suggests limiting the GMMs to four components – this would require a substantial amendment of the claims.
	To clarify – while the applicant’s disclosed invention describes that the exemplary embodiment uses a “GM configuration portfolio” (¶ 35) wherein each “configuration” is a GMM and wherein each GMM is limited to 4 components and wherein the GMMs are varied by “initialization seeds” (¶ 35) – this embodiment is not clearly and explicitly recited in the claims, and limitations are not read in from the specification. The disclosed invention conveys an embodiment in which each “configuration” is a GMM, along with its associated set of parameters, wherein each GMM has k set to 4, and wherein each GMM is varied by an initialization seed; however, the claims lack numerous recitations that would actually limit the claims to this embodiment. Clearly, the claims are intended to be broader, and they are interpreted under the broadest reasonable interpretation.

	In addition, see Emmott § 4.3. The applicant’s argument is solely focused on Emmott varying “k”; however, per § 4.3, Emmott varies “k” and also varies “the EM initializations” in order to “generate a diverse set of models”. In other words, Emmott is not only varying the number of components, but is also varying the “EM initializations” to further diversify the set of GMMs, i.e. for each value of k there are a number of GMMs with varying “EM initializations”.
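	For illustration only, the kind of selection discussed above can be sketched in Python. This is a minimal sketch under stated assumptions and is not Emmott’s implementation: the names `select_ensemble`, `scores_by_k` and `keep_fraction` are hypothetical, the scores are made up and assumed to be positive with higher being better, and the 85% figure is taken from the passage of Emmott quoted in the Remarks.

```python
from statistics import mean

def select_ensemble(scores_by_k, keep_fraction=0.85):
    """Return the (k, init_index) pairs retained in the ensemble.

    scores_by_k maps a component count k to a list of scores, one per
    EM initialization (hypothetical layout; positive, higher is better).
    """
    # Average the scores over the EM initializations for each value of k.
    avg_by_k = {k: mean(scores) for k, scores in scores_by_k.items()}
    best = max(avg_by_k.values())
    # Discard every k whose average falls below keep_fraction of the best.
    kept_ks = [k for k, avg in avg_by_k.items() if avg >= keep_fraction * best]
    # Each retained (k, initialization) pair is one member of the ensemble.
    return [(k, i) for k in sorted(kept_ks) for i in range(len(scores_by_k[k]))]

# Made-up scores for three values of k, two initializations each.
example_scores = {3: [0.60, 0.62], 4: [0.90, 0.94], 5: [0.85, 0.87]}
ensemble = select_ensemble(example_scores)
```

	With these made-up scores, k = 3 is discarded and both initializations for k = 4 and k = 5 are retained, so the retained set contains several members whose individual scores are below the best one.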

	If the claims were to actually narrowly reflect the exemplary embodiment, then one of ordinary skill would still have to consider whether the claimed invention would be obvious over the prior art of record, e.g. by setting k to 4 for the ensemble GMM of Emmott and instead creating a “diverse set” by varying the “EM initializations” as disclosed by Emmott, and then applying the scoring of 85% of the best observed value to the EM initializations. One of ordinary skill would reasonably find such a variation of Emmott’s technique in creating an ensemble GMM model obvious, e.g. if the input data that the ensemble GMM was fitting was a multi-modal distribution with a set number of modes, e.g. 4. In other words, if one of ordinary skill was fitting an ensemble GMM to the spectra shown in figure 1 of Cohen (also see figure 5), one of ordinary skill would [...] one at approximately 2800, and so on, wherein the major variation between these distributions in Fernandez is the spread and max value, i.e. this data would have made it obvious to a person of ordinary skill to consider trying to fix the number of components and to improve the ensemble GMM by varying the “initializations” for the given set of data. This is provided for compact prosecution, should the applicant wish to amend the claims to further narrow the scope of the claims.

	In addition, the applicant’s arguments are not even consistent with what the specification conveys when read in full. For example, see ¶ 34 which recites “Figure 5 is a diagram illustrating an example pixel trace signal...and a set of four locally optimal configurations 500 for modeling the signal. Rather than listing the 12 model parameters for each configuration, the Figure shows the four Gaussians whose sum is the locally optimal modeling signal g.” and see figure 5 above, and see table 1.
	

    [Image media_image1.png: figure 5 of the instant specification]

	One of ordinary skill would reasonably infer from the specification that “each configuration” is, at least in some embodiments, a GMM, wherein the “set of four locally optimal configurations for modeling the signal” is the exemplary embodiment of “a set of configurations for defining a modelling signal...”, in other words that the “modeling signal” as claimed would reasonably encompass, in an exemplary manner, an ensemble of GMMs [“a set of configurations”] wherein each member of the set/ensemble is used to model the same portion of the trace signal, i.e. that the “modeling signal” as claimed encompasses, in an exemplary fashion, an ensemble of GMMs being fit to the same portion of the trace signal as clearly depicted in figure 5. 
	As clearly depicted in figure 5, a “portion of the trace signal” from figure 4 is modelled wherein that portion of the trace signal, as depicted, has a plurality of “configurations”, i.e. GMMs, which are fit to that one portion of the trace signal – each of the individual boxes in figure 5. 

	To the applicant’s argument that “In contrast, the set of data generated by Emmott represents sets of parameters for different models, not sets of the plurality of model parameters for instances of the modeling signal” – this is not persuasive for at least the reasons described above. The ensemble is reasonably considered a model, i.e. an ensemble model, wherein the ensemble comprises GMMs and wherein each GMM has its own associated set of parameters, and therefore, obviously, the ensemble has a plurality of parameters comprised of each of the sets of parameters for each GMM.

	The applicant’s arguments are not persuasive – they implicitly rely upon unclaimed features, and then they further rely upon excluding an exemplary embodiment from the specification by mere argument without any supporting recitation in the claim.

The applicant further submits (Remarks, page 13): 
Bigdeli fails to teach defining a classification cluster in a parameter domain defined by the plurality of model parameters and determining a distance between each member of the set of configurations the classification cluster. In contrast, Bigdeli defines a cluster of data points and generates a single GMM to model the cluster. Bigdeli does not define a cluster in the parameter domain. The cluster of data is the input used to generate the GMM model, so it is in the data domain. Bigdeli compares a single GMM generated for an incoming cluster of data to a single GMM generated for an established cluster of data by determining the distance between the GMM models (see Bigdeli, Section V). Thus, Bigdeli does not determine a distance between each member of a set of configurations and a classification cluster defined in a parameter domain as recited in claim 1. 
	The applicant’s argument is not persuasive. The prior art of record, as taken in combination, teaches the claimed invention.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
The applicant’s argument is a piecemeal attack on Bigdeli without any consideration of the combination of references relied upon, nor of the underpinning rationale that was used for the combination.

	Bigdeli, as relied upon – see pages 12-13 of the non-final rejection, teaches a technique to “label the test clusters by finding the distances between the test GMMs and training GMMs” wherein “each training and test cluster are represented by a GMM” (see emphasized text on page 13).
	So clearly, Bigdeli, is NOT relied upon for “In contrast, Bigdeli defines a cluster of data points and generates a single GMM to model the cluster” as argued by the applicant. For example, NOTHING in Bigdeli as relied upon conveys “a single GMM to model the cluster” – the portions of Bigdeli relied upon clearly recite numerous times that there are a plurality of GMMs and a plurality of clusters. 
	Furthermore, the applicant argues “Bigdeli compares a single GMM generated for an incoming cluster of data to a single GMM generated for an established cluster of data by determining the distance between the GMM models (see Bigdeli, Section V).” – This is not what Bigdeli is relied upon for, nor is it what Bigdeli teaches. Nothing in Bigdeli suggests that this is limited to a “single GMM generated for an incoming cluster of data to a single GMM generated for an established cluster of data” – Bigdeli clearly and repeatedly teaches that there are multiple GMMs for both the test/training clusters, i.e. the teachings of Bigdeli are to “label the test clusters by finding the distances between the test GMMs and training GMMs”.
	In addition, see the combination relied upon – Bigdeli is relied upon for a technique to classify input GMMs [i.e. “Bigdeli’s input is the GMM ensemble of Cohen/Emmott”] (non-final rejection, page 21) by measuring the distance between the GMMs in the ensemble to the GMMs which represent the “test clusters”.
	The applicant’s assertion is rooted in § 5 of Bigdeli, taken entirely on its own and without regard to the relied upon portions of Bigdeli. To clarify, refer to § 5 itself. This section teaches the specific technique used to measure the “distance” between “two GMMs”, such as “the pairwise distance of each normal component from both GMM”. And while this teaches a distance metric between two GMMs, Bigdeli, in numerous other portions that were relied upon, clearly teaches that the “distance” is being measured between a plurality of “training and testing GMMs” (non-final rejection, page 21-22). In other words, this § 5 is merely disclosing the specific distance measurement between each of the training GMMs and each of the test GMMs. E.g., see page 22 of the non-final rejection: “If the similarity between a test GMM and one of the training GMMs is more than a threshold,” i.e. each “test GMM” is compared to each training GMM by a “distance” combined with a “threshold”.
	And, as taken in combination, Bigdeli is being relied upon to classify/label each of the GMMs within the “GMM ensemble of Cohen/Emmott”. One of ordinary skill would have found it obvious, given a plurality of input GMMs, to find the distance between each of these GMMs and each of the GMMs of the test clusters. For performing this for each GMM in the ensemble, it would have been obvious to merely repeat the technique for each GMM in the ensemble, e.g. see figure 2 below and the other portions of Bigdeli, Bigdeli clearly renders 
	
	For a visual depiction of this see Bigdeli, figure 2, as relied upon (page 13 of the non-final rejection). 

    [Image media_image2.png: Bigdeli, figure 2]

	
	Furthermore, the applicant argues “Bigdeli does not define a cluster in the parameter domain. The cluster of data is the input used to generate the GMM model, so it is in the data domain.” The claim recites, in part: 
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
determining a distance between each member of the set of configurations and the classification cluster; and

	Bigdeli teaches the above limitation, when taken in combination. Bigdeli teaches determining the “distance” between each of the input GMMs from the ensemble and a set of GMMs which represent training clusters. The set of GMMs which represent training clusters would obviously be encompassed by the “classification cluster”, i.e. each member of the set of the training GMMs is an example of a “classification cluster”. 
	For the GMMs being “in a parameter domain defined by the plurality of model parameters” – the GMMs are being compared based on a distance in a parameter domain, this would obviously include that the parameter domain is defined by the parameters of the ensemble GMM as the parameters of the ensemble GMM are being compared to the parameters of the training GMMs. In other words, the technique of Bigdeli relies upon determining a distance between the test and training GMMs, NOT the data underlying these. To directly compare the GMMs, and not the underlying data, one of ordinary skill would have found it obvious that this comparison is in the parameter domain as the GMMs are defined by their parameters. 
	To further clarify, see § 5 as relied upon which teaches in part “In this approach, instead of computing distance of each two normal components in two GMMs, only distance of the closest components are considered, and they are combined using their relative weights....our proposed method considers the pairwise distance of each normal component from both GMM” wherein this is based on the “Kullback-Leibler distance” (equation 2). Specifically, see equation 2 – at least the portion of equation 2 below shows that the KL distance includes the distance between the mean values of “normal distributions”  - specifically, the technique of Bigdeli is to determine the “distance of the closest components” of each of the GMMs, i.e. this finds the distance between the parameters of the test/training domains, as this is finding the distance between the “components”. One of ordinary skill would have readily found it obvious that this distance between components is NOT in the “data domain” as alleged by the applicant, but in the parameter domain as the parameters for the GMM define the components of the GMM, e.g. the mean of each component is an example of a parameter. 

    [Image media_image3.png: portion of Bigdeli, equation 2]

	To further clarify, § 5 then teaches “In the proposed distance, we first find the distance of first GMM components to the second GMM components as follows.” and then see equation 4 – equation 4 clearly recites “KL”, wherein this is defined by equation 2, the Kullback-Leibler distance.
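	As a hedged illustration of why such a component-to-component distance is computed from parameters rather than from data samples, the standard closed-form Kullback-Leibler divergence between two univariate normal distributions can be sketched in Python as below. This is an assumption-laden sketch: whether equation 2 of Bigdeli takes exactly this form (e.g. univariate versus multivariate, symmetrized or not) is not confirmed here, and the function names `kl_normal` and `symmetric_kl` are hypothetical.

```python
import math

def kl_normal(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two univariate normal distributions.

    Only the parameters (means and variances) are needed; no data
    samples enter the computation.
    """
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized KL: one common way to obtain a distance-like score."""
    return (kl_normal(mean_p, var_p, mean_q, var_q)
            + kl_normal(mean_q, var_q, mean_p, var_p))

# Identical components versus components whose means differ by 2.
d_same = kl_normal(0.0, 1.0, 0.0, 1.0)
d_shift = kl_normal(0.0, 1.0, 2.0, 1.0)
```

	The inputs are the means and variances of the two components only, which is the sense in which a distance of this kind operates on model parameters.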

	Bigdeli, as taken in combination, teaches: 
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
determining a distance between each member of the set of configurations and the classification cluster; and
	
	To summarize the above – Bigdeli, as relied upon and taken in combination, teaches the above claim limitations, wherein, in order to minimize the effect of noise, Bigdeli classifies GMMs in a parameter domain by determining the distance between input/test GMMs [e.g., the GMMs from Emmott’s ensemble] and training GMMs [including the classification cluster], wherein the distance is determined between the “components” of the GMMs, specifically the distance between the parameters of the components, e.g. the mean values of the components.
	As this “distance” is between the parameters of the GMMs this is obviously in a parameter domain that is defined by the plurality of model parameters – as this “distance” is between the parameters of the GMMs and NOT the data underlying the GMMs – by doing this the technique of Bigdeli minimizes the effect of noise in the dataset which improves the “accuracy” of the “anomaly detection algorithm”. 
	And while Bigdeli does teach using a plurality of training GMMs [examples of classification clusters], 1) the claims are not limited in scope to just a single classification cluster, but rather require that there is at least one classification cluster, and 2) see the instant specification ¶ 38, “at least one classification cluster is defined”, i.e. claim interpretation 1) recited above is consistent with the specification.

The applicant further submits (Remarks, page 14): 
The Examiner alleges that Zhang teaches the subject matter of claim 13. To the contrary, as described above, none of the references teaches generating clusters in the parameter domain. Zhang fails to remedy this defect. Further, as described by the Examiner, Zhang teaches screening non-cell pixels from cell pixels. Zhang only analyzes the cell data to identify cancerous cells. Zhang excludes the non-cell pixels, so no configurations would be generated for these cells. There are no configurations generated during the screening phase of Zhang. Zhang teaches only generating modeling signals for the pixels that passed the screening criteria (i.e., only cell pixels). In contrast, claim 13 recites generating a screening set of configurations and determining if at least one of the screening configurations is within a screening cluster. Zhang does not define a screening cluster in the parameter domain or use such a cluster to determine if a diagnostic set of configurations using more random seeds should be generated. Accordingly, claim 13 is allowable for at least these additional reasons.	
The applicant’s argument is not persuasive. The prior art of record, as taken in combination, teaches the claimed invention.
	As an initial matter, claim 13 is not parallel to claim 1, and is narrower in scope. However, as the applicant uses claim 1 as representative for purpose of the previous arguments, the Examiner’s response to the previous arguments use claim 1 as representative, i.e. see the above response to arguments for how the prior art relied upon teaches the claimed invention of claim 11, as well as the actual § 103 rejection for claim 11.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). See the rejection on pages 41-44, the prior art as taken in combination teaches claim 13.
In regards to the assertion that “none of the references teaches generating clusters in the parameter domain”, see above; this is not persuasive: 1) the prior art of record teaches this, and 2) the claim does not recite “generating clusters in the parameter domain”, and limitations are not read in from the specification.

	The applicant then further argues “Further, as described by the Examiner, Zhang teaches screening non-cell pixels from cell pixels. Zhang only analyzes the cell data to identify cancerous cells. Zhang excludes the non-cell pixels, so no configurations would be generated for these cells. There are no configurations generated during the screening phase of Zhang.” The applicant’s argument is a piecemeal analysis of Zhang without addressing the actual combination that was made of record on at least page 43 of the non-final rejection, i.e. Zhang is not relied upon to teach the entirety of claim 13, but rather the combination of the prior art is relied upon for teaching claim 13.

	Zhang, as relied upon and taken in combination teaches the claimed invention.
	Zhang teaches a “two-step process” in which in step 1 the pixels are screened into “cell and non-cell categories” and then in step 2 “cell pixels are subsequently classified into carcinoma and normal categories”. 	
	As used in the combination of the prior art, “it would have been obvious to use a smaller ensemble of GMMs [e.g., fewer “EM initializations”] to first discriminate cell and non-cell pixels, given that the spectra are substantially different, i.e. it would have been obvious that it would require a much smaller ensemble of GMMs to screen out non-cell tissue as the differences between cell and non-cell spectra are much more substantial, e.g. see fig. 1”.
To clarify, Zhang provides a teaching of first classifying pixels into cell and non-cell categories [an example of screening them] using a fast classifier, e.g. Zhang uses a K-means (see Zhang abstract), and then performing a second classification of the screened pixels which are determined to be cells to classify them to either being normal or cancerous, wherein Zhang uses a more accurate classifier of an ANN. 
	To further clarify, Zhang was relied upon, as taken in combination, specifically to teach (non-final rejection, page 42, including the emphasis below): 
generating a screening set of configurations ...
defining a screening cluster in the parameter domain;
generating the diagnostic set... responsive to determining that at least one of the configurations in the screening set is within the screening cluster

Zhang teaches the act of “screening” each pixel in a screen using a first classifier, and then classifying the screened pixels, i.e. the “cell pixels”, as cancerous or normal using a second classifier. 
Zhang, as taken in combination, would have made the present claim 13 obvious to a person of ordinary skill.
The present claim, when read in combination, conveys in part 1) “screening...the selected pixel using a first number of seeds” and then 2) “generating the diagnostic set of configurations using a second number of random seeds greater than the first number responsive to determining....”

Claim 13, when read in light of the specification conveys to a person of ordinary skill that the claimed invention is, in the exemplary form, merely using the claimed algorithm with a smaller number of configurations in order to screen some of the pixels out. In other words, the recited claim limitations of claim 13 convey the embodiment that a small “set of configurations” are generated and then used with a “screening cluster” to screen out some of the pixels, and then a second set of “configurations” that has a larger number of “configurations” is applied to more accurately classify the pixel. To clarify – claim 13 is merely reciting, when read in light of the specification, a scope in which an embodiment is to use a smaller set of configurations with a smaller number of “random seeds” to screen “the selected pixel”, and then repeating the same algorithm with a larger number of seeds for the selected pixel for pixels that are screened into a specific class/”within the screening cluster”. 
This is obvious in view of Zhang, as taken in combination. The other references relied upon teach the claimed technique of classification; however, they do not teach the steps of first performing the classification task with a smaller set of configurations in order to screen the pixels and then performing a high accuracy classification of the screened pixels that are “within the screening cluster”.
Zhang, as relied upon, renders it obvious to a person of ordinary skill to perform a 2-step classification of pixels, e.g. the cell/non-cell classes, followed by the cancerous/non-cancerous. One of ordinary skill would have found it obvious from the prior art relied upon to combine Zhang with the other art relied upon to perform this 2-step classification of the pixels as this would have been “substantially faster” (page 44 of the non-final rejection).
In light of Zhang, a person of ordinary skill would have found it obvious to use the classification technique including the ensemble GMM for both steps of Zhang as it would have been obvious that the system would require a much smaller ensemble, i.e. a much faster ensemble, to determine cell from non-cell pixels as “the spectra are substantially different” (non-final rejection, page 43, and as shown in figure 1 of Zhang), and then to use a large ensemble for the pixels that are classified as cells. Also, as the classification is for 4 classes, i.e. cell/non-cell, and then cancerous/normal, it would have been obvious that the training GMMs for each pair of classes would have been different, e.g. the training GMMs for the non-cell classification would obviously have been different than the training GMMs for cancerous/normal pixels based on at least the spectrum shown in figure 1 (the training set, as one of ordinary skill in the art would have known, would have included numerous “labelled” training data sets such as the spectra in figure 1 of Zhang). 
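
For illustration of the two-step flow discussed above, a minimal Python sketch follows. It is a sketch under stated assumptions and is not taken from Zhang or from the application: the names toy_fit, fit_configurations, within_cluster and classify_pixel are hypothetical, the "fit" is a toy stand-in rather than a GMM fit, and the clusters are assumed to be spheres given as (center, radius) in a two-parameter domain.

```python
import random

def toy_fit(spectrum, rng):
    """Hypothetical stand-in for one model fit: (mean, spread) plus seed-dependent jitter."""
    mean = sum(spectrum) / len(spectrum)
    var = sum((x - mean) ** 2 for x in spectrum) / len(spectrum)
    return (mean + rng.uniform(-0.05, 0.05), var ** 0.5 + rng.uniform(-0.05, 0.05))

def fit_configurations(spectrum, n_seeds):
    """Generate one configuration (a point in the parameter domain) per random seed."""
    return [toy_fit(spectrum, random.Random(seed)) for seed in range(n_seeds)]

def within_cluster(config, center, radius):
    """Assumed cluster shape: a sphere; membership is Euclidean distance <= radius."""
    dist = sum((c - m) ** 2 for c, m in zip(config, center)) ** 0.5
    return dist <= radius

def classify_pixel(spectrum, screening_cluster, diagnostic_cluster, n_screen=3, n_diag=30):
    """Screen with a few seeds; only pixels that pass get the larger diagnostic set."""
    screening_set = fit_configurations(spectrum, n_screen)
    if not any(within_cluster(c, *screening_cluster) for c in screening_set):
        return "screened out", n_screen
    diagnostic_set = fit_configurations(spectrum, n_diag)
    hit = any(within_cluster(c, *diagnostic_cluster) for c in diagnostic_set)
    return ("in class" if hit else "not in class"), n_screen + n_diag

screening = ((5.0, 0.8), 1.0)    # (center, radius), illustrative values only
diagnostic = ((5.0, 0.8), 0.5)
result_pass = classify_pixel([4.0, 5.0, 6.0], screening, diagnostic)
result_skip = classify_pixel([0.0, 0.1, -0.1], screening, diagnostic)
```

In this sketch a pixel that fails the screening step costs only the smaller number of fits, which is the speed benefit the combination relies upon.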

The applicant further submits “Zhang teaches only generating modeling signals for the pixels that passed the screening criteria (i.e., only cell pixels)” – this is incorrect, see Zhang abstract – Zhang models the pixels to classify them as either cell/not cell, and then performs a second pass modeling/classifying for the cell-based pixels. See Zhang, as relied upon. In other words, and as detailed above, Zhang classifies at each step of the two steps, i.e. Zhang teaches modelling/classifying for screening into cell/non-cell, and then for the final classification; see above, and see the relied-upon combination of the prior art. 

The applicant further submits “Zhang does not define a screening cluster in the parameter domain or use such a cluster to determine if a diagnostic set of configurations using more random seeds should be generated”. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). See the rejection on pages 41-44, the prior art as taken in combination teaches claim 13, also see the additional clarifications above from the response to arguments.

Claim Objections
Claims 20-22 are objected to because of the following informalities:  
Using claim 20 as representative, claim 20 recites:
The method of claim 1, wherein the parameter space has a number of dimensions equal to a number of model parameters in each configuration.

Claim 20 should recite “the parameter domain has a number of model parameters for each of the configurations”

In addition (the following is merely a claim interpretation), Claim 1 already recites that there is a “set of configurations” and further recites that “each configuration represents a set of the plurality...”, i.e. that “each configuration” is associated with “a set”. Claim 1 is not limited to the “model parameters” being “in” the configuration – the Examiner infers that claims 20-22 were not intended to limit that the model parameters are “in” each configuration, and as such suggests the above “for each” phrasing instead. The objection is merely to the lack of antecedent basis; the suggested amendment is to also overcome this claim interpretation. 
Claims 21-22 are objected to under a similar rationale as claim 20. 

In addition, claim 21 further limits claim 11 which recites “a diagnostic set of configurations” – claim 21 should read “a number of model parameters for each of the diagnostic set of configurations” to clearly reflect the intended claim scope for antecedent basis. 
Appropriate correction is required.



Claim Rejections - 35 USC § 103
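In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.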
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 11, 14-15, 17, 21 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 


Regarding Claim 11.
Cohen teaches:
	A method for detecting … in a … sample (Cohen, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”), comprising:
acquiring a set of Fourier Transform Infrared (FTIR) spectroscopy data for the … sample, the FTIR data including an energy absorption spectrum signal for each of a plurality of pixels (Cohen, as cited above, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1);
	generating a diagnostic [a] configurations for defining a modeling signal to model a portion of the energy absorption spectrum signal for a selected pixel (Cohen, as cited above, teaches applying GMM to FTIR data for a sample – see page 250, ¶ 2-3 which teaches that the measured data originally includes a “spectra” with “1577 samples” and then a “range of wavenumbers…was removed” which reduces this to “1528” samples, i.e. a portion of the signal is used for the GMM – the GMM is the modeling signal to model a portion of the signal – for more clarification, see section 3.2 on page 251, ¶ 1 which teaches that “the set of spectra of each image was submitted to both regular GMM, using the EM algorithm, as well as the spatially aware model proposed…cGMM [variant of a GMM]”, i.e. a configuration/GMM is generated for the trace signal, Cohen as cited above teaches this is for each pixel, e.g. see section 1 on page 247 ¶ 1), wherein the modeling signal is defined by a plurality of model parameters (Cohen, fig. 5 the caption teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM), each configuration represents a set of the plurality of model parameters for an instance of the modeling signal for the portion of the energy absorption spectrum signal and has a score for fitting the portion of the energy absorption spectrum signal, at least a first one of the configurations in the set has a maximum score over the set of configurations (Cohen, fig. 5 the caption teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM – for more clarification page 246, col. 2, ¶ 2 which teaches “Our proposed contribution is based on conditional density estimation by the penalized maximum likelihood technique [maximum score] that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method in which the observed spectra are modelized as a realization of a Gaussian Mixture Model (GMM)”),
	determining that the selected pixel is associated with [specific] tissue (Cohen, page 253, col. 1, ¶ 3-4 teaches that the results of the GMM/cGMM are used for “classification of the pixels” [e.g., the “pixel labels”, as cited above] – for example, “in figure 4, where each pixel is assigned the color corresponding to the most likely component of the Gaussian mixture model”, in other words the pixels in the image are classified such as with a label/color [class type] – this results in an image that shows the classification results for the sample);
	and repeating the generating of the diagnostic set of configurations and the determining of the proximity to the classification cluster for each of the pixels (Cohen, section 1, ¶ 1 teaches “each pixel” is classified and labelled, i.e. the process is repeated for each pixel)

Cohen does not explicitly teach:
detecting malignancy in a tissue sample
generating a diagnostic set of configurations and a subset of the set of configurations includes multiple configurations with scores less than the maximum score;
defining a classification cluster in a parameter domain defined by the plurality of model parameters;
	determining a distance between each member of the set of configurations and the classification cluster;
based on a determination that at least one of the configurations in the diagnostic set has a distance within a distance threshold from the classification cluster

Emmott teaches:
detecting malignancy in a tissue sample (Emmott, section 2, ¶1 teaches detecting the “emergence of cancer cells in normal tissue”)
generating a diagnostic set of configurations and a subset of the set of configurations includes multiple configurations with scores less than the maximum score; (Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue” – then see page 4, section 4.5 which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs” are created by varying the “number of clusters”, the “EM initializations”, and the like to create an ensemble of GMMs [set of configurations] in which any GMM with a “likelihood” is “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm to analyze FTIR data with the teachings from Emmott on using an ensemble GMM algorithm for anomaly detection such as for detecting cancer cells. The motivation to combine would have been that the technique in Emmott provides a more “robust” solution than “a single Gaussian”, especially in cases where “data points of low density are declared to be anomalies” (Emmott, section 4.5), e.g. cancer detection. 
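
To illustrate the ensemble pruning attributed to Emmott above (members whose likelihood is “less than 85% of the best observed value are discarded”), a minimal Python sketch follows. It is a sketch under stated assumptions: prune_ensemble is a hypothetical name, each ensemble member is represented only as a (parameters, score) pair, and the scores are assumed to be positive likelihood values rather than log-likelihoods.

```python
def prune_ensemble(configs, keep_fraction=0.85):
    """Keep members whose score is at least keep_fraction of the best score.

    configs: list of (parameters, score) pairs; scores are assumed positive.
    """
    best = max(score for _, score in configs)
    return [(params, score) for params, score in configs
            if score >= keep_fraction * best]

# Illustrative values only: four fits from different initializations.
ensemble = [("fit_a", 1.00), ("fit_b", 0.95), ("fit_c", 0.90), ("fit_d", 0.50)]
retained = prune_ensemble(ensemble)
below_max = [name for name, score in retained if score < 1.00]
```

With these illustrative values the retained set holds the best-scoring member together with two members whose scores are below the maximum, i.e. the retained set is not limited to the single best fit.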

Cohen, as modified by Emmott, does not explicitly teach:
defining a classification cluster in a parameter domain defined by the plurality of model parameters;
	determining a distance between each member of the set of configurations and the classification cluster;
based on a determination that at least one of the configurations in the diagnostic set has a distance within a distance threshold from the classification cluster

Bigdeli teaches:
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type (Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster” in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is clustered as well into separate GMMs – for the modification, Bigdeli’s input is the GMM ensemble of Cohen/Emmott] – the training clusters are “labelled”, i.e. each training cluster has an associated class type, the purpose of Bigdeli is to model the “group behavior” of new data, to mitigate the “negative impact of noise” on the input data
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity between a test GMM and one of the training GMMs is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster.” – see section IV for details on the GMM, and section 5 for the distance metric);
	determining a distance between each member of the set of configurations and the classification cluster (Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s));
based on a determination that at least one of the configurations in the diagnostic set has a distance within a distance threshold from the classification cluster (Bigdeli, as cited above, teaches determining the “label” for each input GMM [example of determining the class type of the sample, i.e. an example of a class type is a label], see page 340, col. 1, ¶ 1 teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM” wherein this is based on “distance” – then see section 5 which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [example of distance between each configuration and the training cluster(s)/classification cluster(s)], then see page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Emmott on a system which generates an ensemble of GMMs from FTIR data for anomaly detection such as for cancer detection with the teachings from Bigdeli on a technique for “anomaly detection” which detects anomalies based on “group behavior”. The motivation to combine would have been that the technique of Bigdeli minimizes the impact of “noise” in the dataset for the classification result (Bigdeli, abstract and section I), i.e. “By clustering test samples before detection process, we remove the effect of noise on test samples, and consequently improve the accuracy of anomaly detection algorithm.”
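
To illustrate the distance-and-threshold labelling discussed above, a minimal Python sketch follows. It is a sketch under stated assumptions and does not reproduce Bigdeli's equations: the measure used here is a plain weighted sum of symmetric Kullback-Leibler divergences over component pairs of univariate Gaussians, substituted for Bigdeli's modified measure, and the names kl_gauss, gmm_distance and label_ensemble are hypothetical.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def gmm_distance(gmm_a, gmm_b):
    """Weighted sum of symmetric KL over all component pairs.

    Each GMM is a list of (weight, mean, variance) tuples. Note this crude
    measure is not zero for two identical multi-component mixtures.
    """
    total = 0.0
    for wa, ma, va in gmm_a:
        for wb, mb, vb in gmm_b:
            total += wa * wb * (kl_gauss(ma, va, mb, vb) + kl_gauss(mb, vb, ma, va))
    return total

def label_ensemble(ensemble, training_gmm, label, threshold):
    """Return label if at least one ensemble member is within the threshold."""
    distances = [gmm_distance(g, training_gmm) for g in ensemble]
    return label if any(d <= threshold for d in distances) else None

training = [(1.0, 0.0, 1.0)]                         # labelled training GMM
ensemble = [[(1.0, 5.0, 1.0)], [(1.0, 0.1, 1.0)]]    # two test configurations
near_label = label_ensemble(ensemble, training, "abnormal", threshold=0.5)
far_label = label_ensemble(ensemble[:1], training, "abnormal", threshold=0.5)
```

Here one close member is enough to assign the label, which mirrors the "at least one of the configurations" language of the claim; the threshold value is illustrative only.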


Regarding Claim 14.
Emmott teaches: 
	The method of claim 11, wherein each set of the plurality of model parameters defines a Gaussian mixture. (Emmott, as cited above, teaches that each set of model parameters define a GMM, also see Cohen/Bigdeli, these also teach using GMMs)

Regarding Claim 15.
Bigdeli teaches: 
	The method of claim 11, wherein the diagnostic cluster comprises an ellipsoid defined in the parameter space. (Bigdeli, fig. 3 shows that the “clusters” defined by the GMMs include ellipsoids, this includes both the classification cluster and the set of configurations – these are both in the parameter domain, as cited above)


Regarding Claim 17.
Bigdeli teaches: 
	The method of claim 11, further comprising:
	defining a plurality of diagnostic clusters in the parameter domain; (Bigdeli, as cited above, teaches having a plurality of “training clusters” [classification clusters], e.g., see fig. 2 – see Bigdeli, section VI, ¶3 which teaches that “some classes as normal and the rest as abnormal”, i.e. the class types of normal/abnormal are both used for a plurality of the classification clusters)
	and determining that the selected pixel is associated with malignant tissue responsive to determining that at least one of the configurations in the diagnostic set has a distance within the distance threshold from any of the plurality of diagnostic clusters. (Bigdeli, as cited above, page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [one or more training clusters]. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and each classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label)

Regarding Claim 21.
Bigdeli, as taken in combination above, teaches:
The method of claim 11, wherein the parameter space has a number of dimensions equal to a number of model parameters in each configuration (Bigdeli, as taken in combination above, teaches classifying input GMMs [configurations] based on a distance measure to GMMs from training GMMs, see abstract, section B and the like as cited above, i.e. from section B “we can label the test clusters [classify] by finding the distances between the test GMMs [each configuration/GMM member of the ensemble of Emmott] and training GMMs” – see § 5 which clarifies that this includes “find the distance of first GMM components to the second GMM components” and see equations 2-6, this is based on finding the distance between the components of each GMM such as by the mean [example of a model parameter] – in other words, the parameter domain/space that is being used for classification based on distance is the parameters of the GMMs, e.g. mean, in other words the distance metric being used is in a parameter space/domain as it is finding the distance between GMMs based upon the parameters of the GMMs [parameters of each configuration] – it would have been obvious to a person of ordinary skill that such a technique would have been in a parameter space/domain which has a number of dimensions equal to the number of parameters for each GMM, i.e. as such the parameter space/domain has a number of dimensions equal to the number of parameters for each configuration, i.e. for classifying a 3-component GMM the “distance” is found to another GMM based on the distance metric in § 5 wherein this “distance metric” is computing the distance based on the mean, the weight, and the co-variance of each component in the test GMM, i.e. “our proposed method considers the pairwise distance of each normal component from both GMM.” and see equations 4-5 – the summations are the “each normal component” being iterated through, to clarify – the claim merely recites that the “parameter space has a number of dimensions equal to a number of model parameters in each configuration” – this is merely limiting the scope of the parameter space/domain to including, i.e. has, a number of dimensions for the number of model parameters for each configuration – the prior art as relied upon teaches classifying using a distance between GMM parameters in a parameter space, i.e. the parameter space of the prior art obviously has a number of dimensions equal to the number of parameters for each configuration, as this is the space in which the distance is being determined between the parameters)
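
As a worked illustration of the dimension count discussed above, a minimal Python sketch follows. It is a sketch under stated assumptions: a univariate GMM is represented as a list of (weight, mean, variance) tuples, the names to_parameter_vector and parameter_distance are hypothetical, and a plain Euclidean distance is used for simplicity rather than the distance metric of Bigdeli § 5.

```python
def to_parameter_vector(gmm):
    """Flatten [(weight, mean, variance), ...] into one point in parameter space."""
    return [value for component in gmm for value in component]

def parameter_distance(gmm_a, gmm_b):
    """Euclidean distance between two configurations in the same parameter space."""
    va, vb = to_parameter_vector(gmm_a), to_parameter_vector(gmm_b)
    if len(va) != len(vb):
        raise ValueError("configurations must have the same number of parameters")
    return sum((a - b) ** 2 for a, b in zip(va, vb)) ** 0.5

# A 3-component univariate GMM has 3 weights + 3 means + 3 variances = 9 parameters.
gmm_a = [(0.5, 0.0, 1.0), (0.3, 2.0, 1.0), (0.2, 4.0, 1.0)]
gmm_b = [(0.5, 3.0, 1.0), (0.3, 6.0, 1.0), (0.2, 4.0, 1.0)]
n_dimensions = len(to_parameter_vector(gmm_a))
distance = parameter_distance(gmm_a, gmm_b)
```

In this representation each configuration is a single point whose coordinate count equals its parameter count, so any distance taken between two configurations is necessarily taken in a space of that dimension.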

Claims 1-2, 4-5, 7-9, 12, 19, 20, 22 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 

Regarding Claim 1
Cohen teaches: 
	A method for characterizing a sample (Cohen, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”), comprising:
	acquiring a trace signal for the sample (Cohen, as cited above, abstract, teaches using a GMM for a spectral image  - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1);
	generating a … configurations for defining a modeling signal to model a portion of the trace signal, (Cohen, as cited above, teaches applying GMM to FTIR data for a sample – see page 250, ¶ 2-3 which teaches that the measured data originally includes a “spectra” with “1577 samples” and then a “range of wavenumbers…was removed” which reduces this to “1528” samples, i.e. a portion of the signal is used for the GMM – the GMM is the modeling signal to model a portion of the signal), wherein the modeling signal is defined by a plurality of model parameters (Cohen, fig. 5 the caption teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM), each configuration represents a set of the plurality of model parameters for an instance of the modeling signal for the portion of the trace signal and has a score for fitting the modeling signal to the portion of the trace signal, at least a first one of the configurations in the set has a maximum score … (Cohen, fig. 5 the caption teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM – for more clarification page 246, col. 2, ¶ 2 which teaches “Our proposed contribution is based on conditional density estimation by the penalized maximum likelihood technique [maximum score] that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method in which the observed spectra are modelized as a realization of a Gaussian Mixture Model (GMM)”)…
	
Cohen does not explicitly teach:
generating a set of configurations…score over the set of configurations…and a subset of the set of configurations includes multiple configurations with scores less than the maximum score;
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
	determining a distance between each member of the set of configurations and the classification cluster;
and determining that the sample has the class type associated with the classification cluster based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster. 

Emmott teaches:
generating a set of configurations…score over the set of configurations…and a subset of the set of configurations includes multiple configurations with scores less than the maximum score (Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue” – then see page 4, section 4.5 which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs” are created by varying the “number of clusters”, the “EM initializations”, and the like to create an ensemble of GMMs [set of configurations] in which any GMM with a “likelihood” is “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is generated wherein only GMMs with a score higher than 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm to analyze FTIR data with the teachings from Emmott on using an ensemble GMM algorithm for anomaly detection such as for detecting cancer cells. The motivation to combine would have been that the technique in Emmott provides a more “robust” solution than “a single Gaussian”, especially in cases where “data points of low density are declared to be anomalies” (Emmott, section 4.5), e.g. cancer detection. 

Cohen, as modified by Emmott, does not explicitly teach:
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
	determining a distance between each member of the set of configurations and the classification cluster;
and determining that the sample has the class type associated with the classification cluster based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster. 

Bigdeli teaches:
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type (Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster” in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is clustered as well into separate GMMs – for the modification, Bigdeli’s input is the GMM ensemble of Cohen/Emmott] – the training clusters are “labelled”, i.e. each training cluster has an associated class type, the purpose of Bigdeli is to model the “group behavior” of new data, to mitigate the “negative impact of noise” on the input data
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity between a test GMM and one of the training GMMs is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster.” – see section IV for details on the GMM, and section 5 for the distance metric);
	determining a distance between each member of the set of configurations and the classification cluster (Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s));
… based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster (Bigdeli, as cited above, teaches determining the “label” for each input GMM [example of determining the class type of the sample, i.e. an example of a class type is a label], see page 340, col. 1, ¶ 1 teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM” wherein this is based on “distance” – then see section 5 which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [example of distance between each configuration and the training cluster(s)/classification cluster(s), then see page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Emmott on a system which generates an ensemble of GMMs from FTIR data for anomaly detection such as for cancer detection with the teachings from Bigdeli on a technique for “anomaly detection” which detects anomalies based on “group behavior”. The motivation to combine would have been that the technique of Bigdeli minimizes the impact of “noise” in the dataset for the classification result (Bigdeli, abstract and section I), i.e. “By clustering test samples before detection process, we remove the effect of noise on test samples, and consequently improve the accuracy of anomaly detection algorithm.”

Cohen, as modified by Emmott and Bigdeli, does not explicitly teach
and determining that the sample has the class type associated with the classification cluster 

Fernandez teaches: 
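and determining that the sample has the class type associated with the classification cluster (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)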

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells. 


Regarding Claim 2.
Emmott teaches: 
	The method of claim 1, wherein the sample comprises a tissue sample, and the class type comprises malignant tissue (Emmott, section 2, ¶ 1 teaches using anomaly detect to detect “the emergence of cancer cells in normal tissue” [the cancer cells are an example of malignant tissue])


Regarding Claim 4.
Emmott teaches: 
	The method of claim 1, wherein each set of the plurality of model parameters defines a Gaussian mixture (Emmott, as cited above, teaches that each set of model parameters defines a GMM; also see Cohen/Bigdeli, which also teach using GMMs)

Regarding Claim 5.
Bigdeli teaches: 
	The method of claim 1, wherein the classification cluster comprises an ellipsoid defined in the parameter space (Bigdeli, fig. 3 shows that the “clusters” defined by the GMMs include ellipsoids, this includes both the classification cluster and the set of configurations – these are both in the parameters domain, as cited above)

[Image: media_image4.png]



Regarding Claim 7.
Bigdeli teaches: 
	The method of claim 1, further comprising:
	defining a plurality of classification clusters in the parameter domain having the class type (Bigdeli, as cited above, teaches having a plurality of “training clusters” [classification clusters], e.g., see fig. 2 – see Bigdeli, section VI, ¶3 which teaches that “some classes as normal and the rest as abnormal”, i.e. the class types of normal/abnormal are both used for a plurality of the classification clusters)
and determining that the sample has the class type responsive to determining that at least one of the configurations in the set has a distance within the distance threshold from any of the plurality of classification clusters (Bigdeli, as cited above, page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs [one or more training clusters]. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and each classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label)


Regarding Claim 8.
Cohen teaches: 
	The method of claim 1, wherein the trace signal comprises a Fourier Transform Infrared energy absorption spectrum signal (Cohen, section 3, ¶ 1 teaches that the signal is “observed using Fourier Transform Infrared microscopy (FTIR)”; in regards to this including the absorption spectrum, see the example data in fig. 1 of Cohen which shows this – see ¶ 70 of the instant spec for claim interpretation)

Regarding Claim 9.
Cohen teaches: 
The method of claim 7, wherein the trace signal is associated with one of a plurality of pixels generated for the sample, and the method comprises:
	repeating the generating of the set of configurations and the determining that the sample has the class type for each of the plurality of pixels (Cohen, page 246, col. 1, ¶ 1 teaches that there is a “spectrum for each pixel” [a trace signal for each pixel] wherein page 246, col. 2, ¶ 2 teaches that the technique is to determine the class/label of each “pixel”, i.e. the process is repeated for each pixel – to clarify see section 1, ¶ 1 which teaches “Our goal is to assign each pixel a class …to which the spectrum is supposed to belong.”, i.e. each pixel is classified);

Fernandez teaches: 
	and determining a count of pixels having the class type associated with the classification cluster (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)
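
For illustration only, the pixel-count and pixel-fraction determination described above can be sketched as follows. This is a minimal sketch, not Fernandez's procedure: the function names `class_fraction` and `classify_sample`, the example labels, and the minimum-fraction cutoff are hypothetical and are not taken from the reference.

```python
def class_fraction(pixel_labels, class_type):
    """Count the pixels carrying the class type and return that count
    as a fraction of all pixels in the sample."""
    if not pixel_labels:
        return 0.0
    count = sum(1 for label in pixel_labels if label == class_type)
    return count / len(pixel_labels)

def classify_sample(pixel_labels, class_type, min_fraction):
    """Assign the class type to the whole sample when the fraction of
    pixels with that class type reaches a (hypothetical) cutoff."""
    return class_fraction(pixel_labels, class_type) >= min_fraction

# Hypothetical per-pixel labels for one sample.
labels = ["normal", "malignant", "malignant", "normal", "normal"]

print(class_fraction(labels, "malignant"))
print(classify_sample(labels, "malignant", min_fraction=0.3))
```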

Regarding Claim 12.
Emmott, as taken in combination, teaches: 
	The method of claim 11, further comprising classifying the tissue sample as being malignant based on … pixels associated with malignant tissue.  (Emmott, section 2, ¶1 teaches detecting the “emergence of cancer cells in normal tissue”, i.e. classifying a tissue sample as being malignant/cancerous, in combination with Cohen and Bigdeli this is applied to FTIR image data on a pixel-by-pixel basis, see above rejection for claim 11)

Cohen, as modified by Bigdeli and Emmott, does not explicitly teach: a count of the pixels

Fernandez teaches:  a count of the pixels (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells. 


Regarding Claim 19.
Cohen teaches:
A system (Cohen, abstract, teaches using a GMM for a spectral image - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample” – this is on a system/computer, i.e. the method is an “algorithm” that is then tested with “numerical experiments” [on a computer with memory/processor]), comprising:
	a memory to store a plurality of instructions;
	and a processor to execute the instructions to 
acquire a trace signal for a sample (Cohen, as cited above, abstract, teaches using a GMM for a spectral image - see section 3, ¶ 1 on page 249 – the algorithm is applied to “experimental data” from a “sample” where the sample is observed/measured using “FTIR” for “characterization of the sample”, i.e. the FTIR data that is acquired is a trace signal for a sample, e.g. see figure 1),
generating a … configurations for defining a modeling signal to model a portion of the trace signal (Cohen, as cited above, teaches applying GMM to FTIR data for a sample – see page 250, ¶ 2-3 which teaches that the measured data originally includes a “spectra” with “1577 samples” and then a “range of wavenumbers…was removed” which reduces this to “1528” samples, i.e. a portion of the signal is used for the GMM – the GMM is the modeling signal to model a portion of the signal – for more clarification, see section 3.2 on page 251, ¶ 1 which teaches that “the set of spectra of each image was submitted to both regular GMM, using the EM algorithm, as well as the spatially aware model proposed…cGMM” [variant of a GMM], i.e. a configuration/GMM is generated for the trace signal),
wherein the modeling signal is defined by a plurality of model parameters, each configuration represents a set of the plurality of model parameters for an instance of the modeling signal for the portion of the trace signal and has a score for fitting the modeling signal to the portion of the trace signal, at least a first one of the configurations in the set has a maximum score … (Cohen, fig. 5 the caption teaches “Spectra re-computed from the model parameters for the same two pixels as in Figure 1…”, i.e. the modelling signal [the re-computed spectra] is defined by “model parameters” for each pixel, which are the parameters for the GMM/cGMM – for more clarification see page 246, col. 2, ¶ 2 which teaches “Our proposed contribution is based on conditional density estimation by the penalized maximum likelihood technique [maximum score] that allows one to estimate simultaneously the number of meaningful classes and the pixel labels. Density estimation is already at the core of the most classical spectral method in which the observed spectra are modelized as a realization of a Gaussian Mixture Model (GMM)”)…
	
Cohen does not explicitly teach:
generating a set of configurations…score over the set of configurations…and a subset of the set of configurations includes multiple configurations with scores less than the maximum score;
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
determining a distance between each member of the set of configurations and the classification cluster;
and determining that the sample has the class type associated with the classification cluster based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster. 

Emmott teaches:
generating a set of configurations…score over the set of configurations…and a subset of the set of configurations includes multiple configurations with scores less than the maximum score (Emmott, abstract and section 2, ¶ 1 teaches a system/technique for “anomaly detection” such as to detect “the emergence of cancer cells in normal tissue” – then see page 4, section 4.5 which teaches creating an “ensemble Gaussian Mixture Model” in which a “set of models”, i.e. “GMMs”, are created by varying the “number of clusters”, the “EM initializations”, and the like to create an ensemble of GMMs [set of configurations] in which any GMMs with a “likelihood” “less than 85% of the best observed value are discarded” in order to rank the data points “by the remaining GMMs”, i.e. an ensemble/set of GMMs [and their respective configurations/model parameters] is generated wherein only GMMs with a score of at least 85% of the “best”, i.e. maximum, are retained – this ensemble includes both the highest scoring GMM and GMMs with scores less than the highest)
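
For illustration only, the retention rule described in the Emmott citation above (discard any GMM whose likelihood is less than 85% of the best observed value) can be sketched as follows. This is a minimal sketch under stated assumptions: the scores are assumed to be positive, larger-is-better values, the GMM fitting itself is omitted, and the name `retain_ensemble` and the configuration labels are hypothetical.

```python
def retain_ensemble(scored_configs, keep_ratio=0.85):
    """Keep every (configuration, score) pair whose score is at least
    keep_ratio times the best observed score. Assumes positive scores."""
    best = max(score for _, score in scored_configs)
    return [(cfg, score) for cfg, score in scored_configs
            if score >= keep_ratio * best]

# Hypothetical configurations obtained by varying the number of clusters
# and the EM initialization (seed), each with a hypothetical fit score.
scored = [("k=2,seed=0", 100.0), ("k=3,seed=0", 90.0),
          ("k=3,seed=1", 80.0), ("k=4,seed=2", 86.0)]

kept = retain_ensemble(scored)
print([cfg for cfg, _ in kept])
```

The retained set contains the maximum-scoring configuration together with configurations scoring below the maximum, which is the property relied upon in the mapping above.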

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen on using a GMM algorithm 

Cohen, as modified by Emmott, does not explicitly teach:
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type;
	determining a distance between each member of the set of configurations and the classification cluster;
and determining that the sample has the class type associated with the classification cluster based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster. 

Bigdeli teaches:
defining a classification cluster in a parameter domain defined by the plurality of model parameters, the classification cluster having an associated class type (Bigdeli, as cited below, in summary teaches a method which includes defining a classification cluster, i.e. a “training…cluster” in the parameter domain defined by the plurality of model parameters [Bigdeli teaches that the input data is clustered as well into separate GMMs – for the modification, Bigdeli’s input is the GMM ensemble of Cohen/Emmott] – the training clusters 
abstract, teaches “In the proposed Collective Probabilistic Anomaly Detection method, first instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than individual characteristics of incoming samples…. Finally, a modified distance measure, based on Kullback-Liebner method, is proposed to calculate the similarity among clusters represented by GMMs.”, then see page 339-340, section B and fig. 2 and 3- this teaches “In the CPAD method, each training and test cluster are represented by a GMM…” [the training cluster is a classification cluster that is defined in a parameter domain defined by the plurality of model parameters – both the training and test clusters are in the same domain, as the distance between them is measured] 
section B further clarifies “In this case measurement is required to determine the label of each testing GMM. That is why in this paper a distance is require to measure the similarity between the training and testing GMMs…With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity between a test GMM and one of the training GMMs is more than a threshold, the test cluster should be generated by the same distribution as that of training cluster – see section IV for details on the GMM, and section 5 for the distance metric);
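	determining a distance between each member of the set of configurations and the classification cluster (Bigdeli, as cited above teaches determining a “distance” between a GMM of testing data and a GMM of training data [and/or multiple training clusters], taken in combination with Cohen, as modified by Emmott, it would be obvious to apply this technique for each GMM in the set of GMMs, as each is a separate testing cluster, i.e. Bigdeli’s technique is to label each test GMM by determining the distance to one or more training GMM(s));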
… based on a determination that at least one of the configurations in the set of configurations has a distance within a distance threshold from the classification cluster (Bigdeli, as cited above, teaches determining the “label” for each input GMM [example of determining the class type of the sample, i.e. an example of a class type is a label], see page 340, col. 1, ¶ 1 teaches “all the test samples in a cluster are labelled based on their collective characteristic represented by a GMM” wherein this is based on “distance” – then see section 5 which provides various means of determining the distance between the configurations, e.g. “our proposed method considers the pairwise distance of each normal component from both GMM” [example of distance between each configuration and the training cluster(s)/classification cluster(s)], then see page 339, col. 2, last paragraph which teaches “With this intuition, we can label the test clusters by finding the distances between the test GMMs and training GMMs. If the similarity [distances] between a test GMM and one of the training GMMs is more than a threshold [distance threshold], the test cluster should be generated by the same distribution as that of training cluster.”, in other words Bigdeli determines the distance between each configuration and the classification cluster, wherein a distance threshold is used to determine if the GMMs are similar enough to have the same label)



Cohen, as modified by Emmott and Bigdeli, does not explicitly teach
and determining that the sample has the class type associated with the classification cluster 

Fernandez teaches: 
and determining that the sample has the class type associated with the classification cluster   (Fernandez, page 471, col. 2, ¶ 2 teaches determining the “fraction of pixels” for each class, e.g. see fig 1 – the class type of the sample is determined by using a fraction of the pixels that are classified by that class type, the fraction includes counting pixels with the class type)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on classifying pixels in FTIR image data of a sample with the teachings from Fernandez on determining a classification of the sample based on the fraction of pixels with a class type, such as for cancer detection (Fernandez, page 472, col. 1, ¶1). The motivation to combine would have been that the technique in Fernandez would have enabled the system to automatically classify the sample and also provide a classification accuracy for the sample based on the fraction of pixels, e.g. see Fernandez table 1 – this would have allowed a user of the system to more quickly detect cancer cells. 

Regarding Claim 20.
Bigdeli, as taken in combination above, teaches:
The method of claim 1, wherein the parameter space has a number of dimensions equal to a number of model parameters in each configuration (Bigdeli, as taken in combination above, teaches classifying input GMMs [configurations] based on a distance measure to GMMs from training GMMs, see abstract, section B and the like as cited above, i.e. from section B “we can label the test clusters [classify] by finding the distances between the test GMMs [each configuration/GMM member of the ensemble of Emmott] and training GMMs” – see § 5 which clarifies that this includes “find the distance of first GMM components to the second GMM components” and see equations 2-6, this is based on finding the distance between the components of each GMM such as by the mean [example of a model parameter] – in other words, the parameter domain/space that is being used for classification based on distance is the parameters of the GMMs, e.g. mean, in other words the distance metric being used is in a parameter space/domain as it is finding the distance between GMMs based upon the parameters of the GMMs [parameters of each configuration] – it would have been obvious to a person of ordinary skill that such a technique would have been in a parameter space/domain which has a number of dimensions equal to the number of parameters for each GMM, i.e. the distance metric is being provided by the parameters [so obviously this is in a parameter space/distance] wherein each GMM component has parameters such as “mean” and “covariance” (page 340, col. 2, ¶ 1-2) and there are a limited number of components [e.g., for a 3 component GMM there is a parameter for each component, and there is a total of 9] – as such the parameter space/domain has a number of dimensions equal to the number of parameters for each configuration, i.e. for classifying a 3-component GMM the “distance” is found to another GMM based on the distance metric in § 5 wherein this “distance metric” is computing the distance based on the mean, the weight, and the co-variance of each component in the test GMM, i.e. “our proposed method considers the pairwise distance of each normal component from both GMM.” and see equations 4-5 – the summations are the “each normal component” being iterated through, to clarify – the claim merely recites that the “parameter space has a number of dimensions equal to a number of model parameters in each configuration” – this is merely limiting the scope of the parameter space/domain to including, i.e. has, a number of dimensions for the number of model parameters for each configuration – the prior art as relied upon teaches classifying using a distance between GMM parameters in a parameter space, i.e. the parameter space of the prior art obviously has a number of dimensions equal to the number of parameters for each space, as this is the space in which the distance is being determined between the parameters)

Regarding Claim 22.
Bigdeli, as taken in combination above, teaches:
The system of claim 19, wherein the parameter space has a number of dimensions equal to a number of model parameters in each configuration (Bigdeli, as taken in combination above, teaches classifying input GMMs [configurations] based on a distance measure to GMMs from training GMMs, see abstract, section B and the like as cited above, i.e. from section B “we can label the test clusters [classify] by finding the distances between the test GMMs [each configuration/GMM member of the ensemble of Emmott] and training GMMs” – see § 5 which clarifies that this includes “find the distance of first GMM components to the second GMM components” and see equations 2-6, this is based on finding the distance between the components of each GMM such as by the mean [example of a model parameter] – in other words, the parameter domain/space that is being used for classification based on distance is the parameters of the GMMs, e.g. mean, in other words the distance metric being used is in a parameter space/domain as it is finding the distance between GMMs based upon the parameters of the GMMs [parameters of each configuration] – it would have been obvious to a person of ordinary skill that such a technique would have been in a parameter space/domain which has a number of dimensions equal to the number of parameters for each GMM, i.e. the distance metric is being provided by the parameters [so obviously this is in a parameter space/distance] wherein each GMM component has parameters such as “mean” and “covariance” (page 340, col. 2, ¶ 1-2) and there are a limited number of components [e.g., for a 3 component GMM there is a parameter for each component, and there is a total of 9] – as such the parameter space/domain has a number of dimensions equal to the number of parameters for each configuration, i.e. for classifying a 3-component GMM the “distance” is found to another GMM based on the distance metric in § 5 wherein this “distance metric” is computing the distance based on the mean, the weight, and the co-variance of each component in the test GMM, i.e. “our proposed method considers the pairwise distance of each normal component from both GMM.” and see equations 4-5 – the summations are the “each normal component” being iterated through, to clarify – the claim merely recites that the “parameter space has a number of dimensions equal to a number of model parameters in each configuration” – this is merely limiting the scope of the parameter space/domain to including, i.e. has, a number of dimensions for the number of model parameters for each configuration – the prior art as relied upon teaches classifying using a distance between GMM parameters in a parameter space, i.e. the parameter space of the prior art obviously has a number of dimensions equal to the number of parameters for each space, as this is the space in which the distance is being determined between the parameters)
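
For illustration only, the dimension count discussed above (a 3-component GMM with a weight, a mean, and a covariance term per component giving 9 parameters) can be sketched as follows. This is a minimal sketch assuming a univariate mixture, so that each covariance reduces to a single variance; the function name `to_parameter_vector` and the example values are hypothetical.

```python
def to_parameter_vector(components):
    """Flatten (weight, mean, variance) triples into a single point in
    the parameter space; one dimension per model parameter."""
    vector = []
    for weight, mean, variance in components:
        vector.extend([weight, mean, variance])
    return vector

# Hypothetical univariate 3-component GMM: (weight, mean, variance).
gmm = [(0.5, 1000.0, 25.0), (0.3, 1250.0, 40.0), (0.2, 1600.0, 10.0)]

point = to_parameter_vector(gmm)
print(len(point))  # number of dimensions of the parameter space
```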


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Mittal et al., “Classification of breast tissue for cancer diagnosis: Application of FT-IR imaging and random forests”, 2013

Regarding Claim 3.
Emmott teaches
	The method of claim 2, wherein the class type comprises … carcinoma (Emmott, section 2, ¶ 1 teaches detecting “cancer cells in normal tissue”)

Cohen, as modified above, does not explicitly teach: ductal carcinoma. 

Mittal teaches: ductal carcinoma (Mittal, abstract teaches “Fourier Transform Infrared (FTIR) imaging has gained wide acceptance for determining the chemical composition of biomedical samples. In particular, there is significant potential for performing quantitative, label-free imaging in tumor biopsies. In this paper, we demonstrate the advantages offered by FTIR imaging for the analysis of breast tumor biopsies from hundreds of patients. We then demonstrate the software that we have developed for tissue classification.”, i.e. using FTIR for determining a class type of breast cancer [ductal carcinoma]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified above, on a system for analyzing FTIR data such as for cancer detection with the teachings from Mittal on . 

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Fernandez et al., “Infrared spectroscopic imaging for histopathologic recognition”, 2005 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011

Regarding Claim 6.
Cohen, as taken in combination above, does not explicitly teach:
	The method of claim 5, wherein the ellipsoid is defined using a singular value decomposition matrix

Dobry teaches:
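	The method of claim 5, wherein the ellipsoid is defined using a singular value decomposition matrix (Dobry, abstract, teaches applying “dimension reduction” to “GMM” “supervectors” which are used for classification with an “SVM” wherein the supervectors are projected into a “reduced space” for “training” a classifier, then see section A on page 1976 which clarifies that a “GMM” is fit to a signal, e.g. a “speech utterance” wherein “Each GMM model is represented by GMM supervector , formed by concatenating all the Gaussians’ means” [parameters of each GMM] and then a “dimension reduction approach” is applied to the supervector, before classification, to “reduce the dimension size”, then see section III.A on page 1977 which teaches “PCA” is used which uses “SVD” to perform the “dimensional reduction” and then see page 1980, col. 1, ¶ 3 which teaches that the classification using SVM “involves a distance calculation” between the input and training vectors, in other words Dobry classifies the GMMs by projecting the parameters of the GMMs into a “space” created by SVD – SVD is applied to both the input data and the training data [training data used for classification cluster])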

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, modified above, on a system which classifies GMMs by determining the distance between the GMMs from test/training data with the teachings from Dobry on performing the classification in a “lower dimension space” such as formed by SVD. The motivation to combine would have been that dimensional reduction before the classification would have resulted in a “faster and better separability” between the classes (Dobry, page 1975, col. 2, ¶ 1), i.e. performing a dimension 
Dobry is considered analogous art as Dobry is reasonably pertinent to the problem faced by the inventor of classifying the GMMs using SVD. 
In addition, Dobry is also analogous as Dobry is reasonably pertinent to the problem of classifying GMMs in a parameter domain – specifically, Dobry provides evidence that the applicant’s use of a parameter domain for classification is substantially similar to classifying in a supervector space, which is typically found in speech analysis. One of ordinary skill, when faced with the problem of classifying spectral data from FTIR, would have reasonably turned towards audio/speech analysis techniques as both are dealing with spectral data, just at different frequency ranges, and audio/speech analysis has numerous examples of applying GMMs to spectral signals and then classifying in the supervector/parameter domain (see pertinent prior art of record below). 

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015, and in further view of Zhang et al., “Classification of Fourier Transform Infrared Microscopic Imaging Data of Human Breast Cells by Cluster Analysis and Artificial Neural Networks”, 2003

Regarding claim 13
Emmott, as taken in combination with Cohen and Bigdeli, teaches:
generating a … set of configurations for defining modeling signals to model the energy absorption spectrum signal for the selected pixel using a first number of random seeds (Emmott, as taken in combination as cited above, teaches creating an ensemble of GMMs [set of configurations] by varying the “EM initializations” [e.g., by varying the random seed used to initialize the EM algorithm]);
and generating the diagnostic set of configurations using a second number of random seeds greater than the first number (Emmott, as taken in combination as cited above, teaches creating an ensemble of GMMs [set of configurations] by varying the “EM initializations” [e.g., by varying the random seed used to initialize the EM algorithm])

Cohen, as modified by Emmott and Bigdeli, does not explicitly teach:
	The method of claim 11, further comprising:
	generating a screening set of configurations…
	defining a screening cluster in the parameter domain;
	generating the diagnostic set… responsive to determining that at least one of the configurations in the screening set is within the screening cluster. 


Zhang teaches: 
The method of claim 11, further comprising:
	generating a screening set of configurations … (Zhang, abstract, teaches first classifying/screening “pixels” from FTIR data into “cell and non-cell categories”, i.e. the pixels are screened as either cell or non-cell and then the “cell pixels are subsequently classified into carcinoma and normal categories”, and Zhang page 17, col. 1, ¶ 3 teaches that this two-step process is used “Because the chemical composition of the cell and non-cell pixels is significantly different, their infrared spectra differ”, e.g. fig. 1, in other words Zhang teaches performing a two-step classification for detecting cancer pixels in FTIR images in which the pixels are first screened for having cell tissue, and then for pixels with cell tissue they are classified as being cancerous or normal, wherein Zhang also clarifies that this is a two-step process as the spectra for non-cell pixels are substantially different from those of the cell tissue, taken in combination with Cohen, as modified by Emmott and Bigdeli, it would have been obvious to use a smaller ensemble of GMMs [e.g., fewer “EM initializations”] to first discriminate cell and non-cell pixels, given that the spectra are substantially different, i.e. it would have been obvious that it would require a much smaller ensemble of GMMs to screen out non-cell tissue as the differences between cell and non-cell spectra are much more substantial, e.g. see fig. 1);
defining a screening cluster in the parameter domain (Zhang, as taken in combination above, teaches this, i.e. Zhang makes it obvious to screen out non-cell pixels, such as by using a small ensemble of GMMs, before determining which cell pixels have cancer);
	… responsive to determining that at least one of the configurations in the screening set is within the screening cluster (Zhang, as taken in combination above, teaches this, i.e. Zhang . 
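
For illustration only, the two-step screening and diagnostic flow discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Zhang's method: the `generate` callable stands in for fitting one configuration per random seed, each cluster is assumed to be a centre plus a radius with a Euclidean distance, and the names `any_within`, `two_stage`, `fake_generate`, and all numeric values are hypothetical.

```python
import math

def any_within(configs, centre, radius):
    """True if at least one configuration lies within the cluster."""
    return any(math.dist(c, centre) <= radius for c in configs)

def two_stage(generate, screening_cluster, diagnostic_cluster,
              n_screen=3, n_diag=10):
    """Screen with a small ensemble first; only pixels passing the
    screen are examined with the larger diagnostic ensemble."""
    screening = generate(n_screen)          # first number of seeds
    if not any_within(screening, *screening_cluster):
        return "non-cell"
    diagnostic = generate(n_diag)           # second, greater number
    if any_within(diagnostic, *diagnostic_cluster):
        return "carcinoma"
    return "normal"

def fake_generate(n):
    """Hypothetical deterministic stand-in for fitting n configurations."""
    return [(float(i), 0.0) for i in range(n)]

print(two_stage(fake_generate, ((1.0, 0.0), 0.5), ((8.0, 0.0), 0.5)))
```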


[Image: media_image5.png]


It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, as modified by Bigdeli and Emmott on a system for analyzing tissue samples using FTIR data wherein the pixels are classified as having or not having cancer with the teachings from Zhang on performing a two-. 

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen et al., “Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection”, 2014, in view of Emmott et al., “Systematic Construction of Anomaly Detection Benchmarks from Real Data”, 2013 and in further view of Bigdeli et al., “A Fast Noise Resilient Anomaly Detection using GMM-Based Collective Labelling”, 2015 and in further view of Dobry et al., “Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal”, 2011

Regarding Claim 16.
Cohen, as taken in combination above, does not explicitly teach:
	The method of claim 15, wherein the ellipsoid is defined using a singular value decomposition matrix.


wherein the ellipsoid is defined using a singular value decomposition matrix (Dobry, abstract, teaches applying “dimension reduction” to “GMM” “supervectors” which are used for classification with an “SVM” wherein the supervectors are projected into a “reduced space” for “training” a classifier, then see section A on page 1976 which clarifies that a “GMM” is fit to a signal, e.g. a “speech utterance” wherein “Each GMM model is represented by GMM supervector, formed by concatenating all the Gaussians’ means” [parameters of each GMM] and then a “dimension reduction approach” is applied to the supervector, before classification, to “reduce the dimension size”, then see section III.A on page 1977 which teaches “PCA” is used which uses “SVD” to perform the “dimensional reduction” and then see page 1980, col. 1, ¶ 3 which teaches that the classification using SVM “involves a distance calculation” between the input and training vectors, in other words Dobry classifies the GMMs by projecting the parameters of the GMMs into a “space” created by SVD – SVD is applied to both the input data and the training data [training data used for classification cluster]).
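
For illustration only, the kind of processing characterized above (concatenating GMM means into a supervector, reducing dimension with an SVD-based PCA, then classifying by distance in the reduced space) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Dobry or from the application: the function names, the synthetic data, and the use of a nearest-centroid rule in place of Dobry's SVM are all hypothetical choices made to keep the example short.

```python
import numpy as np

def fit_svd_projection(train, n_components):
    """Center the training supervectors and return (mean, basis) from an SVD-based PCA."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, basis):
    """Project one supervector (or a stack of them) into the reduced space."""
    return (x - mean) @ basis.T

def nearest_class(x_low, centroids):
    """Return the label whose centroid is closest in the reduced space.

    Assumption: nearest-centroid stands in for the SVM classifier here.
    """
    return min(centroids, key=lambda label: np.linalg.norm(x_low - centroids[label]))

# Hypothetical supervectors: each row is the concatenated means of one fitted GMM.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.1, size=(20, 6))
class_b = rng.normal(1.0, 0.1, size=(20, 6))
train = np.vstack([class_a, class_b])

mean, basis = fit_svd_projection(train, n_components=2)
centroids = {
    "a": project(class_a, mean, basis).mean(axis=0),
    "b": project(class_b, mean, basis).mean(axis=0),
}

# The same projection is applied to the input data as to the training data.
query = np.full(6, 0.95)
label = nearest_class(project(query, mean, basis), centroids)
```

The distance comparison runs on 2-dimensional vectors rather than the original 6-dimensional supervectors, which is the reduction in data size that the cited passages associate with dimension reduction before classification.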

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Cohen, modified above, on a system which classifies GMMs by determining the distance between the GMMs from test/training data with the teachings from Dobry on performing the classification in a “lower dimension space” such as formed by SVD. The motivation to combine would have been that dimensional reduction before the classification would have resulted in a “faster and better separability” between the classes (Dobry, page 1975, col. 2, ¶ 1), i.e. performing a dimension reduction before classification would have made the system faster at classifying as the classification analysis would have been performed on smaller, i.e. reduced, data sets. 
Dobry is considered analogous art as Dobry is reasonably pertinent to the problem faced by the inventor of classifying the GMMs using SVD. 
In addition, Dobry is also analogous as Dobry is reasonably pertinent to the problem of classifying GMMs in a parameter domain – specifically, Dobry provides evidence that the applicant’s use of a parameter domain for classification is substantially similar to classifying in a supervector space, which is typically found in speech analysis. One of ordinary skill, when faced with the problem of classifying spectra data from FTIR, would have reasonably turned towards audio/speech analysis techniques as both are dealing with spectral data, just at different frequency ranges, and audio/speech analysis has numerous examples of applying GMMs to . 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chen et al., “Weighted subspace modeling for semantic concept retrieval using Gaussian mixture models”, see figure 1 – this is using GMMs to create a subspace and see § 3
Engster et al., “Local- and Cluster Weighted Modeling for Prediction and State Estimation of Nonlinear Dynamical Systems”, Dissertation from Gottingen, 2010 – see § 5.4 and see figures 5.2 and 5.3 – this is creating an ensemble of cluster weighted models
Glodek et al., “Ensemble Gaussian mixture models for probability density estimation” – see § 3, page 131 for the process of creating a GMM ensemble for a set number of GMM members including selecting a subset of the GMMs with maximum scores to output a GMM ensemble – then see § 4.1 for “classification performance” from this technique – page 134 teaches that “adding prior knowledge to the EM algorithm” substantially improves the accuracy of a GMM [e.g., if a person of ordinary skill has prior knowledge of the data being fit, the person can configure the EM algorithm with such knowledge to improve the accuracy, e.g. the value of k would have been the most obvious to set], also see § 4.2  - this recites that each GMM in an ensemble is a “configuration” 
Melchior et al., “Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples”, 2020 – see § 3.5 “Even with well-chosen initial values and split-and-
Moerland, “Mixture Models for Unsupervised and supervised learning”, 2000, PhD Dissertation from Ecole Polytechnique Federale De Lausanne – see page 57 – this provides an example of fitting a GMM with a varying “number of parameters” to data wherein the y-axis is the “negative log-likelihood” – the bottom shows another example wherein “All mixture models have 14 components” wherein the “mixture models” are varied by “20 random initializations of the data”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID A. HOPKINS whose telephone number is (571)272-0537.  The examiner can normally be reached on Monday to Friday, 8:30AM to 5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/D.A.H./Examiner, Art Unit 2128              

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128