Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application claims the benefit as a continuation-in-part of U.S. application Ser. No. 15/013,580, filed Feb. 2, 2016, which claims priority to U.S. Provisional Appln. Ser. No. 62/181,333 filed Jun. 18, 2015 and U.S. Provisional Appln. Ser. No. 62/118,930 filed Feb. 20, 2015.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/25/2021 has been entered.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/25/2021 was filed after the mailing date of the Final Office Action on 05/25/2021.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
DETAILED ACTION
This Office action is in response to a Request for Continued Examination (RCE) received on 08/25/2021. In the RCE, applicant has amended independent claims 1, 12 and 19. Claims 2-11, 13-18 and 20 remain original. No claims have been cancelled and no new claims have been added.
For this office action, claims 1-20 have been received for consideration and have been examined.  
Response to Arguments
Claim Rejection under 35 U.S.C. § 103
Applicant’s arguments, filed 08/25/2021, with respect to the rejections of claims under 35 U.S.C. § 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of the amendments to the independent claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al. (US20140185815A1) in view of Yu et al. (US20110144986A1) and further in view of the NPL Document titled “Trial-Based Calibration for Speaker Recognition in Unseen Conditions,” publicly available 16-19 June 2014.
Regarding claim 1, Roblek discloses:
	A method for improving accuracy of an audio-based identification, recognition, or detection system by adapting calibration to conditions of a trial, the method comprising:
determining (See FIG. 1; a comparing component 112, a matching component 118; i.e. comparing / matching) characterization data of a probe audio sample (See [0005] i.e. probe audio sample), characterization data of a reference audio sample (See [0005] i.e. reference audio sample), and a score (See [0005] & FIG. 4; i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored); 
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system  ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples). 
Roblek fails to disclose:
	determining a condition of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; 
However, Yu discloses:
	determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0005] The usage scenario may correspond to a current condition, with another calibration model (or no calibration model) used when a different condition exists, e.g., one calibration model may be used during a noisy condition and another during a non-noisy condition; [0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario);
using a computer-implemented mathematical model (See FIG. 1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes, neural network, and/or logistic regression) ([0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
training the computer-implemented mathematical model (See FIG. 1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference to include a calibration model in a speech recognition system, to train the model, and to adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios). 
	The motivation to include the calibration model is to improve the quality of the speech recognition engine’s confidence score. 
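The combination of Roblek and Yu fails to disclose:
	wherein the condition of the trial is unknown prior to the trial; responsive to the condition of the trial, selecting candidate data as a subset of available development data, wherein the subset includes probe audio and reference audio sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample.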
However, NPL Document discloses:
	wherein the condition of the trial is unknown (See Abstract: i.e. unseen conditions) prior to the trial (See Page # 1; Abstract: This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions; Page # 5; section 7.3. Trial-Based Calibration (TBC); These results indicate that TBC can more readily adapt to unseen conditions than metadata-based calibration and provide better-calibrated scores for making identification decisions across various conditions);
responsive to the condition of the trial (See Page # 1; Abstract; This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions … Evaluated on a diverse, pooled collection of 5 different databases with 14 distinct conditions), 
selecting candidate data as a subset of available development data  (See Page # 1; Abstract: An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions; See Page # 1, Section 3. Existing Calibration Methods: The process of calibration transforms scores to log-likelihood ratios (LLR). This in turn allows identification scores in isolation to be meaningfully interpreted. Common to all calibration techniques is the need to learn a set of calibration parameters (typically a scale and shift) from a development set. The development set contains both target and impostor scores representative of the conditions expected to be encountered during end use of the system; See Page # 2, Section 3.1 Logistic Regression (Global) Calibration: a single model is trained using all development data. This approach optimizes calibration globally for all conditions in the development data), 
wherein the subset includes probe audio (i.e. test conditions) and reference audio (i.e. enrollment sample) sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample (See Page # 1; Abstract; An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions. Calibration parameters learned from the target and impostor trials generated by pairing up these samples are then used to calibrate the score output from the speaker identification system; See Page # 1, Section 1. Introduction: we explore the problem of calibration when the trial conditions are variable. We wish to obtain a set of calibrated scores for which the optimal decision threshold computed for each pair of enrollment and test conditions is independent of these conditions; See Page # 3; Section 5.2. Calibration Data: Matched Data: A small collection of 1503 segments from the NIST and Fisher corpora of speech data was assembled as an initial held-out dataset. Data was chosen with the goal of matching or approximating conditions in the FBI provided corpus, although it was not possible to represent certain trial conditions (cross-language) and languages. Both telephone and microphone channels were represented with speakers in most languages offering cross-channel trials. Table 2 details the characteristics of this data. The segments provided 10736 target trials and 2.1 million impostor trials from which calibration parameters could be learned).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek and Yu references to include the automated calibration technique for speaker recognition, as disclosed by the NPL Document (See NPL Document, Page # 1; Section 2. Universal Audio Characterization).
The motivation to include automated calibration technique in speaker recognition is to improve the estimation of the parameters for the calibration model and, therefore, improve the final accuracy and reliability of the audio-based recognition, identification, or detection system.
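For illustration only, the candidate-selection step quoted above from the NPL Document abstract (choosing development samples that best match the enrollment conditions and a subset that resembles the test conditions) can be sketched as follows. This sketch is not taken from any cited reference. The function name `select_candidates`, the use of Euclidean distance over characterization vectors, and the toy data are all assumptions made for the example.

```python
import math

def select_candidates(enroll_vec, test_vec, dev_samples, k=2):
    """Hypothetical helper: return the ids of the k development samples
    nearest to the enrollment conditions and the k nearest to the test
    conditions. dev_samples is a list of (sample_id, vector) pairs."""
    def dist(a, b):
        # Euclidean distance between two characterization vectors (assumed metric)
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest(target):
        ranked = sorted(dev_samples, key=lambda s: dist(s[1], target))
        return [sample_id for sample_id, _ in ranked[:k]]

    return nearest(enroll_vec), nearest(test_vec)

# Toy development set: two samples near each of two made-up conditions.
dev = [("a", (0.0, 0.0)), ("b", (0.1, 0.0)),
       ("c", (1.0, 1.0)), ("d", (0.9, 1.0))]
enroll_like, test_like = select_candidates((0.0, 0.1), (1.0, 0.9), dev, k=2)
print(enroll_like, test_like)
```

Under these assumptions, pairing the samples in the two returned subsets would yield the target and impostor trials from which calibration parameters are learned.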
Regarding claim 2, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the reference audio sample, respectively (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 3, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 2, wherein the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the probe audio sample and the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the reference audio sample, respectively, comprise (i) channel data or (ii) noise data or (iii) reverberation data or (iv) language data or (v) speaker gender data or (vi) sample length data or (vii) at least two of (i), (ii), (iii), (iv), (v), (vi) (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 4, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, comprising, when the candidate data does not match the condition of the trial including at least one of 1) insufficiently matched candidate data or 2) no available candidate data, skipping the step of generating the calibrated score and causing the trial to not be used by the audio-based identification, recognition, or detection system (Roblek: [0007] The matching reference audio samples are scored according to a set of parameters and the matching reference audios samples are retained from being outputted that not satisfy a score threshold).
Regarding claim 5, the combination of Roblek, Yu and NPL Document discloses:
(Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 6, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, comprising augmenting the candidate data with noise data that is computationally-generated based on the condition (Yu: [0036] This approach basically uses distinct weights on the raw confidence score but shares the same bias weight for different words; [0037] In a third approach, two more features are added for each frame, in addition to the features used in the second approach).
Regarding claim 7, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, comprising generating the parameters by incorporating regularization weight data into a linear logistic regression (LLR)-based algorithmic process performed on the score to generate the calibrated score (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 8, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, wherein the model is a learned model of a type that is selected based on the audio-based identification, recognition, or detection system, wherein the type is a neural network-based model or a probabilistic linear discriminant analysis (PLDA)-based model or a linear logistic regression-based model (Yu: [0020]).
Regarding claim 10, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, wherein the parameters include one or more of a scale or a shift or a bias (NPL Document: Page # 1; Section 3. Existing Calibration Methods: The process of calibration transforms scores to log-likelihood ratios (LLR). This in turn allows identification scores in isolation to be meaningfully interpreted. Common to all calibration techniques is the need to learn a set of calibration parameters (typically a scale and shift) from a development set).
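For illustration only, learning "a scale and shift" from a development set and applying them to a raw score can be sketched with a plain logistic-regression fit. This sketch is not the cited references' implementation. The function names `fit_scale_shift` and `calibrate`, the learning rate, the step count, and the toy scores are assumptions made for the example.

```python
import math

def fit_scale_shift(scores, labels, lr=0.1, steps=2000):
    """Hypothetical helper: fit scale and shift by gradient descent on the
    logistic loss. labels are 1 for target trials and 0 for impostor trials."""
    scale, shift = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        g_scale = g_shift = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(scale * s + shift)))
            g_scale += (p - y) * s   # gradient of the loss w.r.t. scale
            g_shift += (p - y)       # gradient of the loss w.r.t. shift
        scale -= lr * g_scale / n
        shift -= lr * g_shift / n
    return scale, shift

def calibrate(score, scale, shift):
    """Apply the learned parameters to a raw score (linear transformation)."""
    return scale * score + shift

# Toy development scores (assumed values): four target and four impostor trials.
dev_scores = [2.0, 3.0, 1.0, 2.5, 0.0, 1.5, -1.0, 0.5]
dev_labels = [1, 1, 1, 1, 0, 0, 0, 0]
scale, shift = fit_scale_shift(dev_scores, dev_labels)
print(calibrate(3.0, scale, shift), calibrate(-1.0, scale, shift))
```

In this sketch a high raw score maps to a positive calibrated value and a low raw score to a negative one, so a fixed threshold at zero can be applied to the calibrated output.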
Regarding claim 11, the combination of Roblek, Yu and NPL Document discloses:
The method of claim 1, wherein the audio-based identification, recognition, or detection system executes a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task, and the method comprises determining the parameters based on the audio-based (Yu: [0005] Briefly, various aspects of the subject matter described herein are directed towards a technology by which a calibration model is inserted into a speech recognition system to adjust the confidence score output by a speech recognition engine (recognizer), and thereby provide a calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from at least one previous corresponding usage scenario. The usage scenario may correspond to a current condition, with another calibration model (or no calibration model) used when a different condition exists, e.g., one calibration model may be used during a noisy condition and another during a non-noisy condition).
Regarding claim 12, Roblek discloses:
A system comprising: one or more computer processors; a calibration system coupled to the one or more computer processors, wherein the system performs operations comprising:
determining (See FIG. 1; a comparing component 112, a matching component 118; i.e. comparing / matching) characterization data of a probe audio sample (See [0005] i.e. probe audio sample), characterization data of a reference audio sample (See [0005] i.e. reference audio sample), and a score (See [0005] & FIG. 4; i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored); 
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system  ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples). 
Roblek fails to disclose:
	determining a condition of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; wherein the condition of the trial is unknown prior to the trial; responsive to the condition of the trial, selecting candidate data as a subset of available development data, wherein the subset includes probe audio and reference audio sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; training the computer-implemented mathematical model using the selected candidate data; determining a value of one or more parameters using the computer-implemented mathematical model; adjusting the score to produce a calibrated score by mathematically applying the value of the one or more parameters to the score; outputting the calibrated score to the audio-based identification, recognition, or detection system.
However, Yu discloses:
	determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0005] The usage scenario may correspond to a current condition, with another calibration model (or no calibration model) used when a different condition exists, e.g., one calibration model may be used during a noisy condition and another during a non-noisy condition; [0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario);
using a computer-implemented mathematical model (See FIG. 1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes, neural network, and/or logistic regression) ([0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
training the computer-implemented mathematical model (See FIG. 1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference to include a calibration model in a speech recognition system, to train the model, and to adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios). 
	The motivation to include the calibration model is to improve the quality of the speech recognition engine’s confidence score. 
The combination of Roblek and Yu fails to disclose:
	wherein the condition of the trial is unknown prior to the trial; responsive to the condition of the trial, selecting candidate data as a subset of available development data, wherein the subset includes probe audio and reference audio sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample.
However, NPL Document discloses:
	wherein the condition of the trial is unknown (See Abstract: i.e. unseen) prior to the trial (See Page # 1; Abstract: This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions; Page # 5; section 7.3. Trial-Based Calibration (TBC); These results indicate that TBC can more readily adapt to unseen conditions than metadata-based calibration and provide better-calibrated scores for making identification decisions across various conditions);
responsive to the condition of the trial (See Page # 1; Abstract; This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions … Evaluated on a diverse, pooled collection of 5 different databases with 14 distinct conditions), 
selecting candidate data as a subset of available development data (See Page # 1; Abstract: An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions; See Page # 1, Section 3. Existing Calibration Methods: The process of calibration transforms scores to log-likelihood ratios (LLR). This in turn allows identification scores in isolation to be meaningfully interpreted. Common to all calibration techniques is the need to learn a set of calibration parameters (typically a scale and shift) from a development set. The development set contains both target and impostor scores representative of the conditions expected to be encountered during end use of the system; See Page # 2, Section 3.1 Logistic Regression (Global) Calibration: a single model is trained using all development data. This approach optimizes calibration globally for all conditions in the development data), 
wherein the subset includes probe audio (i.e. test conditions) and reference audio (i.e. enrollment sample) sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample (See Page # 1; Abstract; An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions. Calibration parameters learned from the target and impostor trials generated by pairing up these samples are then used to calibrate the score output from the speaker identification system; See Page # 1, Section 1. Introduction: we explore the problem of calibration when the trial conditions are variable. We wish to obtain a set of calibrated scores for which the optimal decision threshold computed for each pair of enrollment and test conditions is independent of these conditions; See Page # 3; Section 5.2. Calibration Data: Matched Data: A small collection of 1503 segments from the NIST and Fisher corpora of speech data was assembled as an initial held-out dataset. Data was chosen with the goal of matching or approximating conditions in the FBI provided corpus, although it was not possible to represent certain trial conditions (cross-language) and languages. Both telephone and microphone channels were represented with speakers in most languages offering cross-channel trials. Table 2 details the characteristics of this data. The segments provided 10736 target trials and 2.1 million impostor trials from which calibration parameters could be learned).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek and Yu references to include an automated calibration technique in speaker recognition, as disclosed by NPL Document (See NPL Document, Page # 1; Section 2. Universal Audio Characterization).
The motivation to include an automated calibration technique in speaker recognition is to improve the estimation of the parameters for the calibration model and, therefore, improve the final accuracy and reliability of the audio-based recognition, identification, or detection system. 
Regarding claim 13, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the reference audio sample (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 14, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the calibration system performs operations comprising, when the candidate data does not match the condition of the trial including at least one of 1) insufficiently matched candidate data or 2) no available candidate data, skipping the step of generating the calibrated score and causing the trial to not be used by the audio-based identification, recognition, or detection system (Roblek: [0007] The matching reference audio samples are scored according to a set of parameters, and matching reference audio samples that do not satisfy a score threshold are retained from being outputted).
Regarding claim 15, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the calibration system performs operations comprising training a machine learning-based model using the one or more parameters, and using the Yu: [0020-0021] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 16, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the calibration system performs operations comprising augmenting the candidate data with noise data that is computationally-generated based on the condition (Yu: [0036] This approach basically uses distinct weights on the raw confidence score but shares the same bias weight for different words; [0037] In a third approach, two more features are added for each frame, in addition to the features used in the second approach).
Regarding claim 17, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the calibration system performs operations comprising generating the parameters by incorporating regularization weight data into a linear logistic regression (LLR)-based algorithmic process performed on the score to generate the calibrated score (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 19, Roblek discloses:

one or more non-transitory computer-readable storage media comprising instructions which, when executed by one or more processors, cause:
determining (See FIG. 1; a comparing component 112, a matching component 118; i.e. comparing / matching) characterization data of a probe audio sample (See [0005] i.e. probe audio sample), characterization data of a reference audio sample (See [0005] i.e. reference audio sample), and a score (See [0005] & FIG. 4; i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored); 
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system  ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples). 
Roblek fails to disclose:
	determining a condition of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; wherein the condition of the trial is unknown prior to the trial; responsive to the condition of the trial, selecting candidate data as a subset of available development data, wherein the subset includes probe audio and reference audio sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the 
However, Yu discloses:
	determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) of the trial based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0005] The usage scenario may correspond to a current condition, with another calibration model (or no calibration model) used when a different condition exists, e.g., one calibration model may be used during a noisy condition and another during a non-noisy condition; [0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario);
using a computer-implemented mathematical model (See FIG.1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes, neural network, and/or logistic regression) ([0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
See FIG.1; i.e. calibration model 106 comprises a mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference to include a calibration model in a speech recognition system to train the model and adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios).
	The motivation to include the calibration model is to improve the quality of speech recognition engine’s confidence score. 
The combination of Roblek and Yu fails to disclose:
	wherein the condition of the trial is unknown prior to the trial; responsive to the condition of the trial, selecting candidate data as a subset of available development data, wherein the subset includes probe audio and reference audio sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample.
However, NPL Document discloses:
	wherein the condition of the trial is unknown (See Abstract: i.e. unseen) prior to the trial (See Page # 1; Abstract: This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions; Page # 5; section 7.3. Trial-Based Calibration (TBC); These results indicate that TBC can more readily adapt to unseen conditions than metadata-based calibration and provide better-calibrated scores for making identification decisions across various conditions);
responsive to the condition of the trial (See Page # 1; Abstract; This work presents Trial-Based Calibration (TBC), a novel, automated calibration technique robust to both unseen and widely varying conditions … Evaluated on a diverse, pooled collection of 5 different databases with 14 distinct conditions), 
selecting candidate data as a subset of available development data  (See Page # 1; Abstract: An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions; See Page # 1, Section 3. Existing Calibration Methods: The process of calibration transforms scores to log-likelihood ratios (LLR). This in turn allows identification scores in isolation to be meaningfully interpreted. Common to all calibration techniques is the need to learn a set of calibration parameters (typically a scale and shift) from a development set. The development set contains both target and impostor scores representative of the conditions expected to be encountered during end use of the system; See Page # 2, Section 3.1 Logistic Regression (Global) Calibration: a single model is trained using all development data. This approach optimizes calibration globally for all conditions in the development data), 
wherein the subset includes probe audio (i.e. test conditions) and reference audio (i.e. enrollment sample) sample pairs that match the combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample (See Page # 1; Abstract; An audio characterization system is used to select a small subset of candidate calibration audio samples that best match the conditions of the enrollment sample and a subset that resembles the test conditions. Calibration parameters learned from the target and impostor trials generated by pairing up these samples are then used to calibrate the score output from the speaker identification system; See Page # 1, Section 1. Introduction: we explore the problem of calibration when the trial conditions are variable. We wish to obtain a set of calibrated scores for which the optimal decision threshold computed for each pair of enrollment and test conditions is independent of these conditions; See Page # 3; Section 5.2. Calibration Data: Matched Data: A small collection of 1503 segments from the NIST and Fisher corpora of speech data was assembled as an initial held-out dataset. Data was chosen with the goal of matching or approximating conditions in the FBI provided corpus, although it was not possible to represent certain trial conditions (cross-language) and languages. Both telephone and microphone channels were represented with speakers in most languages offering cross-channel trials. Table 2 details the characteristics of this data. The segments provided 10736 target trials and 2.1 million impostor trials from which calibration parameters could be learned).
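It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek and Yu references to include an automated calibration technique in speaker recognition, as disclosed by NPL Document (See NPL Document, Page # 1; Section 2. Universal Audio Characterization).
The motivation to include an automated calibration technique in speaker recognition is to improve the estimation of the parameters for the calibration model and, therefore, improve the final accuracy and reliability of the audio-based recognition, identification, or detection system. 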
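For illustration only, the calibration flow relied upon above (selecting development trials whose enrollment and test conditions match the trial, learning a scale and shift from those trials, and applying them to the raw score) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the NPL Document, of Yu, or of the claimed invention: the function names, the dictionary fields, the exact-match selection rule, the toy scores, and the plain gradient-descent logistic regression are all hypothetical.

```python
import math

def select_candidates(dev_trials, enroll_cond, test_cond):
    """Keep development trials whose condition pair matches the trial.

    Assumption: conditions are simple labels compared for equality;
    a real characterization system would use a similarity measure.
    """
    return [t for t in dev_trials
            if t["enroll_cond"] == enroll_cond and t["test_cond"] == test_cond]

def fit_scale_shift(trials, lr=0.1, epochs=2000):
    """Learn scale a and shift b by logistic regression (gradient descent).

    Each trial has a raw score and a label: 1 for target, 0 for impostor.
    """
    a, b = 1.0, 0.0
    n = len(trials)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for t in trials:
            p = 1.0 / (1.0 + math.exp(-(a * t["score"] + b)))
            err = p - t["label"]
            grad_a += err * t["score"]
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Apply the learned scale and shift to a raw score."""
    return a * score + b

# Hypothetical development data: (enrollment condition, test condition, score, label).
DEV_TRIALS = [
    {"enroll_cond": "tel", "test_cond": "mic", "score": s, "label": y}
    for s, y in [(2.0, 1), (3.0, 1), (1.0, 1), (2.5, 1),
                 (-1.0, 0), (0.5, 0), (-2.0, 0), (1.2, 0)]
] + [
    {"enroll_cond": "tel", "test_cond": "tel", "score": s, "label": y}
    for s, y in [(4.0, 1), (-3.0, 0)]
]

subset = select_candidates(DEV_TRIALS, "tel", "mic")
if subset:  # with no matching candidate data, the trial would be left uncalibrated
    scale, shift = fit_scale_shift(subset)
    calibrated = calibrate(1.8, scale, shift)
```

The sketch omits details such as prior weighting and regularization, and it selects candidates only by exact label match; it is intended solely to show how the selected subset, the learned parameters, and the score adjustment relate to one another.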

Claims 9, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al. (US20140185815A1) in view of Yu et al. (US20110144986A1), in view of the NPL Document titled “Trial-Based Calibration for Speaker Recognition in Unseen Conditions” dated 16-19 June 2014, and further in view of Huo et al. (US20150199960A1).
Regarding claim 9, the combination of Roblek, Yu and NPL Document fails to disclose:
The method of claim 1, wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation.
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation ([0003] Described herein are techniques for using clustering training data in speech recognition. An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition; [0005] In some aspects, an i-vector may be extracted from an unknown speech segment. One or more clusters may be selected based on similarities between the i-vector and the one or more clusters. One or more acoustic models corresponding to the one or more clusters may then be determined. The unknown speech segment may be recognized using the one or more determined acoustic models).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek, Yu and NPL Document to train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information).
	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions and use them in speech recognition.
Regarding claim 18, the combination of Roblek, Yu and NPL Document discloses:
The system of claim 12, wherein the model is a learned model of a type that is selected based on the audio-based identification, recognition, or detection system, wherein the type is a Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression). 
The combination of Roblek, Yu and NPL Document fails to disclose:
wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the parameters are determined based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task.
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the parameters are determined based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task ([0003] An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek, Yu and NPL Document to train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information).
	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions and use them in speech recognition.
Regarding claim 20, the combination of Roblek, Yu and NPL Document discloses:
The computer program product of claim 19, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic Yu: [0021] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression). 
The combination of Roblek, Yu and NPL Document fails to disclose:
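wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the instructions, when executed by the one or more processors, cause determination of the parameters based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task.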
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the instructions, when executed by the one or more processors, cause determination of the parameters based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task ([0003] An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek, Yu and NPL Document to train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information).

	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions and use them in speech recognition.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED M AHSAN whose telephone number is (571)272-5018.  The examiner can normally be reached on 8:30 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffery L. Nickerson can be reached on 469-295-9235.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/S.M.A./Patent Examiner, Art Unit 2432               

/Jeffrey Nickerson/Supervisory Patent Examiner, Art Unit 2432