Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application claims the benefit as a continuation-in-part of U.S. application Ser. No. 15/013,580, filed Feb. 2, 2016, which claims priority to U.S. Provisional Appl. Ser. No. 62/181,333 filed Jun. 18, 2015 and U.S. Provisional Appl. Ser. No. 62/118,930 filed Feb. 20, 2015.
DETAILED ACTION
	This office action is in response to the amendment received on 02/17/2021. In the amendment, applicant has amended claims 1, 8, 12 and 18-20. Claims 2-7, 9-11 and 13-17 remain as originally presented. No claim has been cancelled and no new claim has been added. 
	For this office action, claims 1-20 have been received for consideration and have been examined. 
Response to Arguments
Claim Objections
	Applicant’s amendment to claim 20 has been reviewed by the examiner and appears to overcome the objection to claim 20. Therefore, the examiner has withdrawn this objection. 
Claim Rejection under 35 U.S.C. § 112
	Applicant’s amendment to the independent claims has been reviewed by the examiner and appears to overcome the 112(b) indefiniteness rejection. Therefore, the examiner has withdrawn this rejection. 
Claim Rejection under 35 U.S.C. § 103
	Applicant’s remarks with respect to the claim rejection under 35 U.S.C. § 103 have been fully considered; however, the examiner does not find them persuasive. Applicant’s remarks regarding the rejection of claims 1-8, 10-17 and 19, on pages 10-11, are reproduced as follows:

“The Office Action has acknowledged that Roblek does not describe a calibration model. Yu's alleged calibration model is trained in a "training environment" using "labeled training data." Therefore, Yu's calibration model is not trained using training data that is responsive to a condition of a trial, where the condition is determined based on characterization data of probe and reference audio samples that are compared by the trial. Yu does not describe a model that is used to determine a parameter value that is applied to a score to produce a calibrated score.”
Examiner’s Response
	The examiner respectfully disagrees with applicant’s remark that the calibration model of the secondary reference, Yu, is not trained using training data that is responsive to a condition of a trial. Yu discloses improving the quality of a speech recognition engine's confidence score by calibrating the score for each specific usage scenario. The examiner construes Yu's ‘specific usage scenario’ as equivalent to the claimed ‘condition of a trial’. Yu teaches training the calibration model for different usage scenarios, such as noisy conditions, different context free grammars, and different speakers with different dialects/accents, each of which is a condition unknown prior to the trial, as claimed (see Yu, paras. [0017]-[0021]). 

	Based on the above explanation, the examiner believes that the combination of cited references still teaches the amended claim language. Therefore, the examiner is compelled to maintain the rejection. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al. (US20140185815A1) in view of Yu et al. (US20110144986A1).
Regarding claim 1, Roblek discloses:
	A method for improving accuracy of an audio-based identification, recognition, or detection system by adapting calibration to conditions of a trial, the method comprising: 
determining (i.e. comparing / matching) characterization data of a probe audio sample (i.e. probe audio sample), characterization data of a reference audio sample (i.e. reference audio sample), and a score (i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored);
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples).
Roblek fails to disclose:
	determining a condition based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; wherein the condition is unknown prior to the trial; responsive to the condition, selecting candidate data; training a computer-implemented mathematical model using the selected candidate data; determining a value of one or more parameters using the computer-implemented mathematical model; adjusting the score to produce a calibrated score by mathematically applying the value of the one or more parameters to the score; outputting the calibrated score to the audio-based identification, recognition, or detection system.
However, Yu discloses:
determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario); 
wherein the condition is unknown prior to the trial (i.e. current usage scenario is construed as ‘unknown condition’) ([0018] For example, in a noisy situation, a calibration model trained under noisy conditions may be used in place of a normal noise-level calibration model. Other variable conditions may include grammar (e.g., different context free grammar or n-gram for different dialog turn), different speakers (e.g., dialect or accent, and/or a low versus high voice)); 
responsive (i.e. training the calibration model in current/dynamic usage scenario) to the condition, selecting candidate data (i.e. selecting a usage scenario such as location of the telephone call or accent/dialect) ([0018] For example, if a telephone call is received from one location versus another (e.g., as detected via caller ID), a calibration model trained for that location's accent/dialect may be dynamically selected for use. Alternatively, the accent/dialect may be otherwise detected and used to select an appropriate calibration model.); 
training a computer-implemented mathematical model (i.e. calibration model 106 comprises mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference to include a calibration model in a speech recognition system, to train the model, and to adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios). 
	The motivation to include the calibration model is to improve the quality of a speech recognition engine’s confidence score. 
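For illustration only, and forming no part of the disclosure of Roblek or Yu, the sequence of limitations mapped above (selecting candidate data responsive to a condition, training a model, determining parameter values, and applying them to a score) can be sketched as follows. The sketch assumes a linear logistic regression with a single scale and shift; the function names, the candidate pool, and the condition labels are hypothetical.

```python
import math

def select_candidates(pool, condition):
    """Keep only labeled (score, label) pairs recorded under the given condition."""
    return [(s, y) for (s, y, c) in pool if c == condition]

def train_calibration(candidates, lr=0.1, epochs=2000):
    """Fit scale a and shift b of sigmoid(a*s + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(candidates)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in candidates:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Apply the learned scale and shift to a raw score."""
    return a * score + b

# Hypothetical pool of past trials: (raw score, 1 = target / 0 = non-target, condition).
pool = [
    (2.0, 1, "noisy"), (1.5, 1, "noisy"), (0.8, 1, "noisy"),
    (0.5, 0, "noisy"), (1.0, 0, "noisy"), (1.2, 0, "noisy"),
    (3.0, 1, "clean"), (0.1, 0, "clean"),
]

condition = "noisy"                       # assumed to be determined from the trial itself
candidates = select_candidates(pool, condition)
a, b = train_calibration(candidates)
calibrated = calibrate(1.4, a, b)         # calibrated log-odds for a raw score of 1.4
```

In this sketch the learned values of a and b play the role of the "one or more parameters", and the affine map in calibrate is one possible way of "mathematically applying" them to the score.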
Regarding claim 2, the combination of Roblek and Yu discloses:
The method of claim 1, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the reference audio sample, respectively (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 3, the combination of Roblek and Yu discloses:
The method of claim 2, wherein the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the probe audio sample and the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the reference audio sample, respectively, comprise (i) channel data or (ii) noise data or (iii) reverberation data or (iv) language data or (v) speaker gender data or (vi) sample length data or (vii) at least two of (i), (ii), (iii), (iv), (v), (vi) (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 4, the combination of Roblek and Yu discloses:
The method of claim 1, comprising, when the candidate data does not match the condition of the trial including at least one of 1) insufficiently matched candidate data or 2) no available candidate data, skipping the step of generating the calibrated score and causing the trial to not be used by the audio-based identification, recognition, or detection system (Roblek: [0007] The matching reference audio samples are scored according to a set of parameters and the matching reference audios samples are retained from being outputted that not satisfy a score threshold).
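For illustration only, the skipping behavior recited in claim 4 can be sketched as a guard placed ahead of calibration. The sketch is not drawn from either reference; the function names and the minimum-count threshold are hypothetical assumptions.

```python
def calibrate_or_skip(raw_score, candidates, scale, shift, min_candidates=2):
    """Return a calibrated score, or None when candidate data is missing or too sparse."""
    if len(candidates) < min_candidates:
        return None                      # skip calibration; the trial is not used
    return scale * raw_score + shift

def usable_trials(results):
    """Drop skipped trials so the downstream system never sees them."""
    return [r for r in results if r is not None]

results = [
    calibrate_or_skip(1.0, [], 2.0, 0.5),                      # no candidate data
    calibrate_or_skip(1.0, [(0.2, 0), (1.4, 1)], 2.0, 0.5),    # enough candidate data
]
kept = usable_trials(results)
```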
Regarding claim 5, the combination of Roblek and Yu discloses:
The method of claim 1, comprising training a machine learning-based model using the one or more parameters, and using the machine learning-based model to improve calibration of either the audio-based identification, recognition or detection system or another audio-based identification, recognition, or detection system (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 6, the combination of Roblek and Yu discloses:
The method of claim 1, comprising augmenting the candidate data with noise data that is computationally-generated based on the condition (Yu: [0036] This approach basically uses distinct weights on the raw confidence score but shares the same bias weight for different words; [0037] In a third approach, two more features are added for each frame, in addition to the features used in the second approach).
Regarding claim 7, the combination of Roblek and Yu discloses:
The method of claim 1, comprising generating the parameters by incorporating regularization weight data into a linear logistic regression (LLR)-based algorithmic process performed on the score to generate the calibrated score (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 8, the combination of Roblek and Yu discloses:
The method of claim 1, wherein the model is a learned model of a type that is selected based on the audio-based identification, recognition, or detection system, wherein the type is a neural network-based model or a probabilistic linear discriminant analysis (PLDA)-based model or a linear logistic regression-based model (Yu: [0020]).
Regarding claim 10, the combination of Roblek and Yu discloses:
The method of claim 1, wherein the parameters include one or more of a scale or a shift or a bias (Yu: [0035-0037]).
Regarding claim 11, the combination of Roblek and Yu discloses:
The method of claim 1, wherein the audio-based identification, recognition, or detection system executes a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task (Yu: [0005]).
Regarding claim 12, Roblek discloses:
A system comprising: one or more computer processors; a calibration system coupled to the one or more computer processors, wherein the system performs operations comprising:
determining (i.e. comparing / matching) characterization data of a probe audio sample (i.e. probe audio sample), characterization data of a reference audio sample (i.e. reference audio sample), and a score (i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored);
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples).
Roblek fails to disclose:
	determining a condition based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; wherein the condition is unknown prior to the trial; responsive to the condition, selecting candidate data; training a computer-implemented mathematical model using the selected candidate data; determining a value of one or more parameters using the computer-implemented mathematical model; adjusting the score to produce a calibrated score by mathematically applying the value of the one or more parameters to the score; outputting the calibrated score to the audio-based identification, recognition, or detection system.
However, Yu discloses:
determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario); 
wherein the condition is unknown prior to the trial (i.e. current usage scenario is construed as ‘unknown condition’) ([0018] For example, in a noisy situation, a calibration model trained under noisy conditions may be used in place of a normal noise-level calibration model. Other variable conditions may include grammar (e.g., different context free grammar or n-gram for different dialog turn), different speakers (e.g., dialect or accent, and/or a low versus high voice)); 
responsive (i.e. training the calibration model in current/dynamic usage scenario) to the condition, selecting candidate data (i.e. selecting a usage scenario such as location of the telephone call or accent/dialect) ([0018] For example, if a telephone call is received from one location versus another (e.g., as detected via caller ID), a calibration model trained for that location's accent/dialect may be dynamically selected for use. Alternatively, the accent/dialect may be otherwise detected and used to select an appropriate calibration model.); 
training a computer-implemented mathematical model (i.e. calibration model 106 comprises mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference to include a calibration model in a speech recognition system, to train the model, and to adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios). 
	The motivation to include the calibration model is to improve the quality of a speech recognition engine’s confidence score. 
Regarding claim 13, the combination of Roblek and Yu discloses:
The system of claim 12, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the reference audio sample, respectively (Roblek: [0021] an audio matching system receives as input an excerpt of an audio signal (a probe) and tries to locate a corresponding audio excerpt in a large repository of reference audio signals. For example, a mobile phone could record music playing in a noisy environment (e.g., a noisy bar, or elsewhere) that can be utilized by the matching system to return information about the music playing by matching the noisy probe to a large repository of references).
Regarding claim 14, the combination of Roblek and Yu discloses:
The system of claim 12, wherein the calibration system performs operations comprising, when the candidate data does not match the condition of the trial including at least one of 1) insufficiently matched candidate data or 2) no available candidate data, skipping the step of generating the calibrated score and causing the trial to not be used by the audio-based identification, recognition, or detection system (Roblek: [0007] The matching reference audio samples are scored according to a set of parameters and the matching reference audios samples are retained from being outputted that not satisfy a score threshold).	
Regarding claim 15, the combination of Roblek and Yu discloses:
	The system of claim 12, wherein the calibration system performs operations comprising training a machine learning-based model using the one or more parameters, and using the machine learning-based model to improve calibration of either the audio-based identification, recognition or detection system or another audio-based identification, recognition, or detection system (Yu: [0020-0021] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 16, the combination of Roblek and Yu discloses:
The system of claim 12, wherein the calibration system performs operations comprising augmenting the candidate data with noise data that is computationally-generated based on the condition (Yu: [0036] This approach basically uses distinct weights on the raw confidence score but shares the same bias weight for different words; [0037] In a third approach, two more features are added for each frame, in addition to the features used in the second approach).
Regarding claim 17, the combination of Roblek and Yu discloses:
The system of claim 12, wherein the calibration system performs operations comprising generating the parameters by incorporating regularization weight data into a linear logistic regression (LLR)-based algorithmic process performed on the score to generate the calibrated score (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression).
Regarding claim 19, Roblek discloses:
One or more non-transitory computer-readable storage media comprising instructions which, when executed by one or more processors, cause:
determining (i.e. comparing / matching) characterization data of a probe audio sample (i.e. probe audio sample), characterization data of a reference audio sample (i.e. reference audio sample), and a score (i.e. ranking score) ([0005] The acts comprise receiving a probe audio sample, and comparing the probe audio sample to a plurality of reference audio samples to identify at least one matching reference audio sample. In response to identifying a plurality of matching reference audio samples, the acts further comprise assigning respective ranking scores to the matching reference audio samples; [0007] The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples. A plurality of matching reference audio samples that satisfy a sufficient match threshold are identified. The matching reference audio samples are scored);
wherein the score is determined by a trial (i.e. continuously match comparison objects) that compares the probe audio sample to the reference audio sample ([0030] FIG. 3 illustrates a system 300 that operates as a matching system in accordance with various embodiments disclosed herein. For example, the system 300 operates to continuously match comparison objects, such as a sample audio stream with reference objects (e.g., a reference audio), continuously ranks the match results and generates greater confidence for outputting match results by retaining matches until a predetermined score threshold is satisfied. For example, the system 300 includes the components discussed above and further includes a scoring component 302 that compares ranking scores, updates rankings and determines the sufficient match result);
wherein the trial is conducted by a computer that is communicatively coupled to the audio-based identification, recognition, or detection system ([0006] Another example of an embodiment includes a system, comprising a memory that stores computer executable components, and a microprocessor that executes computer executable components stored in the memory. The computer executable components comprise a receiving component that receives a first portion of audio streaming content. A comparing component generates a comparison of the first portion of audio streaming content and a plurality of reference audio samples; [0007] in response to execution, cause a computing system comprising a processor to perform operations. The operations comprise receiving, via the processor, a first portion of a probe audio sample. The operations further comprise comparing the first portion to a plurality of reference audio samples to identify a plurality of matching reference audio samples).
Roblek fails to disclose:
	determining a condition based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample; wherein the condition is unknown prior to the trial; responsive to the condition, selecting candidate data; training a computer-implemented mathematical model using the selected candidate data; determining a value of one or more parameters using the computer-implemented mathematical model; adjusting the score to produce a calibrated score by mathematically applying the value of the one or more parameters to the score; outputting the calibrated score to the audio-based identification, recognition, or detection system.
However, Yu discloses:
determining a condition (i.e. dynamic usage scenario such as a noisy situation/condition) based on a combination of the characterization data of the probe audio sample and the characterization data of the reference audio sample ([0018] The calibration model 106 is one that is trained for the usage scenario, which may be specific to the application and/or possibly dynamically substituted based upon current conditions that correspond to the usage scenario); 
wherein the condition is unknown prior to the trial (i.e. current usage scenario is construed as ‘unknown condition’) ([0018] For example, in a noisy situation, a calibration model trained under noisy conditions may be used in place of a normal noise-level calibration model. Other variable conditions may include grammar (e.g., different context free grammar or n-gram for different dialog turn), different speakers (e.g., dialect or accent, and/or a low versus high voice)); 
responsive (i.e. training the calibration model in current/dynamic usage scenario) to the condition, selecting candidate data (i.e. selecting a usage scenario such as location of the telephone call or accent/dialect) ([0018] For example, if a telephone call is received from one location versus another (e.g., as detected via caller ID), a calibration model trained for that location's accent/dialect may be dynamically selected for use. Alternatively, the accent/dialect may be otherwise detected and used to select an appropriate calibration model.); 
training a computer-implemented mathematical model (i.e. calibration model 106 comprises mathematical model such as naïve Bayes) using the selected candidate data ([0019] Note that the calibration model 106 is trained for that application 110 and/or usage scenario based upon transcribed calibration data typically collected under real usage scenarios for the application; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression); 
determining a value (i.e. confidence score 104) of one or more parameters (i.e. model parameters) using the computer-implemented mathematical model ([0017] a speech recognition engine 102 outputs a confidence score 104, which is received by a calibration model 106 using model parameters obtained via training (as described below)); 
adjusting the score to produce a calibrated score (i.e. adjusted confidence score 108) by mathematically (i.e. algorithm such as naïve Bayes, neural network, and/or logistic regression) applying the value of the one or more parameters to the score ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received; [0019] As a result, the adjusted confidence score 108 is more accurate than the original confidence score 104; [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression);
outputting (i.e. providing of adjusted confidence score to application 110) the calibrated score to the audio-based identification, recognition, or detection system ([0017] In general, the calibration model 106 adjusts the confidence score 104 to an adjusted confidence score 108, which is then provided to an application 110, such as one that makes a decision based upon the adjusted confidence score 108 received).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the Roblek reference and include a calibration model into a speech recognition system to train the model and adjust the confidence score, as disclosed by Yu (See Yu: Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios). 
	The motivation to include the calibration model is to improve the quality of the speech recognition engine’s confidence score. 
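For illustration only, the calibration flow mapped above (determine a condition, select condition-matched candidate data, train a model, apply its parameters to the raw score) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Yu or of the claimed invention: the function names, the toy trial pool, the condition tags, and the choice of a two-parameter linear logistic regression (scale a, offset b) trained by plain gradient descent are all hypothetical.

```python
import math

# Hypothetical labeled pool: (raw score, label, condition tag).
# label 1 = probe and reference match, 0 = they do not.
pool = [
    (2.0, 1, "noisy"), (2.5, 1, "noisy"), (3.0, 1, "noisy"), (4.0, 1, "noisy"),
    (0.0, 0, "noisy"), (1.0, 0, "noisy"), (1.5, 0, "noisy"), (2.2, 0, "noisy"),
    (5.0, 1, "clean"), (6.0, 1, "clean"), (3.0, 0, "clean"), (4.2, 0, "clean"),
]

def select_candidates(pool, condition):
    """Keep only the labeled trials whose tag matches the trial's condition."""
    return [(s, y) for (s, y, c) in pool if c == condition]

def train_calibration(candidates, lr=0.1, epochs=2000):
    """Fit a and b of sigmoid(a*s + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    n = len(candidates)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in candidates:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Apply the learned parameters to a raw score (result is a log-odds value)."""
    return a * score + b

# Assumed condition for the current trial; in practice it would be derived
# from the characterization data of the probe and reference samples.
candidates = select_candidates(pool, "noisy")
a, b = train_calibration(candidates)
print(calibrate(2.8, a, b))
```

In this sketch the parameters depend on which subset of the pool is selected, so the same raw score may calibrate differently under a different condition.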


Claims 9, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al., (US20140185815A1) in view of Yu et al., (US20110144986A1) and further in view of Huo et al., (US20150199960A1).
Regarding claim 9, the combination of Roblek and Yu fails to disclose:
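	The method of claim 1, wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation.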
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation ([0003] Described herein are techniques for using clustering training data in speech recognition. An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition; [0005] In some aspects, an i-vector may be extracted from an unknown speech segment. One or more clusters may be selected based on similarities between the i-vector and the one or more clusters. One or more acoustic models corresponding to the one or more clusters may then be determined. The unknown speech segment may be recognized using the one or more determined acoustic models).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek and Yu and train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information). 
	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions for use in speech recognition.
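For illustration only, the cluster selection described in Huo [0005] (selecting a cluster based on the similarity between an i-vector and the clusters) can be sketched as follows. This is a minimal sketch under stated assumptions: Huo as quoted above does not name a similarity measure, so cosine similarity is an assumption, and the function names, cluster labels, and three-dimensional toy vectors are hypothetical (i-vectors in practice have far more dimensions).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest_cluster(ivector, centroids):
    """Return the name of the cluster centroid most similar to the i-vector."""
    return max(centroids, key=lambda name: cosine_similarity(ivector, centroids[name]))

# Hypothetical cluster centroids, each standing for one acoustic condition.
centroids = {
    "quiet": [1.0, 0.0, 0.2],
    "noisy": [0.0, 1.0, 0.3],
}

# Hypothetical i-vector extracted from an unknown speech segment.
unknown = [0.1, 0.9, 0.25]
selected = nearest_cluster(unknown, centroids)
print(selected)
```

The selected cluster name would then identify which condition-specific acoustic model to use for the unknown segment.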
Regarding claim 18, the combination of Roblek and Yu discloses:
The system of claim 12, wherein the model is a learned model of a type that is selected based on the audio-based identification, recognition, or detection system, wherein the type is a neural network-based model or a probabilistic linear discriminant analysis (PLDA)-based model or a linear logistic regression-based model (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression). 
The combination of Roblek and Yu fails to disclose:
wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the parameters are determined based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task.
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the parameters are determined based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task ([0003] An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek and Yu and train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information). 
	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions for use in speech recognition.
Regarding claim 20, the combination of Roblek and Yu discloses:
	The computer program product of claim 19, wherein the characterization data of the probe audio sample and the characterization data of the reference audio sample, respectively, are indicative of a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the probe audio sample and a behavioral characteristic or an acoustic environmental characteristic or a transmission artifacts characteristic of the reference audio sample, respectively, and wherein the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the probe audio sample and the behavioral characteristic or acoustic environmental characteristic or transmission artifacts characteristic of the reference audio sample, respectively, comprise (i) channel data or (ii) noise data or (iii) reverberation data or (iv) language data or (v) speaker gender data or (vi) sample length data or (vii) at least two of (i), (ii), (iii), (iv), (v), (vi), and wherein the instructions, when executed by the one or more processors, cause generating the parameters by incorporating regularization weight data into a linear logistic regression (LLR)-based algorithmic process performed on the score to generate the calibrated score, and wherein the model is a learned model of a type that is selected based on the audio-based identification, recognition, or detection system, wherein the type is a neural network-based model or a probabilistic linear discriminant analysis (PLDA)-based model or a linear logistic regression-based model (Yu: [0020] The calibration model 106 and its associated learned model parameters 107 may comprise any suitable classifier. While a maximum entropy classifier is used herein in the various examples, other types of classifiers may perform such calibration, including those based upon naïve Bayes, neural network, and/or logistic regression). 
The combination of Roblek and Yu fails to disclose:
wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the instructions, when executed by the one or more processors, cause determination of the parameters based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task.
However, Huo discloses:
	wherein the probe audio sample is compared to the reference audio sample by determining a similarity metric using a condition probabilistic linear discriminant analysis (CPLDA)-based computation or an i vector (IV)-based computation or a universal audio characterization (UAC)-based computation, and wherein the instructions, when executed by the one or more processors, cause determination of the parameters based on a speaker recognition task or a speech activity detection system or a language recognition task or a keyword spotting task or a gender recognition task or a channel recognition task or a speaking style recognition task or an active voice-based biometrics task or a passive voice-based biometrics task or a speech transcription task or a speaker segmentation task ([0003] An i-vector may be extracted from a training speech segment of a training data (e.g., a training corpus). The extracted i-vectors of the training data may then be clustered into multiple clusters to identify multiple acoustic conditions. The multiple clusters may be used to train acoustic models associated with the multiple acoustic conditions. The trained acoustic models may be used in speech recognition).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the references of Roblek and Yu and train speech recognition data using i-vector based clustering, as disclosed by Huo (See Huo: Abstract: Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from a speech segment of a speech training data to represent acoustic information). 
	The motivation to train speech recognition data using i-vector based clustering is to identify multiple acoustic conditions for use in speech recognition.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED M AHSAN whose telephone number is (571)272-5018.  The examiner can normally be reached on 8:30 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffery L. Nickerson, can be reached at 469-295-9235.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/S.M.A./Patent Examiner, Art Unit 2432                                                                                                                                                                                                        
/SYED A ZAIDI/Primary Examiner, Art Unit 2432