Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 
DETAILED ACTION
Claims 29-48 are pending and were filed by a preliminary amendment on 8/20/2020.  Claims 29, 40, and 48 are independent and have been amended.  The Specification has also been amended for typographical errors regarding component numbers of the drawing components.
This Application is published as 20200357413.
The priority date of the chain of continuations is July 2, 2008.
This application is a continuation of U.S. patent application Nos. 16/885,116, 15/171,374, 14/064,755, 13/750,807, and 12/166,822, which have issued, respectively, as U.S. 10,699,714, U.S. 10,049,672, U.S. 9,373,329, U.S. 8,571,860, and U.S. 8,364,481.  Terminal disclaimers over all 5 parents in the chain are required.

Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
This action is Final.
Response to Amendments and Arguments
Claim 29 is amended as follows (and the other independent Claims similarly):
29. A computer-implemented method comprising: 

processing, by the computer, the obtained recognition results based on the confidence values for the speech recognition task; 
generating, by the computer, one or more weighted confidence values for the recognition results based on the recognition results, the confidence values for the speech recognition task, and contextual information related to the recognition results, a value of the one or more weighted confidence values corresponding to a particular recognition result from the recognition results, the contextual information, which is related to the particular recognition result, including an average confidence value of a subset of the recognition results identifying a same candidate representation with the particular recognition result; 
ranking, by the computer, the processed recognition results based on the weighted confidence values for the recognition results; and 
generating, by the computer, a final recognition result for the received audio signal based on the ranked recognition results. 

The limitation at issue is:
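generating … weighted confidence values for the recognition results based on the recognition results, the confidence values for the speech recognition task, and … an average confidence value of a subset of the recognition results identifying a same candidate representation with the particular recognition result;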
This is taught by Baker (U.S. 6122613) as shown below.  Baker has two recognizers: one offline recognizer which is more accurate and one real-time recognizer which is faster.  Baker combines the results of the two in a weighted average that is shown in the equation in col. 8, lines 29-56 reproduced below:

[Image: media_image1.png, equation from Baker, col. 8, lines 29-56]


Accordingly, Applicant’s arguments are moot in view of the new grounds of rejection that were necessitated by the amendments.
Examiner located the location of support as follows:
[0089] FIGS. 5A-C are diagrams of example recognition results and confidence values generated by SRS's and different method of selecting a final recognition result. Specifically, FIGS. 5A-C show SRSA output 502 from SRSA, SRSB output 504 from SRSB, and SRSC output 506 from SRSC. In this example, the output is generated in response to each SRS attempting to decode an audio signal that represents the word "carry" Because each of the SRS's may be different, the recognition results produced by the SRS's may be different as illustrated by FIGS. 5A-C. …[0096] FIG. 5C shows an example selection algorithm that takes into account weighting factors in a selection of the recognition result. In some implementations, the weights may be based on a frequency of occurrence of the recognition result. For example, a table 550 lists three weights that may be multiplied times the combined confidence scores previously discussed to create new weighted confidence scores. [0097] In this example, a weight of "1" is multiplied times the combined confidence score if the recognition result is generated by a single SRS (e.g., if the result occurs with a frequency of"one"). Consequently, if the recognition result only occurs once, it will not receive any benefit from the weighting. If a recognition result occurs twice, it may be weighted using a factor of 1.02, which slightly favors the recognition result over another recognition result that only occurs once. If a recognition result occurs three times, it may be weighted by a factor 1.04. [0098] In the example of FIG. 5C, the combined confidence value for the recognition result "Cory" would be weighted against a factor of 1.04, which results in a weighted value of 0.6344. The combined confidence value for the recognition result "quarry" would be weighted against a factor of 1.02, which results in a weighted value of 0.6324. 
In this case, the selection module 113 may select the result "Cory" over the result "quarry" because the weighted combined confidence score of the former is the higher than that of the latter even though the unweighted combined confidence score of "Cory" is less than that of the result "quarry." [0099] Values used to select the final recognition result may be weighted based on several criteria including, but not limited to, the distribution of confidence scores generated by an SRS, characteristics of a SRS that generated the recognition result (e.g., overall accuracy, accuracy in a particular context, accuracy over a defined time period, etc.), and the similarity between the SRS's that produce the same recognition result. [0100] In other implementations, the final recognition result may be weighted using a correlation of recognition confidence values with recognition errors for a recognizer and for the final composite recognizer. For example, during training the system can count a number of times that a particular recognizer comes back with a confidence value of 0.3, and also count how often those "0.3 confidence recognition results" are errors for that recognizer and how often the final combined recognition is also a recognition error. The system may use the same normalization counting when combining similar recognition results. The combined confidence can be estimated from a number of times that the recognizers had the same result (with given confidence values) and that the common result was correct.
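The arithmetic in paragraphs [0096]-[0098] quoted above can be checked with a minimal sketch. The unweighted combined confidence values 0.61 ("Cory") and 0.62 ("quarry") are not stated in the quoted text; they are back-computed here from the stated weighted values (0.6344 / 1.04 and 0.6324 / 1.02). The function and variable names are hypothetical and are not taken from the Specification.

```python
# Frequency-based weights from table 550 as described in [0097]:
# a result returned by 1, 2, or 3 SRS's is multiplied by 1, 1.02, or 1.04.
FREQUENCY_WEIGHTS = {1: 1.00, 2: 1.02, 3: 1.04}


def weighted_confidence(combined_confidence, occurrences):
    """Scale a combined confidence value by the weight for its frequency."""
    return combined_confidence * FREQUENCY_WEIGHTS[occurrences]


# Assumed inputs: (combined confidence, number of SRS's returning the result).
# The combined values are back-computed from [0098], not quoted from it.
candidates = {"Cory": (0.61, 3), "quarry": (0.62, 2)}

weighted = {
    word: round(weighted_confidence(conf, count), 4)
    for word, (conf, count) in candidates.items()
}

# The result with the highest weighted value is selected.
final_result = max(weighted, key=weighted.get)
```

Under these assumed inputs the sketch reproduces the weighted values 0.6344 and 0.6324 and selects "Cory", even though its unweighted combined value is the lower of the two.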


[Image: media_image2.png]

Using two recognizers in parallel or in series and obtaining a combined confidence score for candidate recognition results of the two recognizers is taught by Baker.  See Figures 5 and 6 and the associated written description.
Using several recognizers in parallel and giving a higher weight/confidence to a result that is generated by more of the recognizers is taught by Bennett, Figure 1, and “[0028] The individual-result confidence values may be used in a simple voting mechanism where several recognizers return a particular result. For example, the result may be "The quick brown fox." If 6 of the available recognizers return that particular result, that result will be given a higher confidence value than results that were returned only by one recognizer….”
Using several recognizers in parallel and picking from the n-best list of each and combining the results of the recognizers such that, in a recognized sentence, each word may be from a different recognizer is taught by Gao.  See Figures 6 and 8, for example.
Drawings
The drawings are objected to because:
In Figure 5B the phrase “running average” is inaccurate.  A “running average” implies averaging over time.  Such is not the case in Figure 5B.  The Written Description provides: “[0094] FIG. 5B shows an example selection algorithm that selects a recognition result based on which result has a highest combined confidence value. For example, more than one SRS may generate the same recognition result, but may assign a different confidence value to the result. In some implementations, multiple confidence scores for the same result can be averaged (or otherwise combined) to create a combined confidence score. …”  Published Application.  
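The distinction drawn above can be illustrated with a short sketch. The combination described in [0094] averages the confidence values that different SRS's assign to the same result for one audio signal, whereas a running average is updated across successive values over time. The data and names below are hypothetical and are not taken from Figure 5B.

```python
from collections import defaultdict


def combined_confidence(outputs):
    """Average, per result, the confidence values assigned by different SRS's.

    `outputs` is a list of (result, confidence) pairs gathered from all
    SRS's for a single audio signal; no time dimension is involved.
    """
    scores = defaultdict(list)
    for result, confidence in outputs:
        scores[result].append(confidence)
    return {result: sum(vals) / len(vals) for result, vals in scores.items()}


def running_average(values):
    """Yield the average of all values seen so far (an average over time)."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


# Hypothetical outputs of three SRS's for one utterance.
outputs = [("carry", 0.75), ("carry", 0.25), ("quarry", 0.5)]
averaged = combined_confidence(outputs)
```

The first function groups by result and averages once; the second produces a new average each time another value arrives, which is the behavior the label "running average" would suggest.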
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: 
The phrase “candidate representation” that is used in the independent Claims has no antecedent basis in the Specification.

(Please search the Specification for phrases you wish to include in the Claims.  “[0090] In some implementations, the SRS output includes a top N recognition results (where N can represent any positive integer or 0) that are selected based on which recognition results are associated with the greatest confidence values….”  Further, as a result of introducing “candidate representations” into the Claims, the independent Claims have become unduly broad.  The Disclosure (see Figures 5A,5B, 5C) provides a confidence value for each of the N-best candidates whereas the Claim refers to confidence values for recognition results and later says that “generating, by the computer, one or more weighted confidence values for the recognition results based on the recognition results …” which is not limiting.  Applicant may argue that “top N recognition results” of the Specification is paraphrased as “candidate representation” and proceed to change occurrences of “recognition result” to “candidate representation” where it should be so.)
Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 29-32, 36-43, and 47-48 are rejected under 35 U.S.C. 103(a) as being unpatentable over Gao (U.S. 2008/0077386) in view of Baker (U.S. 6,122,613).

[Image: media_image3.png]

[Image: media_image4.png]


[Image: media_image5.png]

Regarding Claim 29, Gao teaches:
29. A computer-implemented method comprising: 
obtaining, by a computer and from a speech recognition system, recognition results and confidence values for a speech recognition task initiated for a received audio signal, wherein the recognition results identify a plurality of candidate representations of the received audio signal and the confidence values identify a corresponding plurality of probabilities that the recognition results are correct; [Gao, Figures 1, 2, 8.  Input 110 is provided (which may be speech 210 or text) to a set of N modules which may be Automatic Speech Recognition (ASR) modules 1 … N.  The generated output from each ASR Module is an N-best list (130-X) of recognition candidates.  [0026]-[0027].  Each of the N-best results is associated with a probability/confidence value indicating the confidence/likelihood/probability of their accuracy.  See [0005] and [0029].  This limitation asks merely for “a speech recognition system” and is taught by a single ASR module 1 of Figures 1 or 2.]
processing, by the computer, the obtained recognition results based on the confidence values for the speech recognition task; [Gao, Figures 1, 2, 8.  Any of the steps 140 of Figure 1, 230 of Figure 2, or 830 of Figure 8 teaches this limitation, which does not say what its “processing” does other than that it is “based on the confidence values.”  E.g., Figure 8:  “Unify, sort and display M-best ASR results 230” is “based on confidence values.”]
generating, by the computer, one or more weighted confidence values for the recognition results based on the recognition results, the confidence values for the speech recognition task, and contextual information related to the recognition results, a value of the one or more weighted confidence values corresponding to a particular recognition result from the recognition results, the contextual information, which is related to the particular recognition result, including an average confidence value of a subset of the recognition results identifying a same candidate representation with the particular recognition result; [Gao teaches that the confidence/probability score may be calculated using various factors including that some modules may be considered “preferred” and their results assigned a higher score.  This teaches weighting according to context which is not defined and is taught by the identity of the particular ASR module.  See [0029]-[0031].  “[0038] In step 240, the user may select a candidate in the M-best list as the primary recognition output. The selection frequency for each ASR module may be used to further determine the order of the M-best ASR lists via feedback 245. For example, the next output of the ASR module which generated the primary recognition output may be ordered at the top of the list regardless of posterior probability.”]
ranking, by the computer, the processed recognition results based on the weighted confidence values for the recognition results; and [Gao teaches that it re-ranks the N-Best results according to the secondary factors/weights into an M-Best with M<N at 230 in Figure 2.  “[0033] … Such an apparatus may generate N-best lists based on multiple speech recognizers running in parallel and re-rank N-best lists according to user feedbacks….”  The re-ranking is ranking according to the weighting of the results.]
generating, by the computer, a final recognition result for the received audio signal based on the ranked recognition results. [Gao, Figures 3-7, the top line shows the “primary result” / “final recognition result” and the lines below show the alternative results.]
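The re-ranking attributed to Gao above can be pictured with a minimal sketch: N-best lists from several ASR modules are merged, sorted by probability, and cut to an M-best list, and the best output of a preferred module (for example, the module whose output the user selected last, per the quoted [0038]) is moved to the top regardless of probability. This is an illustration under those assumptions, not Gao's implementation; all names and data are hypothetical.

```python
def rerank(nbest_lists, m, preferred_module=None):
    """Merge per-module N-best lists and return an M-best list.

    `nbest_lists` maps a module name to a list of (candidate, probability)
    pairs. Returned items are (candidate, probability, module) tuples.
    """
    merged = [
        (candidate, probability, module)
        for module, results in nbest_lists.items()
        for candidate, probability in results
    ]
    # Highest probability first.
    merged.sort(key=lambda item: -item[1])
    if preferred_module is not None:
        # Move the preferred module's best candidate to the top of the list.
        for index, item in enumerate(merged):
            if item[2] == preferred_module:
                merged.insert(0, merged.pop(index))
                break
    return merged[:m]


# Hypothetical N-best lists from two ASR modules.
nbest = {
    "asr1": [("recognize speech", 0.6), ("wreck a nice beach", 0.3)],
    "asr2": [("recognise speech", 0.5)],
}
plain = rerank(nbest, 2)
with_feedback = rerank(nbest, 2, preferred_module="asr2")
```

In this sketch the top entry of the returned list plays the role of the "primary result", and the entries below it are the alternatives.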
While Gao teaches the use of identity of the recognizer module as the “context” which affects the weight associated with a recognition result, it does not teach obtaining a weighted average of the confidence values of different recognizers to obtain the confidence value for a particular result (e.g. one recognized word).
Baker teaches:
generating, by the computer, one or more weighted confidence values for the recognition results based on the recognition results, the confidence values for the speech recognition task, and contextual information related to the recognition results, a value of the one or more weighted confidence values corresponding to a particular recognition result from the recognition results, the contextual information, which is related to the particular recognition result, including an average confidence value of a subset of the recognition results identifying a same candidate representation with the particular recognition result; [Baker, Figures 1 or 5 teach conducting speech recognition on at least two recognizers (this Claim requires only one) which may be operating in series or parallel and each generates a confidence score associated with the recognition result.   “A speech sample is recognized with a computer system by processing the speech sample with at least two speech recognizers, each of which has a different performance characteristic.. … The performance characteristics of the recognizers may be based on style or subject matter, and the recognizers may operate serially or in parallel. Sets of candidates produced by the two recognizers may be combined by merging the scores to generate a combined set of candidates that corresponds to the union of the two sets. …”  Abstract.  The confidence score associated with each candidate is the probability that the particular candidate was correctly recognized:  “As shown in FIG. 5, the offline recognizor 503 and the real-time recognizor 505 generate separate sets of likely candidates--i.e., phrases, words, phonemes or other speech units that likely match a corresponding portion of the input speech--and associated scores for each of the candidates. Scores typically are maintained as negative logarithmic values for ease of processing. 
Accordingly, a lower score indicates a better match (a higher probability) while a higher score indicates a less likely match (a lower probability), with the likelihood of the match decreasing as the score increases.”  Col. 8, lines 19-29.  Baker does not mention generation of an N-Best list and thus does not teach “wherein the recognition results identify a plurality of candidate representations of the received audio signal” from a single recognizer.  For a candidate recognition word “w,” each recognizer has its own confidence value S(w) such that Baker has So(w) for the offline recognizer and Sr(w) for the real-time recognizer.  Baker calculates a combined confidence score Sc(w) for the recognition candidate w which is a weighted combination of the two confidence scores.  See, e.g., Col. 8, lines 45-55.  See also Figures 5 and 6A, 6B, 6C provided below.  Accordingly, Baker teaches the limitation.]

[Image: media_image6.png, one of the Baker figures cited above]


[Image: media_image7.png, one of the Baker figures cited above]
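The combination of So(w) and Sr(w) into Sc(w) discussed above can be sketched as follows. This Office action describes the combination only as a weighted combination with a weight Lambda; the convex form Sc(w) = λ·So(w) + (1 − λ)·Sr(w), the penalty used for a candidate that only one recognizer proposes, and all names and values below are assumptions made for illustration.

```python
def combine_scores(offline, realtime, lam, missing_score=100.0):
    """Combine two recognizers' scores for each candidate word w.

    Scores are negative-log values, so a lower score means a better match.
    Assumed form: Sc(w) = lam * So(w) + (1 - lam) * Sr(w), with 0 <= lam <= 1.
    `missing_score` is a hypothetical penalty for a candidate that one
    recognizer did not propose.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    combined = {}
    for word in set(offline) | set(realtime):  # union of the candidate sets
        s_o = offline.get(word, missing_score)
        s_r = realtime.get(word, missing_score)
        combined[word] = lam * s_o + (1.0 - lam) * s_r
    # Lower combined score first, i.e. most likely candidate first.
    return sorted(combined.items(), key=lambda item: item[1])


# Hypothetical candidate sets and scores for one speech unit.
offline = {"carry": 2.0, "quarry": 4.0}
realtime = {"carry": 4.0, "Cory": 2.0}
ranked = combine_scores(offline, realtime, lam=0.5)
```

Raising `lam` toward 1 gives more weight to the offline recognizer's score, which corresponds to trusting the recognizer considered more accurate.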

Gao and Baker both pertain to applying multiple speech recognizers to the same input speech, processing their outputs, and determining the best result and its associated confidence value.  It would have been obvious to combine Baker's express teaching of weighting the confidence value associated with a result according to the identity/type of the recognizer with the system of Gao, which teaches that the output results from the preferred recognizers are assigned a higher weight, for the teaching that the confidence value of a word is a weighted average of the confidence values associated with that word by each of the multiple recognizers in play.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 30, Gao teaches that the frequency with which each ASR module is selected is used as a factor/weight for ordering the M-Best from among the N-Best outputs.  [0038].
Baker teaches and therefore suggests:
30. The computer-implemented method of claim 29, wherein the contextual information includes a frequency with which the particular recognition result occurs in the recognition results over a period of time. [Baker, col. 10, lines 54-57, where the credibility of a speech recognizer, and thus the weight Lambda assigned to its results, increases as the recognizer generates more accurate results over time.  See also “Combining the recognition results from two recognizors based on weighting factors allows the speech recognition system to accord greater weight to the recognition result from the recognizor known to be more accurate.”  Baker, Col. 3, lines 38-42.  See also Ikeda in the Conclusion, Figure 2, where the score of a recognizer depends on “number of times of use,” which means the number of times the results of this recognizer were considered correct and were used.]  Rationale for combination as provided for Claim 29.
 
Regarding Claim 31, Gao teaches:
31. The computer-implemented method of claim 30, further comprising obtaining, as the contextual information related to the recognition results, recognition results and confidence values for the speech recognition task from at least one other speech recognition system. [Gao, Figures 1, 2, 8.  Input 110 is provided (which may be speech 210 or text) to a set of N modules which may be Automatic Speech Recognition (ASR) modules 1 … N.  The outputs from the other ASR modules are considered as a factor/weight in determining the rank of the final results.  This teaches the “at least one other speech recognition system” of the Claim.]

Regarding Claim 32, Gao teaches that it calculates the difference between the ASR results (Figure 8, 830) but does not teach combining confidence values when two results are the same.
Baker teaches:
32. The computer-implemented method of claim 31, wherein a particular weighted confidence value from the one or more weighted confidence values corresponds to a particular recognition result from the recognition results and comprises 
a combination of two or more of the confidence values that correspond to the particular recognition result for each of the speech recognition system and the at least one other speech recognition system. [Baker, Figure 3, “combiner 311”, Figure 5, “combined candidates and scores 513” combines the speech recognition results from the two recognizers 503, 505 and also generates a combined confidence value.  See claims 14-17.  Figure 7, “combine scores 705.”  “In one embodiment, the first speech recognizor identifies a first set of candidates that likely match the speech sample and calculates a corresponding first set of scores. Similarly, the second speech recognizor identifies a second set of candidates that likely match the speech sample and calculates a corresponding second set of scores. The scores calculated by the first and second recognizors are based on a likelihood of matching the speech sample.”  Col. 2, lines 48-56.  “The first and second sets of candidates are combined, for example, by taking their union to generate a combined set of candidates. The first and second sets of scores are merged to generate a combined set of scores, for example, by calculating a weighted average for each corresponding pair of scores. The combined sets of candidates are presented to a transcriptionist in an order of priority determined by the candidates' respective combined scores….”  Col. 2, lines 57 to Col. 3, line 2.  The equation for obtaining a weighted combination of the confidence scores is shown at Col. 8, line 50.  Figures 6A, 6B, and 6C show the combination process.]

[Image: media_image8.png]


Gao and Baker pertain to the use of multiple speech recognizers (including in parallel) to obtain optimal results and it would have been obvious to combine the feature of Baker which yields a combined confidence value with the system of Gao that discloses having confidence values associated with each of the results to obtain a confidence value associated with a selected result that reflects the role of multiple recognizers.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
	
Regarding Claim 36, Gao does not teach combining confidence scores.
Baker teaches:
36. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted further based on one or more characteristics of the speech recognition system and the at least one other speech recognition system that generated the particular recognition result. [Baker teaches that two or more recognizers may be used and the recognizers that are used may be optimized for a particular property/characteristics:  “Although the speech recognition system of FIG. 3 uses two recognizors, other embodiments could employ three or more recognizors, each optimized for a different property. Referring again to FIG. 11, the recognizors (whether two or more in number) may be optimized for properties other than real-time responsiveness and high accuracy, for example, for different writing styles 1115 or for different subject matters 1120. Depending on the goals of the system designer, the optimized properties may, but need not be, complementary to one another.”  Col. 7, lines 17-26.  See Figure 11.]
Rationale as provided for Claim 29 because Baker was cited for a combined confidence number.

Regarding Claim 37, Gao does not teach combining confidence scores.
Baker teaches:
37. The computer-implemented method of claim 36, wherein the one or more characteristics include one or more characteristics selected from a group consisting of one or more overall levels of accuracy for a respective speech recognition system, one or more contextual levels of accuracy within a context for the audio signal for the respective speech recognition system, and one or more temporal levels of accuracy for one or more periods of time for the respective speech recognition system.  [Baker, Figure 3, the “offline recognizer 309” is considered to be more accurate than the “real time recognizer 303” and its result is given a higher weight in the combination.  Baker also teaches, Figure 11, that it may use speech recognizers that are tailored to a particular task / optimized for a characteristic such as one for each area of law or medicine.  Col. 7, lines 26-42.  Col. 8, lines 45-56 teach the equation used for combining the confidence scores of the different recognizers with a weight Lambda.  Lambda depends on the confidence of the combiner in each of the recognizers:  “Once the two sets of recognition results have been time aligned, a combined score for each candidate is calculated using the equation set forth above (step 705). The particular value of .lambda. used by the combiner in calculating the combined scores depends on confidence levels that the combiner maintains for each of the recognizors. These confidence levels may be adapted over time as the combiner learns the types of speech that are better recognized by one recognizor or the other….”  Col. 10, lines 49-59.]
Rationale as provided for Claim 29 because Baker was cited for a combined confidence number.

Regarding Claim 38, Gao does not teach combining confidence scores.
Baker teaches:
38. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted further based on a level of similarity between respective speech recognition systems that generated the particular recognition result. [Baker teaches, Figure 11, that it may use speech recognizers that are tailored to a particular task / optimized for a characteristic such as one for each area of law or medicine.  Col. 7, lines 26-42.  Col. 8, lines 45-56 teach the equation used for combining the confidence scores of the different recognizers with a weight Lambda.  Lambda depends on the confidence of the combiner in each of the recognizers:  “Once the two sets of recognition results have been time aligned, a combined score for each candidate is calculated using the equation set forth above (step 705). The particular value of .lambda. used by the combiner in calculating the combined scores depends on confidence levels that the combiner maintains for each of the recognizors. These confidence levels may be adapted over time as the combiner learns the types of speech that are better recognized by one recognizor or the other….”  Col. 10, lines 49-59.]
Rationale as provided for Claim 29 because Baker was cited for a combined confidence number.

Regarding Claim 39, Gao does not teach combining confidence scores.  (This Claim appears to correspond to Figure 9 where the error rate of a SRS results in the SRS being given a lower or higher weight. [0118]-[0120].)
Baker teaches and suggests:
39. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted further based on a rate with which respective speech recognition systems that generated the particular recognition result have correctly identified results for audio signals [Baker.  This portion just means that a recognizer that is more accurate and therefore generates correct results at a higher rate is given a higher weight.  “The particular value of .lambda. used by the combiner in calculating the combined scores depends on confidence levels that the combiner maintains for each of the recognizors. These confidence levels may be adapted over time as the combiner learns the types of speech that are better recognized by one recognizor or the other.”  Col. 10, lines 51-57.  Baker does not say that the more accurate/higher confidence recognizer is determined according to “a rate with which respective speech recognition systems … have correctly identified results” but this is strongly suggested by the teachings of Baker that the confidence in a recognizer (and therefore the weight of the recognizer) is adapted as the recognizer becomes better at recognizing a type of speech.  See the Conclusion as well.]
when the respective speech recognition systems have identified a same recognition result [Baker, Col. 8, line 50, equation showing that the confidence values for the same word w are being combined.] and with confidence values that are within a threshold value of the two or more confidence values. [Baker, Figure 7 step 707 as expanded in Figure 13.  Step 1310 compares the lowest and highest confidence scores of a recognition result and if the difference is NOT less than a threshold value (N) then the correctness of the recognition result is uncertain even if each of the recognizers is certain of its result.  This teaches that for a result to be considered credible the confidence values of the two recognizers must be close (within a threshold value).  “The combiner uses the combined scores to identify instances of uncertainty between the two recognizors about the correct recognition of a speech unit (step 707). Referring also to FIG. 13, the correct recognition of a speech unit is uncertain 1300 if (a) the real-time recognizor is unsure of its results 1305, (b) the offline recognizor is unsure of its results 1305, or (c) the two recognizors disagree 1310 (even if both are certain of their respective results)….”  Col. 11, lines 12-19.]
Rationale as provided for Claim 29 because Baker was cited for a combined confidence number.
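The uncertainty test of Baker's Figure 13, as characterized above, can be sketched as a three-part check. The sketch follows the reading given in this Office action (disagreement is measured as a difference between the two scores that is not less than a threshold); the threshold names and values are hypothetical, and scores are treated as negative-log values so that a lower score means higher confidence.

```python
def is_uncertain(s_realtime, s_offline, unsure_above, max_gap):
    """Return True if recognition of a speech unit should be flagged.

    `unsure_above` is a hypothetical score above which a recognizer is
    treated as unsure of its own result; `max_gap` is a hypothetical
    threshold on the difference between the two recognizers' scores.
    """
    realtime_unsure = s_realtime > unsure_above           # condition (a)
    offline_unsure = s_offline > unsure_above             # condition (b)
    disagree = abs(s_realtime - s_offline) >= max_gap     # condition (c)
    return realtime_unsure or offline_unsure or disagree


# Both recognizers are confident and their scores are close: not flagged.
close_scores = is_uncertain(1.0, 1.5, unsure_above=5.0, max_gap=2.0)
# Both recognizers are confident but their scores differ by 3.0: flagged.
far_scores = is_uncertain(1.0, 4.0, unsure_above=5.0, max_gap=2.0)
```

Under this reading a result is treated as credible only when neither recognizer is unsure and the two scores lie within the threshold of one another.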

Claim 40 is a device claim with a limitation similar to the limitation of Claim 29.  Additionally:
40. A device, comprising: 
processing circuitry configured to [Gao, Figure 9, “processor 910.”]
…

Claim 41 is a device claim with a limitation similar to the limitation of Claim 30.
Claim 42 is a device claim with a limitation similar to the limitation of Claim 31.
Claim 43 is a device claim with a limitation similar to the limitation of Claim 32.
Claim 46 is a device claim with a limitation similar to the limitation of Claim 36.
Claim 47 is a device claim with a limitation similar to the limitation of Claim 38.


Claim 48 is a CRM device claim with a limitation similar to the limitation of Claim 29.  Additionally:
48. A computer program product embodied in a computer readable storage device storing instructions that, when executed, cause a computer to perform operations comprising: [Gao, Figure 9, “memory 920.”]
…
Claims 33-34 and 44-45 are rejected under 35 U.S.C. 103(a) as being unpatentable over Gao and Baker in view of Bennett (U.S. 2002/0194000).
Regarding Claim 33, Gao teaches that the frequency with which each ASR module is selected is used as a factor/weight for ordering the M-Best from among the N-Best outputs.  [00038].
Baker teaches that using a larger number of recognizers increases the accuracy:  “A multiple-recognizor speech recognition system offers several advantages over a single recognizor system. First, an increased number of recognizors tends to increase the number of resulting recognition candidates for a given speech sample. This larger assortment of candidates, which is more likely to contain the correct choice, provides more information to a human transcriptionist or system user. In addition, a multiple recognizor system has an increased capability to identify instances of recognition uncertainty. The likelihood that a recognition result is incorrect is greater if the recognizors disagree about the recognition of a given utterance, or if either or both of the recognizors are uncertain of the accuracy of their respective recognition results. These instances of uncertainty may be highlighted for the transcriptionist or system user.”  Col. 3, lines 46-61.  Thus, Baker hints/suggests that when several recognizers are used and a lot of them agree on a result, the result has a higher credibility.  But because Baker discusses a two-recognizer system running once, the frequency of occurrence is not discussed.
Bennett teaches:
33. The computer-implemented method of claim 32, further comprising
weighting the combination of the two or more of the confidence values based on a frequency with which the particular recognition result occurs in the recognition results for the speech recognition system and the at least one other speech recognition system. [Bennett teaches a voting/polling among the results of parallel recognizers which teaches this limitation:  “[0028] The individual-result confidence values may be used in a simple voting mechanism where several recognizers return a particular result. For example, the result may be "The quick brown fox." If 6 of the available recognizers return that particular result, that result will be given a higher confidence value than results that were returned only by one recognizer….”]
Gao/Baker and Bennett pertain to the use of multiple speech recognizers that are being applied to the same input speech to obtain the best result and both assign weights to the results of each of the recognizers and it would have been obvious to combine the weighting according to voting/polling of Bennett with the system of Gao/Baker as an additional or substitute method of weighting the results.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
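For illustration only, the voting mechanism quoted from Bennett [0028] may be sketched as a count of how many recognizers returned each result.  The function name and the competing result "The quick brown box" are hypothetical; only "The quick brown fox" and the count of 6 come from the quoted passage.

```python
from collections import Counter


def rank_by_votes(results):
    """Rank recognition results by how many recognizers returned each one."""
    # most_common() sorts by count, highest first.
    return Counter(results).most_common()


# Six recognizers return the same result; one returns a different result.
outputs = ["The quick brown fox"] * 6 + ["The quick brown box"]
for text, votes in rank_by_votes(outputs):
    print(votes, text)
```

The result returned by six recognizers ranks ahead of the result returned by one, which corresponds to the higher confidence value that Bennett gives to the more frequently returned result.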

Regarding Claim 34, Gao does not teach combining confidence scores.  (This Claim might pertain to the following part of the Specification of the instant Application:  “[0140] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, a combined, or joint, confidence score of multiple SRS's may include features such as consistency of hypotheses, or guesses as to an utterance's identity. For example, three SRS's outputting a first result with a confidence of 0.8 may be more reliable than one SRS outputting a second result with a confidence of 0.9.”)
Baker teaches:
34. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted by a predetermined weighting factor that is selected, based on the frequency with which the particular recognition result occurs in the recognition results, from among a plurality of predetermined weighting factors. [Baker “Combining the recognition results from two recognizors based on weighting factors allows the speech recognition system to accord greater weight to the recognition result from the recognizor known to be more accurate.”  Col. 3, lines 38-42.]
Rationale for combination as provided for Claim 29.
Baker does not teach the frequency polling.
Bennett teaches the voting/polling/frequency feature:
34. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted by a predetermined weighting factor that is selected, based on the frequency with which the particular recognition result occurs in the recognition results, from among a plurality of predetermined weighting factors. [Bennett teaches a voting/polling among the results of parallel recognizers which teaches that the number of recognizers generating a particular conforming result determines the weight given to the result and overrides the confidence associated with a particular result:  “[0028] The individual-result confidence values may be used in a simple voting mechanism where several recognizers return a particular result. … If 6 of the available recognizers return that particular result, that result will be given a higher confidence value than results that were returned only by one recognizer…Note that individual-result confidence values are not necessary to implement a voting mechanism, nor are they required for implementation of these feedback and performance tracking mechanisms.”]
Rationale as provided for Claim 33.
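For illustration only, selecting a predetermined weighting factor according to frequency may be sketched with a lookup table.  The table values and names below are hypothetical.  The sample inputs follow the example in paragraph [0140] of the Specification (three SRS's outputting a first result with a confidence of 0.8 versus one SRS outputting a second result with a confidence of 0.9).

```python
# Hypothetical table of predetermined weighting factors, keyed by the number
# of recognizers that returned the same result.
WEIGHT_BY_FREQUENCY = {1: 1.0, 2: 1.15, 3: 1.3}


def weighted_confidence(confidences):
    """Average the confidences for one result and scale by its frequency factor."""
    frequency = len(confidences)
    average = sum(confidences) / frequency
    # Frequencies beyond the table reuse the largest predetermined factor.
    factor = WEIGHT_BY_FREQUENCY[min(frequency, max(WEIGHT_BY_FREQUENCY))]
    return factor * average


first = weighted_confidence([0.8, 0.8, 0.8])   # three SRS's agree at 0.8
second = weighted_confidence([0.9])            # one SRS reports 0.9
print(first > second)
```

With these assumed factors, the first result (about 1.04) outranks the second result (0.9), consistent with the statement in [0140] that the first result may be more reliable.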

Claim 44 is a device claim with a limitation similar to the limitation of Claim 33.
Claim 45 is a device claim with a limitation similar to the limitation of Claim 34.

Claim 35 is rejected under 35 U.S.C. 103(a) as being unpatentable over Gao and Baker and further in view of Ueda (U.S. 2002/0055845).
Regarding Claim 35, Gao and Bennett do not teach combining confidence scores.  Baker does not teach weighting according to the distribution of confidence values.  (Support in Figure 6 of the instant Application.  An outlier is given a higher weight.)  Note that the Claim is very broadly stated with no definition for “distribution” and no particularity as to how the “distribution of confidence values” is used to weight the results.  
Ueda teaches:
35. The computer-implemented method of claim 32, wherein the combination of the two or more of the confidence values are weighted further based on a distribution of the confidence values for one or more of the speech recognition system and the at least one other speech recognition system. [Ueda in Figure 4, teaches that when the recognition result by the internal recognizer has a low confidence (No out of S403) then further results are obtained from external recognizers (S405) and if two or more of the results are all above the threshold (YES out of S406) then none is given effect and the user is asked to choose (S407).  Whereas if only one has a high value above threshold (No out of S406) that single high value result is given effect.  This Claim does not need all of this information.  But, Ueda teaches the concept shown in the supporting Specification that if the results are bunched together use the outlier and don’t use the bunched together results.  See paragraphs [0046], [0051] and [0054].  This is effectively giving a weight of 1 to the outlier result that is also higher than threshold.  Ueda does not teach combining confidence values.]
Gao, Baker, and Ueda pertain to the use of multiple speech recognizers for the purpose of obtaining a result with a better confidence value and it would have been obvious to modify the system of combination which combines the confidence values from multiple recognizers with the concept taught in Ueda that a single high confidence outlier result ought to be given priority (higher weight) as applying a known technique to a known device (method, or product) ready for improvement to yield predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
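For illustration only, the decision flow of Ueda Figure 4 as summarized above may be sketched as follows.  The function name, the tuple layout, and the sample city names are hypothetical, and the case where no external result exceeds the threshold is not addressed in the summary above and is an assumption here.

```python
def resolve(internal, externals, threshold):
    """Pick a recognition result following the flow summarized from Ueda FIG. 4.

    internal  -- (text, confidence) from the internal recognizer
    externals -- list of (text, confidence) from external recognizers
    """
    text, confidence = internal
    if confidence > threshold:
        # Internal result is confident enough (Yes out of S403).
        return ("accept", text)
    # Low internal confidence: consult the external recognizers (S405).
    high = [t for t, c in externals if c > threshold]
    if len(high) >= 2:
        # Several results exceed the threshold: ask the user (S407).
        return ("ask_user", high)
    if len(high) == 1:
        # A single high-confidence result is given effect.
        return ("accept", high[0])
    # Assumption: nothing exceeded the threshold, so nothing is selected.
    return ("none", None)


print(resolve(("Kobe", 0.4), [("Tokyo", 0.9), ("Kyoto", 0.3)], 0.7))
print(resolve(("Kobe", 0.4), [("Tokyo", 0.9), ("Kyoto", 0.8)], 0.7))
```

The first call has a single result above the threshold, which is accepted; the second has two, so the choice is deferred to the user.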
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ljolje (U.S. 20110137648) for combining of confidence values.
Ikeda (U.S. 2005/0010422)
Regarding the confidence associated with a speech recognizer depending on the rate at which it has been correct (Claim 39) see Figure 15, “increase score of SR Server” for A (S1513), B (S1514) or S1507.  Depending on which server's result is accepted, the score of that server is increased.  “[0097] … Therefore, the client increases the score as shown in FIG. 2 of the server by using the recognition result (step S1513 or step S1514).”  Figure 2 shows, for each server, the number of times of access and the number of times of use or wrong processing, and then a score associated with the SR Server, which is the weight assigned to the result of that server.
See also Figure 13, S1311 and S1312.  “[0085] Then, the client determines whether a response is received from the SR server A (step S1309). If the response is received (Yes), the client analyzes the contents of the response, and determines whether the transmitted request is normally accepted (step S1310). If the transmitted request is normally accepted (Yes), the client extracts a recognition result from the response by parsing tags representing the recognition result (step S1311). Additionally, the client increases the score as shown in FIG. 2 of the SR server A (step S1312).”  
“[0133] As described earlier, the scores of speech recognition servers are stored in a storage unit 104 of a client 102 as indicated by 201 in FIG. 2. For example, the score is increased when the client uses a result returned from the server, and decreased when the result is wrong (when wrong recognition is performed). The server scores are held by using this reference. Whether a result is wrong can be determined in accordance with, for example, whether the user has tried speech recognition again.”
Ikeda teaches a “degree of localization” which is like considering the “distribution” in Claim 35:  “[0124] In the examples shown in FIG. 21, "Kobe" (confidence=60) and "Tokyo" (confidence=40) are obtained as recognition results from the SR server A, and "Tokyo" (confidence=90) and "Yokohama" (confidence=10) are obtained as recognition results from the SR server B. Assuming that the degree of confidence is "the highest confidence/the sum of confidences", the degree of localization of the highest confidence of the SR server A is 0.6, and the degree of localization of the highest confidence of the SR server B is 0.9. That is, the localization degree of the confidence of the SR server B is higher, so the recognition result is "Tokyo".”  Figure 20, “S2016:  Processing using Confidence.”
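For illustration only, the arithmetic of Ikeda [0124] may be reproduced as follows, taking the degree of localization as the highest confidence divided by the sum of confidences, as stated in the quoted passage.  The function and variable names are hypothetical; the recognition results and confidence values are those of Ikeda FIG. 21 as quoted.

```python
def localization(results):
    """Return (best text, highest confidence / sum of confidences) for one server."""
    best_text, best_confidence = max(results, key=lambda r: r[1])
    total = sum(confidence for _, confidence in results)
    return best_text, best_confidence / total


server_a = [("Kobe", 60), ("Tokyo", 40)]
server_b = [("Tokyo", 90), ("Yokohama", 10)]

# Compare the two servers by their degree of localization.
candidates = [localization(server_a), localization(server_b)]
winner, degree = max(candidates, key=lambda c: c[1])
print(candidates)
print(winner)
```

Server A yields 60/100 and server B yields 90/100, so the result of server B ("Tokyo") is selected, matching the quoted passage.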

For a running average of confidence scores see Dhanakshirur (U.S. 20080077402):
[0039] In addition to storing historical information regarding speech events and RDC instantiation events, the listeners 130 and 135 can perform various statistical processing techniques and/or apply one or more predetermined rules to the data. In one embodiment, for example, the attribute listener 135 can keep a running average of the values of particular events returned by the SRE. Such processing techniques can be applied on an RDC by RDC basis and further on a field by field basis for selected RDCs.
[0041] More particularly, as speech is processed to fill different fields of the address RDC, the attribute listener 135 can detect the speech events generated for the fields. For example, the attribute listener can be configured to monitor for all confidence score related speech events for the address RDC for one or more fields of the address RDC, i.e., the zip code field. In that case, the attribute listener 135 can keep a running average of the last "N" confidence score values for the zip code field of the address RDC. This information can be used to set a tunable parameter of the address RDC. Thus, for example, the value of the tunable parameter relating to the minimally acceptable confidence score for a recognition result for the zip code field of the address RDC can be set to the result obtained from statistically processing this data.
[0046] In this regard, the model 140 can include a listing of the RDCs that have been instantiated and a counter for each RDC indicating the number of times that RDC has been instantiated. As noted, the session listener can detect instantiations of RDCs and maintain the count information within the model 140. Suggested values for the various tunable parameters of each field of the RDCs also can be maintained within the model 140. The attribute listener can detect particular speech events for the different fields of an RDC and process that information, storing the result within the model 140. For example, the attribute listener can maintain the last "N" values for a particular type of speech event detected for field 1 of RDC 1. As each new value is detected, the attribute listener can re-compute a running average of the last "N" values and store the result within the model 140, i.e., as P1.
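For illustration only, a running average of the last "N" confidence score values, as described in the quoted paragraphs of Dhanakshirur, may be sketched with a fixed-length buffer.  The class name and the sample values are hypothetical and are not taken from the reference.

```python
from collections import deque


class RunningAverage:
    """Keep the last n values and report their average after each new value."""

    def __init__(self, n):
        # A deque with maxlen discards the oldest value once n are stored.
        self._values = deque(maxlen=n)

    def add(self, value):
        self._values.append(value)
        return sum(self._values) / len(self._values)


average = RunningAverage(3)
for score in [0.6, 0.8, 1.0, 0.2]:
    print(average.add(score))
```

After the fourth value is added, the first value (0.6) has dropped out of the window, so the average covers only 0.8, 1.0, and 0.2.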

Suggestion:
Examiner presents this suggestion merely to demonstrate language that more closely tracks the Disclosure of Figure 5C and is not making any representations as to its allowability.   (Claims 29-32 together allude to the main idea of Figure 5C.)
A computer-implemented method comprising: 
receive input audio signal at three or more speech recognition systems (SRSs);
generate, from each of the speech recognition systems and corresponding to the input audio signal, top N best recognition results and a confidence score associated with each of the recognition results, where N is 0 or any positive integer;
obtain an average confidence value for each recognition result by obtaining a sum of all confidence scores, provided by those of the speech recognition systems that generated the recognition result as associated with the recognition result, and dividing the sum by a total number of those of the speech recognition systems that generated the recognition result, wherein all or fewer than all of the speech recognition systems may generate a same recognition result;
track frequency of occurrence of each recognition result among the speech recognition systems, the frequency of occurrence of a particular recognition result being equal to a number of speech recognition systems, from among the three or more speech recognition systems, that generated the particular speech recognition result;
assign a weight to each frequency of occurrence, a higher frequency of occurrence being assigned a higher weight;
weight each average confidence value for each recognition result by the weight assigned to the frequency of occurrence of the recognition result to obtain a weighted average recognition result; and
generate a final recognition result for the received audio signal as the recognition result having the highest weighted average recognition result,
where each of the three or more speech recognition systems is an automated speech recognizer operating on a computer.
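
For illustration only, the steps of the suggested language above may be sketched as follows.  The sketch assumes that each speech recognition system lists a given recognition result at most once, and it uses a hypothetical weight equal to the frequency of occurrence divided by the number of systems; the function names, the sample phrases, and the sample scores are likewise hypothetical.  No representation is made as to allowability.

```python
from collections import defaultdict


def final_result(nbest_lists, weight_for_frequency):
    """Return the best result and the weighted averages for all results.

    nbest_lists          -- one list of (text, confidence) per recognition system
    weight_for_frequency -- function mapping frequency of occurrence to a weight
    """
    # Collect the confidence scores reported for each recognition result.
    scores = defaultdict(list)
    for nbest in nbest_lists:
        for text, confidence in nbest:
            scores[text].append(confidence)

    weighted = {}
    for text, confidences in scores.items():
        frequency = len(confidences)              # systems that generated it
        average = sum(confidences) / frequency    # average confidence value
        weighted[text] = weight_for_frequency(frequency) * average

    best = max(weighted, key=weighted.get)
    return best, weighted


systems = [
    [("call home", 0.8)],
    [("call home", 0.7)],
    [("call Rome", 0.9)],
]
# Hypothetical weighting: a higher frequency receives a higher weight.
best, weighted = final_result(systems, lambda f: f / len(systems))
print(best, weighted)
```

In this example, the result generated by two systems (average 0.75, weight 2/3) outranks the result generated by one system (0.9, weight 1/3), even though the latter has the highest individual confidence score.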

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on Monday through Thursday 9am to 4pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/FARIBA SIRJANI/
Primary Examiner, Art Unit 2659