Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
DETAILED ACTION
Claims 29-48 are pending. Claims 29, 40, and 48 are independent. All of the Claims have been amended. The Examiner's Amendment cancels Claims 30, 32-34, 41, and 43-45 and amends Claims 29, 31, 35-40, 42, and 46-48.
This Application is published as U.S. 2020/0357413.
The priority date of the chain of continuations is July 2, 2008.

This application is a continuation of U.S. Patent Application Nos. 16/885,116, 15/171,374, 14/064,755, 13/750,807, and 12/166,822, which issued, respectively, as U.S. 10,699,714, U.S. 10,049,672, U.S. 9,373,329, U.S. 8,571,860, and U.S. 8,364,481. In view of the amendments, obviousness-type double patenting is not present.

Subject to the Examiner’s Amendment below, the pending Claims are allowed.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 6/24/2022 has been entered.

Response to Amendments
The objection to Figure 5B is withdrawn in view of the amendments that remove the phrase “running average.”
The Objection to the Specification for lacking antecedent basis for “candidate representations” is withdrawn in view of the amendments to the Claims that remove the phrase. 
Applicant has made additional amendments to Figures 5B and 5C by showing the products with more significant digits. (Note: an error was introduced for Result = “quarry.”) Applicant has amended Figure 7E to say “0.1” instead of “0.01.” Applicant has also amended the Specification to change some of the confidence values.

Examiner’s Amendments
Authorization for this examiner’s amendment was granted in an interview with Mr. Tao Feng on 7/31/2022.
Cancel Claims 30, 32-34, 41, and 43-45.
Amend Claims 29, 31, 35-40, 42, and 46-48 as follows:
29. (Currently Amended) A computer-implemented method comprising: 
receiving an audio signal at a plurality of automatic speech recognition systems (SRSs); 
generating at each of the SRSs, a plurality of speech recognition results and corresponding confidence values for a speech recognition task initiated at each of the SRSs;

receiving the plurality of speech recognition results and the corresponding confidence values for the plurality of speech recognition results at a computer;
generating, by the computer, weighted confidence values for the speech recognition results, wherein a weighted confidence value for a plurality of same particular speech recognition results generated by a subset of the SRSs is obtained by:
first, calculating an average of the confidence values each corresponding to a respective one of the plurality of same particular speech recognition results to obtain an average confidence value for the plurality of same particular speech recognition results, and 
next, weighting the average confidence value by a predetermined weight assigned to a number of the plurality of same particular speech recognition results to obtain the weighted confidence value for the plurality of same particular speech recognition results, wherein the number of the plurality of same particular speech recognition results provides a frequency of generation of the same particular speech recognition result for the received audio signal;  
ranking, by the computer, the speech recognition results according to their corresponding weighted confidence values; and
generating, by the computer, a final speech recognition result for the received audio signal based on the ranked speech recognition results.

30. (Canceled) 

31. (Currently Amended) The computer-implemented method of claim 29, wherein each of the plurality of same particular speech recognition results is obtained from a different one of the subset of the SRSs.

32. (Canceled) 
33. (Canceled)
34. (Canceled)

35. (Currently Amended) The computer-implemented method of claim [[32]] 29, wherein [[the]] a combination of the confidence values is further weighted based on a distribution of the confidence values obtained from the subset of the SRSs that generated the plurality of same particular speech recognition results.
	
36. (Currently Amended) The computer-implemented method of claim [[32]] 29, wherein [[the]] a combination of the confidence values is further weighted based on one or more characteristics of the subset of the SRSs that generated the plurality of same particular speech recognition results.

37. (Currently Amended) The computer-implemented method of claim 36, wherein the one or more characteristics include one or more characteristics selected from a group consisting of one or more overall levels of accuracy for a respective SRS of the subset of the SRSs that generated the plurality of same particular speech recognition results, one or more contextual levels of accuracy within a context for the audio signal for the respective SRS, and one or more temporal levels of accuracy for one or more periods of time for the respective SRS.

38. (Currently Amended) The computer-implemented method of claim [[32]] 29, wherein [[the]] a combination of the confidence values is further weighted based on a level of similarity between respective SRSs of the subset of the SRSs that generated the plurality of same particular speech recognition results.

39. (Currently Amended) The computer-implemented method of claim [[32]] 29, wherein [[the]] a combination of the confidence values is further weighted based on error rates of the subset of the SRSs that generated the plurality of same particular speech recognition results.

40. (Currently Amended) A device, comprising: 
processing circuitry configured to 
receive an audio signal at a plurality of automatic speech recognition systems (SRSs),
generate at each of the SRSs, a plurality of speech recognition results and corresponding confidence values for a speech recognition task initiated at each of the SRSs,  
generate weighted confidence values for the speech recognition results, wherein a weighted confidence value for a plurality of same particular speech recognition results generated by a subset of the SRSs is obtained by:
first, calculating an average of the confidence values each corresponding to a respective one of the plurality of same particular speech recognition results to obtain an average confidence value for the plurality of same particular speech recognition results, and 
next, weighting the average confidence value by a predetermined weight assigned to a number of the plurality of same particular speech recognition results to obtain the weighted confidence value for the plurality of same particular speech recognition results, wherein the number of the plurality of same particular speech recognition results provides a frequency of generation of the same particular speech recognition result for the received audio signal,  
rank the [[processed]] speech recognition results according to their corresponding weighted confidence values, and  
generate a final speech recognition result for the received audio signal based on the ranked speech recognition results.  

41. (Canceled) 

42. (Currently Amended) The device of claim 40, wherein each of the plurality of same particular speech recognition results is obtained from a different one of the subset of the [[multiple]] SRSs.

43. (Canceled)   
44. (Canceled) 
45. (Canceled) 

46. (Currently Amended) The device of claim [[43]] 40, wherein [[the]] a combination of [[the two or more of]] the confidence values is further weighted based on one or more characteristics of the subset of the [[multiple]] SRSs that generated the plurality of same particular speech recognition results.

47. (Currently Amended) The device of claim [[43]] 40, wherein [[the]] a combination of [[the two or more of]] the confidence values is further weighted based on a level of similarity between respective SRSs of the subset of the [[multiple]] SRSs that generated the plurality of same particular speech recognition results.

48. (Currently Amended) A computer program product embodied in a computer readable storage device storing instructions that, when executed, cause a computer to perform operations comprising: 
receiving an audio signal at a plurality of automatic speech recognition systems (SRSs); 
generating at each of the SRSs, a plurality of speech recognition results and corresponding confidence values for a speech recognition task initiated at each of the SRSs; 
receiving the plurality of speech recognition results and the corresponding confidence values for the plurality of speech recognition results at a computer;
generating, by the computer, weighted confidence values for the speech recognition results, wherein a weighted confidence value for a plurality of same particular speech recognition results generated by a subset of the SRSs is obtained by:
first, calculating an average of the confidence values each corresponding to a respective one of the plurality of same particular speech recognition results to obtain an average confidence value for the plurality of same particular speech recognition results, and 
next, weighting the average confidence value by a predetermined weight assigned to a number of the plurality of same particular speech recognition results to obtain the weighted confidence value for the plurality of same particular speech recognition results, wherein the number of the plurality of same particular speech recognition results provides a frequency of generation of the same particular speech recognition result for the received audio signal;  
ranking, by the computer, the speech recognition results according to their corresponding weighted confidence values; and 
generating, by the computer, a final speech recognition result for the received audio signal based on the ranked speech recognition results.

Support for the above amendments may be found in Figure 5C and the corresponding Written Description.


Allowable Subject Matter
Subject to the Examiner’s Amendments above, the pending Claims 29, 31, 35-40, 42, and 46-48 are allowed.
The following is an examiner’s statement of reasons for allowance: In view of each of the particular limitations of the independent Claims when considered in the order established by the Claim language and in the context of the language of the independent Claims when each Claim is considered as a whole, the independent Claims of this Application were not found in the prior art that was viewed.
In particular, the feature of having two or more speech recognizers operate on the same speech input in parallel, receiving an N-best output (N>1) from each of the recognizers, and obtaining the final recognition result by first obtaining an average confidence value for each of the intermediate recognition results (the N-best outputs) that match between the recognizers and then weighting that average value according to the frequency of occurrence of that particular result, when considered in the context of the independent Claims and including all of the limitations of each, was not found in the prior art. As shown in Figure 5C, if the particular recognition result “Cory” is output with a frequency of 3 (i.e., by three speech recognizers), the confidence values associated with each recognition are averaged together and then multiplied by the weight 1.04, which corresponds to a frequency of 3. When another recognition result, such as “quarry,” is generated with a higher average confidence value but at a lower frequency (i.e., by fewer recognizers), its average is multiplied by a smaller weight (1.02). In the Figure 5C example, the averaged-then-weighted value for “quarry” therefore ends up lower, and “Cory” wins.
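The average-then-weight computation described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Applicant's implementation: the confidence values are invented rather than taken from Figure 5C, the weight 1.02 is assumed to correspond to a frequency of 2 (the text says only that “quarry” has a lower frequency), the weight 1.00 for a single occurrence is an assumption, and `average_then_weight` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical weight table: frequency (number of SRSs returning the same
# result) -> predetermined weight. 1.04 for a frequency of 3 follows the
# example above; 1.02 for 2 and 1.00 for 1 are assumptions. The table must
# cover every frequency that can occur.
FREQUENCY_WEIGHTS = {1: 1.00, 2: 1.02, 3: 1.04}


def average_then_weight(nbest_lists, weights=FREQUENCY_WEIGHTS):
    """Rank results pooled from several recognizers' N-best lists.

    nbest_lists: one list per recognizer of (result, confidence) pairs,
    assumed to contain each result at most once per recognizer.
    Returns (result, weighted_confidence) pairs, highest first.
    """
    pooled = defaultdict(list)
    for nbest in nbest_lists:
        for result, confidence in nbest:
            pooled[result].append(confidence)

    weighted = {}
    for result, values in pooled.items():
        # First: average the confidence values of the matching results.
        average = sum(values) / len(values)
        # Next: weight the average by the weight assigned to its frequency.
        weighted[result] = average * weights[len(values)]

    # Rank by weighted confidence; the top entry is the final result.
    return sorted(weighted.items(), key=lambda item: item[1], reverse=True)


# Hypothetical N-best outputs from three SRSs (not the Figure 5C values).
srs_outputs = [
    [("Cory", 0.60), ("quarry", 0.56)],
    [("Cory", 0.50), ("quarry", 0.56)],
    [("Cory", 0.55), ("curry", 0.40)],
]
ranked = average_then_weight(srs_outputs)
print([result for result, _ in ranked])  # ['Cory', 'quarry', 'curry']
```

With these hypothetical values, “quarry” has the higher average (0.56 versus 0.55), but after weighting “Cory” scores 0.55 × 1.04 = 0.572 against 0.56 × 1.02 = 0.5712 for “quarry,” so “Cory” ranks first.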
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Close Art of Record
The closest art to the current Claims is Bennett (U.S. 2002/0194000), which teaches a voting feature that assigns a confidence value according to the frequency with which a particular result has been generated by different recognizers; the more recognizers that vote for a particular result, the higher its confidence.
Baker (U.S. 6,122,613) is also quite close because it generates a weighted combination of recognition results obtained from two different recognizers. See Figure 5 and col. 8, line 50.

The independent Claims are supported by Figure 5C.
Figure 5C shows three recognizers, each generating an N-best list where N>1.
The method averages the confidence values of the matching recognition results and then weights the resulting average by a frequency (voting) factor. If more of the recognizers generate the same result, the averaged confidence value for that result receives a larger boost from the weight, which is based on the frequency of occurrence of the same result. This is a “voting” feature: if a larger number of recognizers “vote” for a particular result, that result is given more weight/credence. In effect, the method first averages and then votes to pick the speech recognition result.

Using several recognizers in parallel and giving a higher weight/confidence to a result that is generated by more of the recognizers is taught by Bennett, Figure 1, and “[0028] The individual-result confidence values may be used in a simple voting mechanism where several recognizers return a particular result. For example, the result may be "The quick brown fox." If 6 of the available recognizers return that particular result, that result will be given a higher confidence value than results that were returned only by one recognizer….”
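As a rough illustration of the simple voting mechanism quoted from Bennett [0028], the sketch below gives each result a confidence equal to the share of recognizers that returned it. The quoted text does not state a formula, so vote share is an assumption; `vote_confidences` and the competing result “The quick brown box” are hypothetical.

```python
from collections import Counter


def vote_confidences(top_results):
    """Score each result by the share of recognizers that returned it.

    top_results: the result returned by each recognizer (one per recognizer).
    The vote-share formula is an illustrative assumption, not Bennett's.
    """
    counts = Counter(top_results)
    total = len(top_results)
    return {result: count / total for result, count in counts.items()}


# Six recognizers agree on one result; a seventh returns something else.
scores = vote_confidences(["The quick brown fox"] * 6 + ["The quick brown box"])
print(max(scores, key=scores.get))  # The quick brown fox
```

This sketch uses only the vote count; it does not first average per-result confidence values, which is the distinction drawn in the following paragraphs.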
Baker, Figure 5, obtains a combined confidence score by combining the confidence values (“scores”) of the results of two recognizers in a weighted average where a higher weight is assigned to the confidence value (“score”) of the more reliable recognizer. Col. 8, line 50.
Neither reference teaches first averaging the confidence values of the matching results and then weighting the average according to the voting (frequency) of output of the particular result.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Baker (U.S. 6,122,613):


“The first and second sets of candidates are combined, for example, by taking their union to generate a combined set of candidates. The first and second sets of scores are merged to generate a combined set of scores, for example, by calculating a weighted average for each corresponding pair of scores.”  Col. 2, lines 56-61.
“Combining the real-time speech recognition results with the results from another speech recognizor that is optimized for high accuracy (an "offline" recognizor) provides final speech recognition results that are likely to be more accurate. Combining the recognition results from two recognizors based on weighting factors allows the speech recognition system to accord greater weight to the recognition result from the recognizor known to be more accurate.”  Col. 3, lines 35-45.
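The combination described in the Baker passages quoted above (union of the two candidate sets, weighted average of each corresponding pair of scores) can be sketched as follows. The quoted passages do not say what score a candidate receives when only one recognizer proposes it, nor what scale the scores use, so `missing_score`, the 0-1 example scores, the example phrases, and the function name `merge_scores` are all hypothetical.

```python
def merge_scores(first, second, first_weight, missing_score=0.0):
    """Merge two recognizers' candidate scores by weighted average.

    first, second: dicts mapping candidate -> score.
    first_weight: weight on the first recognizer (e.g. the one known to be
    more accurate); the second recognizer receives 1 - first_weight.
    missing_score: hypothetical default used when a candidate appears in
    only one of the two sets.
    """
    second_weight = 1.0 - first_weight
    combined = {}
    # Combined candidate set: the union of both recognizers' candidates.
    for candidate in first.keys() | second.keys():
        s1 = first.get(candidate, missing_score)
        s2 = second.get(candidate, missing_score)
        combined[candidate] = first_weight * s1 + second_weight * s2
    return combined


realtime = {"recognize speech": 0.6, "wreck a nice beach": 0.5}
offline = {"recognize speech": 0.8, "recognized speech": 0.4}
# Give the (assumed more accurate) offline recognizer the larger weight.
merged = merge_scores(offline, realtime, first_weight=0.75)
print(merged)
```

Here the weight attaches to the recognizer, not to how many recognizers returned the same result, which is the difference from the claimed weighting noted above.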

[Excerpt of Baker, col. 8, lines 45-55, not reproduced]

Note the application of Bennett (U.S. 2002/0194000) (issued as U.S. 6,996,525) to the independent Claim prior to the amendments:
Regarding Claim 29, Bennett teaches:
29. A computer-implemented method comprising: 
obtaining, by a computer and from multiple speech recognition systems (SRSs), recognition results and confidence values for a speech recognition task initiated for a received audio signal, wherein the confidence values identify a plurality of probabilities that the recognition results are correct; [Bennett, Figure 1, Recognizers 14a to 14n that are operating in parallel on the same “input audio stream 12.” “[0022] In one embodiment, recognizers return their converted speech accompanied by one or more values that indicates the confidence the recognizer has in a particular result. We call these values individual-result confidence values….”]
processing, by the computer, the obtained recognition results based on the confidence values for the speech recognition task; [Bennett, Figure 1, teaches this limitation, which does not specify what the “processing” does other than that it is “based on the confidence values.” In Figure 1, the results output from each recognizer 14a, 14b, 14n are provided to an “output switch 16.” The “results” are accompanied by their “confidence value.” See [0022]-[0023].]
generating, by the computer, one or more weighted confidence values for the recognition results based on the recognition results, the confidence values for the speech recognition task, and contextual information related to the recognition results, the contextual information, which is related to a plurality of same particular recognition results that is obtained from a subset of the multiple SRSs, including a number of the plurality of same particular recognition results; [Bennett teaches that one of the methods by which the “output switch 16” selects the “recognized text 20” is a “voting mechanism” in which a result that is returned by a larger number of the recognizers is given a higher weight. “[0028] The individual-result confidence values may be used in a simple voting mechanism where several recognizers return a particular result. For example, the result may be "The quick brown fox." If 6 of the available recognizers return that particular result, that result will be given a higher confidence value than results that were returned only by one recognizer. This information can be leveraged with feedback and performance history, as will be discussed further. ….”]
ranking, by the computer, the processed recognition results based on the one or more weighted confidence values for the recognition results; and [Bennett, the results are ranked based on the “voting scheme.”  See [0026] above.]
generating, by the computer, a final recognition result for the received audio signal based on the ranked recognition results. [Bennett, Figure 1, “Recognized Text 20” output from the “Output Switch 16.”  “[0013] The enabled recognizers perform the speech recognition tasks. The output of the enabled recognizers would then be sent to an output switch 16. The predictor then selects a set of results from the results presented to the output switch. The basis of that selection is discussed in more detail below.”]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on Monday through Thursday 9am to 4pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/FARIBA SIRJANI/
Primary Examiner, Art Unit 2659