DETAILED ACTION

Introduction

1.	This office action is in response to Applicant's submission filed on 05/27/2020. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are currently pending and examined below.

Drawings

2.	The drawings filed on 05/27/2020 have been accepted and considered by the Examiner. 

Information Disclosure Statement

3.	The Information Disclosure Statement (IDS) filed on 05/27/2020 has been accepted and considered in this office action and is in compliance with the provisions of 37 CFR 1.97.




Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1-2, 4-6 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Endo (U.S. Patent # 7228275 B1) in view of Talieh (U.S. Patent # 11315569 B1).


With regards to claim 1, Endo teaches a method to transcribe communications, the method comprising obtaining a performance of a first transcription generation technique with respect to generating transcriptions of audio of a first communication session associated with a user (Col. 2, lines 28-67 and figure 2, teach a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores);

obtaining a performance of a second transcription generation technique with respect to generating transcriptions of the audio of the first communication session (Col. 2, lines 28-67 and figure 2, teach a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores);

determining a report based on the performance of the first transcription generation technique and the performance of the second transcription generation technique (Col. 2, lines 28-67 and figure 2, further teach that the recognized speech texts and their confidence scores are sent to a decision module which generates a report of which confidence score is higher);

directing the report to a first device associated with the user (Col. 2, lines 28-67 along with figure 2 and Col. 11, lines 45-67, further teach that the speech recognition text with a higher raw or adjusted confidence score is selected and sent as an input to a user device such as a car navigation system or home appliance);

However, Endo may not explicitly detail, in response to the report, obtaining an indication from the first device. This is taught by Talieh (Columns 11-12, teach that multiple speaker specific transcripts can be merged to create a meeting transcript. A user can then correct any errors in this meeting transcript);

Talieh also teaches directing a transcription of a second communication session to a second device for presentation to the user, the transcription being generated by the second transcription generation technique in response to the indication from the first device (Columns 11-12, teach also that once the user has corrected the meeting transcript, this updated transcript can then be either stored or displayed to all the speakers of the meeting).

Endo and Talieh can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Talieh (User directed display of a transcription to a particular device) with those of Endo (Use of a transcription comparator and decision module to pick best transcription) so as to provide high accuracy transcription in case of multiple speakers (Talieh, col. 1). 

With regards to claim 2, Endo teaches the method of claim 1, wherein the performance of the second transcription generation technique is based on one or more of the following transcription accuracy, transcription latency, and number of transcription corrections (Col. 2, lines 28-67 and figure 2, further teach that the first speech recognizer recognizes the input speech signal and generates a first speech text and a first confidence score indicating the level of accuracy of the first speech text. Likewise, the second speech recognizer also recognizes the input speech signal and generates a second speech text and a second confidence score indicating the level of accuracy of the second speech text. The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher).

With regards to claim 4, Endo teaches the method of claim 1, wherein the report includes a recommendation for the second transcription generation technique and the indication includes a selection of the second transcription generation technique (Col. 2, lines 28-67 and figure 2, further teach that the first speech recognizer recognizes the input speech signal and generates a first speech text and a first confidence score indicating the level of accuracy of the first speech text. Likewise, the second speech recognizer also recognizes the input speech signal and generates a second speech text and a second confidence score indicating the level of accuracy of the second speech text. The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher).

With regards to claim 5, Endo may not explicitly detail the limitation wherein the first device and the second device are the same device. However, Talieh teaches this (Claim 14, teaches providing at the first client device a graphical user interface for user selection of one or more of the speaker-specific transcripts, receiving a user selection of the first speaker-specific transcript and displaying the first speaker-specific transcript at the first client device).

Endo and Talieh can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Talieh (User directed display of a transcription to a particular device) with those of Endo (Use of a transcription comparator and decision module to pick best transcription) so as to provide high accuracy transcription in case of multiple speakers (Talieh, col. 1). 

With regards to claim 6, Endo teaches the method of claim 1, further comprising before determining the report, directing a second transcription of the first communication session, the second transcription generated by the first transcription generation technique (Col. 8, lines 50-67 and figure 4, teach an example of the speech recognition system of the present invention attempting to recognize the input speech "Ten University Avenue, Palo Alto" using two grammar-based speech recognizers and a statistical speech recognizer. The first grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Alto" with a confidence score of 66. The second grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Cedro" with a confidence score of 61. The statistical speech recognizer may recognize the input speech as "When University Avenue, Palo Alto" with a confidence score of 60. The speech recognition system will select the speech recognition result "Ten University Avenue Palo Alto" from the first grammar-based speech recognizer, since it has the highest confidence score);

However, Endo may not explicitly detail the limitation that the communication session involves the second device and that the second transcription is directed to the second device. Talieh teaches this (Columns 11-12, teach that multiple speaker specific transcripts can be merged to create a meeting transcript. A user can then correct any errors in this meeting transcript. Once the user has corrected the meeting transcript, this updated transcript can then be either stored or displayed to all the speakers of the meeting. Claim 14, teaches providing at the first client device a graphical user interface for user selection of one or more of the speaker-specific transcripts, receiving a user selection of the first speaker-specific transcript and displaying the first speaker-specific transcript at the first client device).

Endo and Talieh can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Talieh (User directed display of a transcription to a particular device) with those of Endo (Use of a transcription comparator and decision module to pick best transcription) so as to provide high accuracy transcription in case of multiple speakers (Talieh, col. 1). 

With regards to claim 10, this is a computer readable medium (CRM) claim for the corresponding method claim 1. These two claims are related as method and CRM of using the same, with each claimed CRM element's function corresponding to the claimed method step. Accordingly, claim 10 is similarly rejected under the same rationale as applied above with respect to method claim 1.

5.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Endo in view of Talieh and further in view of Engelke (U.S. Patent Application Publication # 2001/0005825 A1).


With regards to claim 9, Endo and Talieh may not explicitly detail the limitation wherein one of the first transcription generation technique and the second transcription generation technique includes a revoicing of audio before transcription generation. However, Engelke teaches this (Para 24, teaches revoicing before transcription). 

Endo, Talieh and Engelke can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Engelke (Use of revoicing before transcription) with those of Endo and Talieh (Use of transcription services) so as to provide speech recognition programs that do not need to be trained to a particular speaker and thus can handle direct translation of speech from a variety of users (Engelke, para 4).

6.	Claims 11-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Endo in view of Stefanov (U.S. Patent Application Publication # 2021/0224695 A1).

With regards to claim 11, Endo teaches a method to transcribe communications, the method comprising selecting a first transcription generation technique from among a plurality of transcription generation techniques for generating transcriptions of audio of one or more communication sessions that involve a user device user (Col. 2, lines 28-67 along with figure 2 and Col. 11, lines 45-67, further teach that the speech recognition text with a higher raw or adjusted confidence score is selected and sent as an input to a user device such as a car navigation system or home appliance);

obtaining performances of the plurality of transcription generation techniques with respect to generating the transcriptions of the audio (Col. 2, lines 28-67 and figure 2, teach a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores);

monitoring comparisons between the performances of the plurality of transcription generation techniques (Col. 2, lines 28-67 and figure 2, further teach that the recognized speech texts and their confidence scores are sent to a decision module which generates a report of which confidence score is higher);

Endo may not explicitly detail obtaining input from the user with respect to the comparisons. However, Stefanov teaches this (Para 47 and figures 1-2, teach human supervision involving comparing a particular transcription with the document for transcription accuracy); 

Stefanov also teaches selecting a second transcription generation technique from among the plurality of transcription generation techniques based on the input from the user (Para 53 and figures 1-3, further teach that a user of the second compute device can share the transcription, the transcription score, the first ground-truth transcription, the refined transcription, the refined transcription score, and/or the second ground-truth transcription, with the first compute device).

Endo and Stefanov can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Stefanov (User selection of a particular transcription based on a comparison) with those of Endo (Use of a transcription comparator and decision module to pick best transcription) so as to provide highly reliable transcription systems by utilizing suitable fine-tuned machine learning models (Stefanov, para 10). 

With regards to claim 12, Endo teaches the method of claim 11, wherein the performances of the plurality of transcription generation techniques are based on one or more of the following transcription accuracy and transcription latency (Col. 2, lines 28-67 and figure 2, further teach that the first speech recognizer recognizes the input speech signal and generates a first speech text and a first confidence score indicating the level of accuracy of the first speech text. Likewise, the second speech recognizer also recognizes the input speech signal and generates a second speech text and a second confidence score indicating the level of accuracy of the second speech text. The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher).

With regards to claim 13, Endo may not explicitly detail the limitation further comprising directing a report to the user based on the comparison, wherein the input is obtained in response to the report. However, Stefanov teaches this (Para 47 and figures 1-2, teach that the second compute device executes the first trained machine learning model and the second trained machine learning model to generate a transcription and/or a transcription confidence score from a document and/or a data record. If the transcription confidence score is above a threshold, the transcription is accepted and is sent to an output such as, for example, a monitor of the second compute device, a memory, a print out of the transcription, and/or the like. If the transcription confidence score is below the threshold, the transcription can be sent to a set of users of the second compute device for human supervision to generate a corrected transcription. In some instances, the human supervision involves observing the document, the data record, the transcription, and/or the transcription confidence score. The human supervision further involves comparing the transcription with the document for transcription accuracy. The human supervision can further optionally include assessing a corrected confidence score and/or generating the corrected transcription by, for example, typing the contents of the document to a word file).

Endo and Stefanov can be considered as analogous art as they belong to a similar field of endeavor in speech transcription services. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Stefanov (User selection of a particular transcription based on a comparison) with those of Endo (Use of a transcription comparator and decision module to pick best transcription) so as to provide highly reliable transcription systems by utilizing suitable fine-tuned machine learning models (Stefanov, para 10). 
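
For illustration only, the threshold routing cited above from Stefanov (para 47) with respect to claim 13 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Stefanov's implementation: the function name `route_transcription`, the default threshold of 0.8 and the two destination labels are hypothetical. The cited passage does not state what happens when the score equals the threshold; the sketch assumes that case goes to human supervision.

```python
def route_transcription(confidence: float, threshold: float = 0.8) -> str:
    """Decide where a transcription goes based on its confidence score.

    Hypothetical helper: the name, the 0.8 default and the returned labels
    are assumptions made for this sketch, not taken from the reference.
    """
    if confidence > threshold:
        # Score above the threshold: the transcription is accepted and output.
        return "output"
    # Score at or below the threshold: sent to users for human supervision
    # (treating the equal case this way is an assumption).
    return "human_supervision"


if __name__ == "__main__":
    for score in (0.95, 0.80, 0.40):
        print(score, route_transcription(score))
```

In this sketch the human supervision branch corresponds to the point at which user input on the transcription would be obtained.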

With regards to claim 14, Endo teaches the method of claim 11, wherein the second transcription generation technique does not generate a transcription of the audio such that the performance of the second transcription generation technique is an estimated performance (Columns 5-6, teach that each speech recognizer recognizes the input speech signal output from the microphone according to its own speech recognition mechanism, e.g. a grammar-based speech recognizer or a statistical speech recognizer, and outputs the recognized speech text along with an associated raw confidence score. The decision module can then adjust this raw confidence score). 

With regards to claim 15, Endo teaches the method of claim 11, wherein the selection of the first transcription generation technique is based on the performance of the first transcription generation technique (Col. 8, lines 50-67 and figure 4, teach an example of the speech recognition system of the present invention attempting to recognize the input speech "Ten University Avenue, Palo Alto" using two grammar-based speech recognizers and a statistical speech recognizer. The first grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Alto" with a confidence score of 66. The second grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Cedro" with a confidence score of 61. The statistical speech recognizer may recognize the input speech as "When University Avenue, Palo Alto" with a confidence score of 60. The speech recognition system will select the speech recognition result "Ten University Avenue Palo Alto" from the first grammar-based speech recognizer, since it has the highest confidence score).

With regards to claim 16, Endo teaches the method of claim 11, wherein monitoring comparisons between the performances of the plurality of transcription generation techniques occur with respect to a first communication session that involves the user device (Col. 8, lines 50-67 and figure 4, teach an example of the speech recognition system of the present invention attempting to recognize the input speech "Ten University Avenue, Palo Alto" using two grammar-based speech recognizers and a statistical speech recognizer. The first grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Alto" with a confidence score of 66. The second grammar-based speech recognizer may recognize the input speech as "Ten University Avenue, Palo Cedro" with a confidence score of 61. The statistical speech recognizer may recognize the input speech as "When University Avenue, Palo Alto" with a confidence score of 60. The speech recognition system will select the speech recognition result "Ten University Avenue Palo Alto" from the first grammar-based speech recognizer, since it has the highest confidence score. This input speech is the first communication session. Col. 11, lines 45-67, further teach that the user device herein could be a car navigation system).
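
For illustration only, the decision-module behavior cited above from Endo (Col. 8, lines 50-67 and figure 4) can be sketched as follows. This is a minimal sketch and is not Endo's implementation: the class `RecognizerResult`, the function `select_best` and the recognizer labels are hypothetical. Only the three recognized texts and their confidence scores are taken from the passage quoted above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RecognizerResult:
    recognizer: str   # hypothetical label for the recognizer that produced the text
    text: str         # recognized speech text
    confidence: int   # confidence score reported for the text


def select_best(results: Sequence[RecognizerResult]) -> RecognizerResult:
    """Return the result with the highest confidence score."""
    if not results:
        raise ValueError("at least one recognizer result is required")
    return max(results, key=lambda r: r.confidence)


# Texts and scores as quoted from the reference; labels are assumptions.
results = [
    RecognizerResult("grammar-1", "Ten University Avenue, Palo Alto", 66),
    RecognizerResult("grammar-2", "Ten University Avenue, Palo Cedro", 61),
    RecognizerResult("statistical", "When University Avenue, Palo Alto", 60),
]

best = select_best(results)
print(best.recognizer, "|", best.text, "|", best.confidence)
```

With the scores quoted above, the sketch selects the result of the first grammar-based recognizer, consistent with the outcome described in the cited passage.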

With regards to claim 19, this is a CRM claim for the corresponding method claim 11. These two claims are related as method and CRM of using the same, with each claimed CRM element's function corresponding to the claimed method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 11.

With regards to claim 20, this is a system claim for the corresponding method claim 11. These two claims are related as method and system of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 11.

Allowable Subject Matter

7.	Claims 3, 7-8 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art of record, alone or in combination, does not currently suggest or teach the invention as outlined in these claims. The Examiner shall outline more detailed reasons for allowance as and when the Application goes to allowability.

Conclusion

8.	The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Kahn (U.S. Patent Application Publication # 2006/0149558 A1), Nelson (U.S. Patent Application Publication # 2019/0333517 A1). These references are also included in the PTO-892 form attached with this office action.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA whose contact information is given below. The examiner can normally be reached on Monday to Friday 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir, can be reached on 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)