DETAILED ACTION
This action is in response to the RCE filed on 10/29/2021.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/29/2021 has been entered.

Response to Amendment
Applicant’s amendment filed on 10/29/2021 has been entered. Claims 1 and 10 have been amended. No claims have been canceled. No claims have been added. Claims 1, 3 – 10 and 12 – 20 are still pending in this application, with claims 1 and 10 being independent.
Since the nonstatutory double patenting rejection presented in the prior Office action has not been overcome by terminal disclaimer, argument or amendment, the nonstatutory double patenting rejection is maintained in this Office action and updated below.

Allowable Subject Matter
Aside from the non-prior art rejections, the prior art fails to teach or suggest in reasonable combination the limitations recited in the independent claims.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or 

Claims 1, 3 – 10 and 12 – 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 20 of U.S. Patent No. 11,043,209. Although the claims at issue are not identical, they are not patentably distinct from each other.
Current Application

1. (Currently Amended) A method for training a neural network to predict a word error rate of a plurality of transcription engines, the method comprising: segmenting the media file into a plurality of segments; inputting each segment, one segment at a time, of the plurality of segments into a first neural network trained to perform speech recognition; extracting outputs, one segment at a time, from one or more hidden layers of the first neural network; and training a second neural network to generate a predicted-WER (word error rate) of the plurality of transcription engines for each segment based at least on outputs from the one or more hidden layers of the first neural network, wherein training the second neural network to generate a predicted-WER of the plurality of transcription engines further comprises: transcribing each segment using the plurality of transcription engines to generate a transcription of each segment; generating a WER of each transcription engine for each segment based at least on ground truth data and the transcription of each segment; and training the second neural network to learn relationships between the generated WER of each transcription engine and outputs from the one or more hidden layers of the first neural network for each segment.

2. (Cancelled)

3. (Original) The method of claim 1, wherein the first neural network comprises a deep neural network.

4. (Original) The method of claim 3, wherein the deep neural network comprises a recurrent neural network, and the second neural network comprises a convolutional neural network.

5. (Original) The method of claim 4, wherein the convolution neural network comprises two hidden layers and a pooling layer in between the two hidden layers.

6. (Currently Amended) The method of claim 1, wherein extracting outputs from one or more hidden layers of the first neural network comprises extracting outputs from a last hidden layer of the deep neural network.

7. (Currently Amended) The method of claim 1, wherein extracting outputs from one or more hidden layers of the first neural network comprises extracting outputs from a first and last hidden layers of a deep neural network.

8. (Original) The method of claim 1, further comprising using an autoencoder neural network to reduce a number of input features from each segment such that a number of outputs from the first neural network are reduced.

9. (Original) The method of claim 8, wherein the autoencoder comprises approximately 256 channels.

10. (Currently Amended) A system for training a neural network to transcribe a media file, the system comprising: a memory; and one or more processors coupled to the memory, the one or more processor configured to: segment the media file into a plurality of segments; input each segment of the plurality of segments into a first neural network trained to perform speech recognition; extract outputs from one or more hidden layers of the first neural network; and train a second neural network to generate a predicted-WER (word error rate) of a plurality of transcription engines for each segment based at least on outputs from the one or more hidden layers of the first neural network, wherein the one or more processors are configured to train the second neural network to generate a predicted-WER further comprises configuring the one or more processor to: transcribe each segment using the plurality of transcription engines to generate a transcription of each segment; generate a WER of each transcription engine for each segment based at least on ground truth data and the transcription of each segment; and train the second neural network to learn relationships between the generated WER of each transcription engine and outputs from the one or more hidden layers of the first neural network for each segment.

11. (Cancelled)



13. (Original) The system of claim 12, wherein the deep neural network comprises a recurrent neural network, and the second neural network comprises a convolutional neural network.

14. (Original) The system of claim 13, wherein the convolution neural network comprises two hidden layers and a pooling layer in between the two hidden layers.

15. (Original) The system of claim 10, wherein the one or more processors are configured to extract outputs from one or more layers of the first neural network further comprises configuring the one or more processors to extract outputs from a last hidden layer of the deep neural network.

16. (Previously Presented) The system of claim 10, wherein the one or more processors are configured to extract outputs from one or more layers of the first neural network further comprises configuring the one or more processors to extract outputs from a first and last hidden layers of a deep neural network.

17. (Original) The system of claim 10, wherein the one or more processors are further configured to use an autoencoder neural network to reduce a number of input features from each segment such that a number of outputs from the one or more layers of the first neural network are reduced.

18. (Original) The system of claim 17, wherein the autoencoder comprises approximately 256 channels.

19. (Original) The system of claim 10, wherein the media file is segmented into segments having a duration ranging between 2 to 10 seconds.

20. (Original) The system of claim 19, wherein each segment comprises a 5-second segment.
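
For illustration only, and without limiting the claims reproduced above, the step of generating a WER of each transcription engine from ground truth data and the transcription of each segment can be sketched as follows. The sketch assumes the conventional definition of word error rate (word-level edit distance divided by the number of reference words), which the claims do not themselves recite. The function names `word_error_rate` and `wer_per_engine` and the engine labels are hypothetical.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and the conventional WER definition;
    neither is taken from the application.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_per_engine(ground_truth: str, transcripts: Dict[str, str]) -> Dict[str, float]:
    """Compute one WER per (hypothetical) engine name for a single segment."""
    return {
        engine: word_error_rate(ground_truth, text)
        for engine, text in transcripts.items()
    }
```

Under these assumptions, the per-segment values returned by `wer_per_engine` would serve as the training targets that the second neural network learns to relate to the hidden-layer outputs of the first neural network.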


U.S. Patent No. 11,043,209

1.  A method for transcribing a media file, the method comprising: segmenting the media file into a plurality of segments; extracting, using a first neural network, audio features of a first and second segment of the plurality of segments, wherein the first neural network is trained to perform speech recognition; and identifying, using a second neural network, a best-candidate engine for each of the first and second segments based at least on audio features of the first and second segments, wherein the best-candidate engine is a neural network having a highest predicted transcription accuracy among a collection of neural networks.
 
2.  The method of claim 1, further comprising: requesting a first best-candidate engine for the first segment to transcribe the first segment; requesting a second best-candidate engine for the second segment to transcribe the second segment; receiving a first transcribed portion of the first segment from the first best-candidate engine in response to requesting the first best-candidate engine to transcribe the first segment; receiving a second transcribed portion of the second segment from the second best-candidate engine in response to requesting the second best-candidate engine to transcribe the second segment; and generating a merged transcription using the first and second transcribed portions.
 
 3.  The method of claim 1, wherein segmenting the media file comprises segmenting the media file at location of the media file where no speech is detected. 
 
4.  The method of claim 1, wherein extracting using the first neural network comprises using a deep neural 
 
5.  The method of claim 4, wherein using the deep neural network to extract audio features comprises using outputs of one or more hidden layers of the deep neural network as inputs to the second neural network.
 
 6.  The method of claim 5, wherein the deep neural network comprises a recurrent neural network, and the second neural network comprises a convolutional neural network. 
 
    7.  The method of claim 5, wherein using outputs of one or more hidden layers of the deep neural network as inputs comprises using outputs of a last hidden layer of the deep neural network as inputs to the second neural network. 
 
8.  The method of claim 1, wherein the second neural network is trained to predict a word error rate (WER) of a plurality of transcription engines based at least on audio features extracted from each segment.
 
9.  The method of claim 8, wherein identifying the best-candidate engine for each of the first and second segments comprises identifying a transcription engine with a lowest WER for each segment.
 
10.  A system for transcribing a media file, the system comprising: a memory; and one or more processors coupled to the memory, the one or more processor configured to: segment the media file into a plurality of segments; extract, using a first neural network, audio features of a first and second segment of the plurality of segments, wherein the first neural network is trained to perform speech recognition; and identify, using a second neural network, a best-candidate engine for each of the first and second segments based at least on audio features of the first and second segments, wherein the best-candidate engine is a neural network having a highest predicted transcription accuracy among a collection of neural networks.
 
11.  The system of claim 10, wherein the one or more processors are further configured to: request a first best-candidate engine for the first segment to transcribe the first segment; request a second best-candidate engine for the second segment to transcribe the second segment; receive a first transcribed portion of the first segment from the first best-candidate engine in response to requesting the first best-candidate engine to transcribe the first segment;

engine to transcribe the second segment;  and generate a merged transcription using the first and second transcribed portions. 
 
12.  The system of claim 10, wherein the one or more processors are configured to extract audio features of the first and second segments using a deep neural network. 
 
13.  The system of claim 12, wherein the one or more processors are configured to extract audio features of the first and second segments using outputs of one or more hidden layers of the deep neural network as inputs to the second neural network. 
 
14.  The system of claim 13, wherein the deep neural network comprises a recurrent neural network, and the second neural network comprises a convolutional neural network. 
 
15.  The system of claim 13, wherein the one or more processors are further configured to: using an autoencoder neural network to reduce a number of outputs from the first neural network by reducing a number of inputs to the first neural network to reduce overfitting.
 
16.  The system of claim 10, wherein the second neural network is trained to predict a word error rate (WER) of a plurality of transcription engines based at least on audio features extracted from each segment.
 
17.  A method for transcribing an audio file, the method comprising: using an audio file as inputs to a deep neural network trained to perform speech recognition; and using outputs of one or more hidden layers of the deep neural network as inputs to a second neural network that is trained to identify a first transcription engine having a highest predicted transcription accuracy among a group of transcription engines for the audio file based at least on the outputs of the one or more hidden layers of the deep neural network.
 
18.  The method of claim 17, wherein the second neural network is trained to predict a word error rate (WER) of the group of transcription engines based at least on outputs of the one or more hidden layers of the deep neural network and on characteristics of each respective engine of the group of transcription

 
19.  The method of claim 17, wherein the deep and second neural networks comprise a recurrent neural network and a convolutional neural network, respectively.
 
 20.  The method of claim 17, wherein using outputs of one or more hidden layers comprises using outputs of a first and last layer of the hidden layers of the deep neural network.
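
For illustration only, and without limiting the reference claims reproduced above, the identification of a best-candidate engine as the transcription engine with a lowest WER for each segment (reference claims 8 and 9) can be sketched as follows. The sketch assumes that the predicted WER values are already available as one mapping per segment; the function name `best_candidate_engines` and the engine labels are hypothetical and do not appear in the patent.

```python
from typing import Dict, List


def best_candidate_engines(predicted_wer: List[Dict[str, float]]) -> List[str]:
    """Return, for each segment, the engine name with the lowest predicted WER.

    Assumption: predicted_wer[i] maps a (hypothetical) engine name to the WER
    predicted for segment i. Ties go to the engine listed first.
    """
    best = []
    for segment_index, scores in enumerate(predicted_wer):
        if not scores:
            raise ValueError(f"segment {segment_index} has no predicted WER values")
        # min over the keys, ordered by their predicted WER
        best.append(min(scores, key=scores.get))
    return best
```

Under these assumptions, a different engine may be selected for each segment, which corresponds to the first and second best-candidate engines of reference claims 2 and 11.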


As shown above, claims 1 – 20 of US 11,043,209 recite the limitations recited by claims 1, 3 – 10 and 12 – 20 of the current application except for the following: the actual training of the second neural network; the 256 channels of the autoencoder; the hidden layers and pooling layer of the convolutional neural network; and the segments of the media file. However, each of these features is considered to be an obvious variant of the features recited in claims 1 – 20 of US 11,043,209.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951. The examiner can normally be reached Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SONIA L GAY/Primary Examiner, Art Unit 2657