DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 

This Office Action is in response to correspondence filed 04 November 2020 in reference to application 17/089,179.  Claims 11-30 are pending and have been examined.

Response to Amendment
The amendment filed 12 January 2021 has been accepted and considered in this office action.  Claims 1-10 have been cancelled and claims 11-30 added.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 11-30 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-6, 10, and 12-16 of U.S. Patent No. 10,861,438. Although the claims at issue are not identical, they are not patentably distinct from each other because patent 10,891,438 anticipates the instant claims, as laid out in the chart below.

Instant Application | US Patent 10,891,438
Claim 11: A memory device having instructions stored thereon that, in response to execution by a processor, cause the processor to perform operations comprising: | Claim 1: A memory device having instructions stored thereon that, in response to execution by a processing device, cause the processing device to perform operations comprising:
generating a transcription of a first portion of audio data using a voice model, wherein the audio data comprises a single audio file with a recording of a plurality of speakers; | determining when a first speaker is speaking and when a second speaker is speaking in audio data comprising both speech of the first speaker and speech of the second speaker, wherein the second speaker is different from the first speaker, wherein the audio data comprises a single audio file; transcribing a first portion of the audio data based on one voice model to generate text data sets,
receiving a correction to the transcription of the first portion of the audio data; | receiving at least one corrected text data set corresponding to the at least one of the text data sets
generating an updated voice model based on the correction to the transcription of the first portion of the audio data; and | updating the voice model based on the at least one corrected text data set
generating a transcription of a second portion of audio data using the updated voice model. | transcribing a second portion of the audio data based on the voice model as updated
Claim 12: The memory device of claim 11, wherein the voice model comprises a voice-independent model. | Claim 1: wherein the voice model comprises a voice-independent model;
Claim 13: The memory device of claim 11, wherein at least one of the first portion of the audio data and the second portion of the audio data originates from a VoiP voicemail server. | Claim 2: The memory device of claim 1, wherein the first portion of the audio data or the second portion of the audio data originates from a VoiP voicemail server.
Claim 14: The memory device of claim 11, wherein at least one of the first portion of the audio data and the second portion of the audio data originates from a client computer coupled to at least one computer network. | Claim 3: The memory device of claim 1, wherein the first portion of the audio data or the second portion of the audio data originates from a client computer coupled to at least one computer network.
Claim 15: The memory device of claim 11, wherein the operations further comprise extracting at least one of the first portion of the audio data and the second portion of the audio data from an e-mail message. | Claim 4: The memory device of claim 1, wherein the operations further comprise extracting the first portion of the audio data or the second portion of the audio data from an e-mail message.
Claim 16: The memory device of claim 11, wherein the operations further comprise requesting at least one of the first portion of the audio data and the second portion of the audio data from a remote source. | Claim 5: The memory device of claim 1, wherein the operations further comprise requesting the first portion of the audio data or the second portion of the audio data from a remote audio data source.
Claim 17: The memory device of claim 11, wherein the operations further comprise prioritizing the first portion of the audio data. | Claim 6: The memory device of claim 5, wherein the operations further comprise prioritizing the first portion of the audio data.
Claim 18: The memory device of claim 11, wherein the first portion of the audio data comprises a first remotely user-selected portion of the audio data, and wherein the second portion of the audio data includes a second remotely user-selected portion of the audio data. | Claim 10: The memory device of claim 1, wherein the first portion of the audio data includes a first remotely user-selected portion of the audio data, and wherein the second portion of the audio data includes a second remotely user-selected portion of the audio data.
Claim 19: The memory device of claim 11, wherein the operations further comprise determining when each of the plurality of speakers is speaking. | Claim 1: determining when a first speaker is speaking and when a second speaker is speaking in audio data
Claim 20: The memory device of claim 11, wherein the transcription of the first portion of audio data comprises a first text data set associated with one of the plurality of speakers and a second text data set associated with another one of the plurality of speakers. | Claim 1: wherein a first text data set of the text data sets is associated with the first speaker and a second text data set of the text data sets is associated with the second speaker.
Claim 21: A method comprising: | Claim 12: A method comprising:
generating a transcription of a first portion of audio data using a voice model, wherein the audio data comprises a single audio file with a recording of a plurality of speakers; | transcribing a first portion of audio data based on one voice model to generate first text data sets, wherein one of the first text data sets is associated with the first speaker, wherein another one of the first text data sets is associated with the second speaker
receiving a correction to the transcription of the first portion of the audio data; | receiving at least one corrected text data set,
generating an updated voice model based on the correction to the transcription of the first portion of the audio data; and | updating the voice model based on the at least one corrected text data set;
generating a transcription of a second portion of audio data using the updated voice model. | transcribing a second portion of the audio data based on the voice model voice as updated to generate second text data sets.
Claim 22: The method of claim 21, wherein the voice model comprises a voice-independent model. | Claim 12: wherein the voice model comprises a voice-independent model;
Claim 23: The method of claim 21, wherein at least one of the first portion of the audio data and the second portion of the audio data originates from a VoiP voicemail server. | Claim 13: The method of claim 12, wherein the first portion of the audio data or the second portion of the audio data originates from a VoiP voicemail server.
Claim 24: The method of claim 21, wherein at least one of the first portion of the audio data and the second portion of the audio data originates from a client computer coupled to at least one computer network. | Claim 14: The method of claim 12, wherein the first portion of the audio data or the second portion of the audio data originates from a client computer coupled to at least one computer network.
Claim 25: The method of claim 21, further comprising extracting at least one of the first portion of the audio data and the second portion of the audio data from an e-mail message. | Claim 15: The method of claim 12, further comprising extracting the first portion of the audio data or the second portion of the audio data from an e-mail message.
Claim 26: The method of claim 21, further comprising requesting at least one of the first portion of the audio data and the second portion of the audio data from a remote source. | Claim 16: The method of claim 12, further comprising requesting the first portion of the audio data or the second portion of the audio data from a remote audio data source.
Claim 27: The method of claim 21, wherein the first portion of the audio data comprises a first remotely user-selected portion of the audio data, and wherein the second portion of the audio data includes a second remotely user-selected portion of the audio data. | Claim 10: The memory device of claim 1, wherein the first portion of the audio data includes a first remotely user-selected portion of the audio data, and wherein the second portion of the audio data includes a second remotely user-selected portion of the audio data.
Claim 28: The method of claim 21, further comprising determining when each of the plurality of speakers is speaking. | Claim 12: determining when a first speaker is speaking and when a second speaker is speaking in audio data comprising both speech of the first speaker and speech of the second speaker
Claim 29: The method of claim 21, wherein the transcription of the first portion of audio data comprises a first text data set associated with one of the plurality of speakers and a second text data set associated with another one of the plurality of speakers. | Claim 12: wherein one of the first text data sets is associated with the first speaker, wherein another one of the first text data sets is associated with the second speaker,
Claim 30: A system comprising: | Claim 1: A memory device having instructions stored thereon that, in response to execution by a processing device, cause the processing device to perform operations comprising:
means for generating a transcription of a first portion of audio data using a voice model, wherein the audio data comprises a single audio file with a recording of a plurality of speakers; | determining when a first speaker is speaking and when a second speaker is speaking in audio data comprising both speech of the first speaker and speech of the second speaker, wherein the second speaker is different from the first speaker, wherein the audio data comprises a single audio file; transcribing a first portion of the audio data based on one voice model to generate text data sets,
means for receiving a correction to the transcription of the first portion of the audio data; | receiving at least one corrected text data set corresponding to the at least one of the text data sets
means for generating an updated voice model based on the correction to the transcription of the first portion of the audio data; and | updating the voice model based on the at least one corrected text data set
means for generating a transcription of a second portion of audio data using the updated voice model. | transcribing a second portion of the audio data based on the voice model as updated


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 11-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claims 11, 21 and 30 recite generating a transcription of a first portion of an audio file using a voice model, receiving a correction of the transcription, generating an updated voice model, and generating a transcription of a second portion of the audio file using the updated model.
The limitation of generating a transcription of a first portion of an audio file using a voice model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a memory and a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the generic computer components, generating a transcription in the context of this claim encompasses the person listening to an audio file and generating a transcription using known samples of user voices found in the file as the model.
Similarly, the limitation of receiving a correction of the transcription, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a memory and a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the generic computer components, receiving a correction of the transcription in the context of this claim encompasses the person manually receiving corrections from another person.
Next, the limitation of generating an updated voice model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a memory and a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the generic computer components, generating an updated voice model in the context of this claim encompasses the person manually relabeling the known speech segments from the speakers found in the file.
Finally, the limitation of generating a transcription of a second portion of the audio file using the updated model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a memory and a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the generic computer components, generating a transcription in the context of this claim encompasses the person listening to an audio file and generating a transcription using the updated known samples of user voices found in the file as the model.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only recites computer components recited at a high level of generality (i.e., as a generic processor) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of computer components amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claims 12 and 22 specify that the voice model is a voice-independent model.  However, this does not prevent the abstract idea from being performed in the mind.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 13 and 23 specify that the audio file originates from a VoIP server.  However, this does not prevent the abstract idea from being performed in the mind.  For example, a human can listen to messages played on a VoIP server.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 14 and 24 specify that the audio file originates from a client computer.  However, this does not prevent the abstract idea from being performed in the mind.  For example, a human can listen to messages that have been transmitted from a client computer.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 15 and 25 specify that the audio file originates from an email message.  However, this does not prevent the abstract idea from being performed in the mind.  For example, a human can listen to messages that have been sent via email.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 16 and 26 add the step of requesting the audio data from a remote source.  However, this does not prevent the abstract idea from being performed in the mind.  For example, a human can send a request for a file via an email client.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claim 17 adds the step of prioritizing the first portion of audio data.  However, this does not prevent the abstract idea from being performed in the mind.  For example, a human can transcribe the first portion before other portions.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, this claim is also not patent eligible.

Claims 18 and 27 specify that the first and second portions are remotely selected.  However, this does not prevent the abstract idea from being performed in the mind, as the method of selection does not affect how the file is transcribed.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 19 and 28 add the step of determining when each of the plurality of speakers is speaking.  However, this does not prevent the abstract idea from being performed in the mind, as this step can be performed by manually recognizing who is speaking.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claims 20 and 29 specify that the transcription has first text associated with a first speaker and second text associated with a second speaker.  However, this does not prevent the abstract idea from being performed in the mind, as a human could associate text with each speaker.  Nor does it apply the abstract idea to a practical application or include additional elements that amount to significantly more than the abstract idea.  Therefore, these claims are also not patent eligible.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 11, 14-21 and 24-30 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Bijl et al. (US Patent 6,173,259) in view of Din et al. (US Patent 6,754,631) and further in view of Bates et al. (US Patent 7,590,536).

Consider claim 11, Bijl teaches a memory device having instructions stored thereon that, in response to execution by a processor (col 3 lines 20-30, instructions, col 4 lines 25-35, processor), cause the processor to perform operations comprising: 
generating a transcription of a first portion of audio data using a voice model (speech recognition performed using Hidden Markov Models, generating text files, Col 8 line 40 – Col 9 line 30, figure 2); 
receiving a correction to the transcription of the first portion of the audio data (user at corrector terminal corrects transcription, Col 10 lines 13-54); 
generating an updated voice model based on the correction to the transcription of the first portion of the audio data (corrections are used to update speech recognition models; Col 11 line 64 – Col 12 line 5).
Bijl does not specifically teach wherein the audio data comprises a single audio file with a recording of a plurality of speakers.
In the same field of transcription, Din teaches wherein the audio data comprises a single audio file with a recording of a plurality of speakers (col 6 lines 35-50, recording a meeting with multiple speakers for transcription).
Therefore it would have been obvious to one of ordinary skill in the art at the time of invention to transcribe audio files with multiple speakers as taught by Din in the system of Bijl in order to increase convenience when a transcription of a multiparty conversation is needed (Din Col 1 lines 10-37).
Bijl and Din do not specifically teach generating a transcription of a second portion of audio data using the updated voice model.
In the same field of transcription, Bates teaches generating a transcription of a second portion of audio data using the updated voice model (Col 8 line 56 – Col 9 line 40, especially Col 8 line 62 – Col 9 line 3, models are updated while speech recognition runs continuously.  Thus, later portions of the same audio will be recognized using the updated models).
Therefore it would have been obvious to one of ordinary skill in the art at the time of invention to use updated speech models on later portions of the audio file as taught by Bates in the system of Bijl and Din in order to allow for improved accuracy of the transcription of the audio file.
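For illustration only, the sequence of operations recited in claim 11 (transcribe a first portion, receive a correction, update the voice model, transcribe a second portion with the updated model) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Bijl, Din, or Bates: the names ToyVoiceModel, updated_with, and transcribe_with_correction are hypothetical, and the "voice model" is reduced to a lookup table from audio tokens to words rather than a Hidden Markov Model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyVoiceModel:
    """Hypothetical stand-in for a voice model: maps an audio token to a word."""
    lexicon: Dict[str, str] = field(default_factory=dict)

    def transcribe(self, portion: List[str]) -> List[str]:
        # Tokens the model does not know are emitted as "<unk>".
        return [self.lexicon.get(token, "<unk>") for token in portion]

    def updated_with(self, portion: List[str], corrected: List[str]) -> "ToyVoiceModel":
        # Build a new model in which each token of the corrected portion
        # maps to the corrected word; the original model is left unchanged.
        new_lexicon = dict(self.lexicon)
        new_lexicon.update(zip(portion, corrected))
        return ToyVoiceModel(new_lexicon)


def transcribe_with_correction(
    model: ToyVoiceModel,
    first_portion: List[str],
    second_portion: List[str],
    correct: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[str]]:
    """Run the four recited operations in order (illustrative only)."""
    draft = model.transcribe(first_portion)                 # transcribe first portion
    corrected = correct(draft)                              # receive a correction
    updated = model.updated_with(first_portion, corrected)  # generate updated model
    return draft, updated.transcribe(second_portion)        # transcribe second portion
```

In this sketch the correction is supplied by a caller-provided function, which stands in for the user at a corrector terminal; how a real system aligns corrected text with audio is outside the scope of the illustration.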

Consider claim 14, Bijl teaches the memory device of claim 11, wherein the first portion of the audio data or the second portion of the audio data originates from a client computer coupled to at least one computer network (Col 5 lines 16-46, user terminal can be a personal computer, and the transcription request along with audio files may be sent via email).

Consider claim 15, Bijl teaches the memory device of claim 11, wherein the operations further comprise extracting the first portion of the audio data or the second portion of the audio data from an e-mail message (Col 5 lines 16-46 user terminal can be personal computer, and transcription request along with audio files may be sent via email.).

Consider claim 16, Bijl teaches the memory device of claim 11, wherein the operations further comprise requesting the first portion of the audio data or the second portion of the audio data from a remote audio data source (in pack up and move, data is written to a temporary directory, until it is retrieved by a call from an external program; Col. 7 lines 40-67, remote clients).

Consider claim 17, Bijl teaches the memory device of claim 11, wherein the operations further comprise prioritizing the first portion of the audio data (priority assigned to each dictation request, col. 6 lines 30-34).

Consider claim 18, Bijl teaches the memory device of claim 11, wherein the first portion of the audio data includes a first remotely user-selected portion of the audio data, and wherein the second portion of the audio data includes a second remotely user-selected portion of the audio data (figure 2, audio data collected from multiple user terminals 2 associated with different users is sent for transcription, col 4 lines 15-30.  Because the user sends the files for transcription, they are user selected, and they include the first and second portions.  The claim does not require that the selection define the first and second portions).

Consider claim 19, Din teaches the memory device of claim 11, wherein the operations further comprise determining when each of the plurality of speakers is speaking (col 6 lines 45-52, determining the identity of each speaker).

Consider claim 20, Din teaches the memory device of claim 11, wherein the transcription of the first portion of audio data comprises a first text data set associated with one of the plurality of speakers and a second text data set associated with another one of the plurality of speakers (col 6 lines 45-67, determining the identity of each speaker, and associating identity with each utterance in the transcription).

Consider claim 21, Bijl teaches a method (abstract) comprising: 
generating a transcription of a first portion of audio data using a voice model (speech recognition performed using Hidden Markov Models, generating text files, Col 8 line 40 – Col 9 line 30, figure 2); 
receiving a correction to the transcription of the first portion of the audio data (user at corrector terminal corrects transcription, Col 10 lines 13-54); 
generating an updated voice model based on the correction to the transcription of the first portion of the audio data (corrections are used to update speech recognition models; Col 11 line 64 – Col 12 line 5).
Bijl does not specifically teach wherein the audio data comprises a single audio file with a recording of a plurality of speakers.
In the same field of transcription, Din teaches wherein the audio data comprises a single audio file with a recording of a plurality of speakers (col 6 lines 35-50, recording a meeting with multiple speakers for transcription).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to transcribe audio files with multiple speakers as taught by Din in the system of Bijl in order to increase convenience when transcription of a multiparty conversation is needed (Din Col 1 lines 10-37).
Bijl and Din do not specifically teach generating a transcription of a second portion of audio data using the updated voice model.
In the same field of transcription, Bates teaches generating a transcription of a second portion of audio data using the updated voice model (Col 8 line 56 - Col 9 line 40, especially Col 8 line 62 - Col 9 line 3, models are updated while speech recognition runs continuously.  Thus, later portions of the same audio will be recognized using the updated models).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to use updated speech models on later portions of the audio file as taught by Bates in the system of Bijl and Din in order to allow for improved accuracy of the transcription of the audio file.
Claim 24 contains similar limitations as claim 14 and is therefore rejected for the same reasons.

Claim 25 contains similar limitations as claim 15 and is therefore rejected for the same reasons.

Claim 26 contains similar limitations as claim 16 and is therefore rejected for the same reasons.

Claim 27 contains similar limitations as claim 18 and is therefore rejected for the same reasons.

Claim 28 contains similar limitations as claim 19 and is therefore rejected for the same reasons.

Claim 29 contains similar limitations as claim 20 and is therefore rejected for the same reasons.

Consider claim 30, Bijl teaches a system (abstract) comprising: 
means for generating a transcription of a first portion of audio data using a voice model (speech recognition performed using Hidden Markov Models, generating text files, Col 8 line 40 – Col 9 line 30, figure 2); 
means for receiving a correction to the transcription of the first portion of the audio data (user at corrector terminal corrects transcription, Col 10 lines 13-54); 
means for generating an updated voice model based on the correction to the transcription of the first portion of the audio data (corrections are used to update speech recognition models; Col 11 line 64- Col 12 line 5).
Bijl does not specifically teach wherein the audio data comprises a single audio file with a recording of a plurality of speakers.
In the same field of transcription, Din teaches wherein the audio data comprises a single audio file with a recording of a plurality of speakers (col 6 lines 35-50, recording a meeting with multiple speakers for transcription).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to transcribe audio files with multiple speakers as taught by Din in the system of Bijl in order to increase convenience when transcription of a multiparty conversation is needed (Din Col 1 lines 10-37).
Bijl and Din do not specifically teach generating a transcription of a second portion of audio data using the updated voice model.
In the same field of transcription, Bates teaches generating a transcription of a second portion of audio data using the updated voice model (Col 8 line 56 - Col 9 line 40, especially Col 8 line 62 - Col 9 line 3, models are updated while speech recognition runs continuously.  Thus, later portions of the same audio will be recognized using the updated models).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to use updated speech models on later portions of the audio file as taught by Bates in the system of Bijl and Din in order to allow for improved accuracy of the transcription of the audio file.

Claims 12 and 22 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Bijl, Din and Bates as applied to claims 11 and 21 above, and further in view of Diede et al. (US Patent 6,963,633).

Consider claim 12, Bijl, Din and Bates teach the memory device of claim 11, but do not specifically teach wherein the voice model comprises a voice-independent model.
In the same field of transcription, Diede teaches wherein the voice model comprises a voice-independent model (abstract, col 24 line 62 - col 25 line 52, speaker independent model used and may be updated).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to use speaker independent models as taught by Diede in the system of Bijl, Din and Bates in order to recognize speech from a wide variety of individuals that may not have been used to train the model.

Claim 22 contains similar limitations as claim 12 and is therefore rejected for the same reasons.

Claims 13 and 23 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Bijl, Din and Bates as applied to claims 11 and 21 above, and further in view of Walker (US PAP 2003/0050777).

Consider claim 13, Bijl, Din and Bates teach the memory device of claim 11, wherein the first portion of the audio data or the second portion of the audio data originates from a voicemail server (Bijl, Col 5 lines 22-25, recordings can be from voicemail server).
Bijl, Din and Bates do not specifically teach receiving audio data from a VoIP voicemail server.
In the same field of transcription, Walker teaches receiving audio data from a VoIP system (0019, speech recognition can be performed on a VoIP system).
Therefore it would have been obvious to one of ordinary skill in the art at the time of the invention to include VoIP support as taught by Walker in the system of Bijl, Din and Bates in order to provide transcription support for a well-known form of voice communication (Walker 0019).

Claim 23 contains similar limitations as claim 13 and is therefore rejected for the same reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. YU et al. (US PAP 2005/0159949) also teaches updating models based on user corrections.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/           Primary Examiner, Art Unit 2655