Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/23/2021 has been entered.
 Response to Amendment
3.	In response to the Office action mailed on 07/23/2021, applicant filed an amendment on 12/23/2021, amending claims 1, 10, 15, 20, and 24 and canceling claims 7 and 19.  The pending claims are 1-6, 8-18, and 20-25. 

Response to Arguments
4.	Applicant's arguments filed 12/23/2021 have been fully considered but they are not persuasive.
As per claim 1, applicant argues that the prior art does not disclose, selecting, from a plurality of computing devices, a master computing device, individuals of the plurality of devices associated with different users and trained for their respective user's voice; and wherein the portions of recognized speech received from individuals of the plurality of devices are obtained from audio recorded separately by individual devices of the plurality of devices.


As per the rest of the claims and the combinations of prior art references, applicant presents no further arguments besides the ones mentioned above.  Therefore, all the combinations of prior art references mentioned above are maintained, and all other claims are rejected for the same reasons as set forth above. 
 Claim Interpretation
5.	The following is a quotation of 35 U.S.C. 112(f):
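(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.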

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claims 24-25 are being interpreted under 35 U.S.C. 112(f).

Claim Objections
6.	Claims 1, 15, 20, and 24 are objected to because of the following informalities:  
Claim 1 recites, “Selecting, from a plurality of computing devices, a master computing device, individuals of the plurality of devices associated with different users and trained for its respective user's voice”. The examiner interprets the determiner “its” as “their”.  Accordingly, the limitation is interpreted as - individuals of the plurality of devices associated with different users and trained for their respective user's voice-.
Claim 1 also recites, “wherein the portions of recognized speech received from individuals of the plurality of devices is obtained from audio recorded separately by individual devices of the plurality of devices”.  The limitation is interpreted as - wherein the portions of recognized speech received from individuals of the plurality of devices are obtained from audio recorded separately by individual devices of the plurality of devices-.
The same applies to claims 15, 20, and 24.
Appropriate correction is required.



Claim Rejections - 35 USC § 103
7.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-6, 9-12, 14-18, 20-25 are rejected under 35 U.S.C. 103 as being unpatentable over Wetjen (US 20150106091 A1) in view of Thomson (US 20200175987 with an effective filing date of 12/04/2018). 
 Regarding claim 1, Wetjen discloses a method for performing collaborative automatic speech recognition (see Wetjen Abstract, which notes a method for processing multiple individual participant speech in a conference call with an audio speech recognition system), the method comprising: 
receiving, by the master computing device (see Wetjen [0156], which notes a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments), a plurality of portions of recognized speech from individuals of the plurality of devices (see Wetjen FIG. 6, which shows at 630 receiving transcribed text from ASR services/devices 610, 615, and 620; and see Wetjen [0108], which notes a plurality (610, 615, and 620) of ASR services/devices each providing respective identified/recognized words in response to an input audio stream), each portion including an associated confidence score (see Wetjen [0108], which notes if mismatched words/portions occur in method 600, the portion having mismatched words/portions are provided to element 640 where the highest confidence words are selected;) and time stamp (see Wetjen [0108], which notes the words/portions may be correlated based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance); 
for one or more time stamps associated with the plurality of portions (see Wetjen [0108], which notes the words/portions may be correlated/identified based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance), identifying, by the master computing device, two or more confidence scores for two or more of the plurality of portions of recognized speech (see Wetjen [0109], which notes words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words have the highest (in comparison with lower) confidence values and are correctly aligned in time/correlated with the matching words); 
selecting, by the master computing device, for the one or more time stamps, one of the two or more of the plurality of portions of recognized speech based on the two or more confidence scores for the two or more of the plurality of portions ( see Wetjen [0109], which notes in the method 600 that words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words/portions have the highest confidence values and are correctly aligned in time with the matching words/portions and phrases to form a new complete recognition result containing the best selections from the results of all speech recognition services); and
generating, by the master computing device, a transcript using the one of the two or more of the plurality of portions of recognized speech selected for the respective one or more time stamps (see Wetjen [0109], which notes in method 600 a single, more accurate recognition result is obtained by combining elements selected from each of the speech recognition services, providing a highly accurate transcription of the speaker).
Further, Wetjen teaches receiving an audio stream by multiple automatic speech recognition services 610, 615, and 620 (Fig. 6).  The identified words by the respective services are compared at 630. The words may be correlated based on time stamps and channel corresponding to a user to ensure each service is processing the same utterance.
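For illustration only, the confidence-based selection mapped above (portions grouped by time stamp, the highest-confidence portion kept for each time stamp, and the kept portions joined into a transcript) may be sketched as follows. The sketch is not taken from Wetjen, Thomson, or the claims. The names Portion, device_id, and build_transcript are hypothetical, and grouping by exactly equal time stamps is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    device_id: str     # hypothetical identifier of the reporting device
    timestamp: float   # time stamp of the portion, in seconds
    text: str          # recognized words for this portion
    confidence: float  # recognizer confidence score, assumed to lie in [0, 1]


def build_transcript(portions):
    """Keep the highest-confidence portion per time stamp and join in time order."""
    by_time = defaultdict(list)
    for portion in portions:
        # Assumption: portions covering the same utterance share an identical time stamp.
        by_time[portion.timestamp].append(portion)

    selected = [
        max(group, key=lambda p: p.confidence)
        for _, group in sorted(by_time.items())
    ]
    return " ".join(p.text for p in selected)
```

A practical system would likely align portions within a time tolerance rather than by exact equality, which this sketch does not attempt.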
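Wetjen does not explicitly disclose selecting, from a plurality of computing devices, a master computing device, individuals of the plurality of devices associated with different users and trained for their respective user's voice; and wherein the portions of recognized speech received from individuals of the plurality of devices are obtained from audio recorded separately by individual devices of the plurality of devices.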
Thomson, in the same field of endeavor, teaches at Fig. 13 and [0333]-[0340] a fuser 1324 that may be configured to merge portions of recognized speech received from individuals of the plurality of devices 1320a-n, the portions being obtained from audio recorded separately by individual devices of the plurality of devices.  The recognized portions generated by ASR systems 1320 are fused by fuser device 1324 to create a fused transcription ([0337]).  The plurality of ASR systems 1320 may be included in a single transcription unit, spread across multiple transcription units, or may be part of different API services, such as services provided by different vendors.  Each of the plurality of ASR systems 1320 is speaker-dependent ([0334]), wherein ASR1 and ASR2 are built or trained by different vendors with different sets of acoustic and/or text data ([0338]). Acoustic data is divided by speaker category or demographic such as accent or dialect, geographical region, gender, age (child, elderly, etc.) ([0339]).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Thomson’s feature of fusing recognized portions from a plurality of separate devices to create a fused transcription with the system of Wetjen, in order to improve the accuracy of ASR systems and transcription of audio of communication sessions.
Regarding claim 15, Wetjen discloses a non-transitory computer-readable storage medium having stored thereon computer executable instructions for performing collaborative automatic speech recognition, wherein the instructions, when executed by a computer device, cause the computer device to be operable for (see Wetjen [0156], which notes FIG. 9 is a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments; and see Wetjen [0157], which notes computer-readable instructions stored on a computer-readable medium are executable by the processing unit 902 of the computer 900):
receiving (see Wetjen [0156], which notes a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments) a plurality of portions of recognized speech from a plurality of devices (see Wetjen FIG. 6, which shows at 630 receiving transcribed text from ASR services/devices 610, 615, and 620; and see Wetjen [0108], which notes a plurality (610, 615, and 620) of ASR services/devices each providing respective identified/recognized words in response to an input audio stream), each portion including an associated confidence score (see Wetjen [0108], which notes if mismatched words/portions occur in method 600, the portion having mismatched words/portions are provided to element 640 where the highest confidence words are selected;) and time stamp (see Wetjen [0108], which notes the words/portions may be correlated based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance);
for one or more time stamps associated with the plurality of portions, identifying (see Wetjen [0108], which notes the words/portions may be correlated/identified based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance) two or more confidence scores for two or more of the plurality of portions of recognized speech (see Wetjen [0109], which notes words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words have the highest (in comparison with lower) confidence values and are correctly aligned in time/correlated with the matching words);
selecting for the one or more time stamps, one of the two or more of the plurality of portions of recognized speech based on the two or more confidence scores for the two or more of the plurality of portions (and see Wetjen [0109], which notes in the method 600 that words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words/portions have the highest confidence values and are correctly aligned in time with the matching words/portions and phrases to form a new complete recognition result containing the best selections from the results of all speech recognition services); and 
generating a transcript using the one of the two or more of the plurality of portions of recognized speech selected for the respective one or more time stamps (see Wetjen [0109], which notes in method 600 a single, more accurate recognition result is obtained by combining elements selected from each of the speech recognition services, providing a highly accurate transcription of the speaker).  
Further, Wetjen teaches receiving an audio stream by multiple automatic speech recognition services 610, 615, and 620 (Fig. 6).  The identified words by the respective services are compared at 630. The words may be correlated based on time stamps and channel corresponding to a user to ensure each service is processing the same utterance.
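Wetjen does not explicitly disclose individuals of the plurality of devices associated with different users and trained for their respective user's voice; and wherein each of the plurality of devices are separate from the computing device, and each portion of recognized speech received from each of the plurality of devices is obtained by each of the plurality of devices from audio separately recorded by each respective device.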
Thomson, in the same field of endeavor, teaches at Fig. 13 and [0333]-[0340] a fuser 1324 that may be configured to merge portions of recognized speech received from individuals of the plurality of devices 1320a-n, the portions being obtained from audio recorded separately by individual devices of the plurality of devices.  The recognized portions generated by ASR systems 1320 are fused by fuser device 1324 to create a fused transcription ([0337]).  The plurality of ASR systems 1320 may be included in a single transcription unit, spread across multiple transcription units, or may be part of different API services, such as services provided by different vendors.  Each of the plurality of ASR systems 1320 is speaker-dependent ([0334]), wherein ASR1 and ASR2 are built or trained by different vendors with different sets of acoustic and/or text data ([0338]). Acoustic data is divided by speaker category or demographic such as accent or dialect, geographical region, gender, age (child, elderly, etc.) ([0339]).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Thomson’s feature of fusing recognized portions from a plurality of separate devices to create a fused transcription with the system of Wetjen, in order to improve the accuracy of ASR systems and transcription of audio of communication sessions.
Regarding claim 20, Wetjen discloses an apparatus for performing collaborative automatic speech recognition, the apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions for controlling the one or more computer processors to be operable (see Wetjen [0156], which notes FIG. 9 is a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments, where one example computing device in the form of a computer 900, may include a processing unit 902, memory 903, removable storage 910, and non-removable storage 912; and see Wetjen [0157], which notes computer-readable instructions stored on a computer-readable medium are executable by the processing unit 902 of the computer 900):
receiving (see Wetjen [0156], which notes a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments) a plurality of portions of recognized speech from a plurality of devices (see Wetjen FIG. 6, which shows at 630 receiving transcribed text from ASR services/devices 610, 615, and 620; and see Wetjen [0108], which notes a plurality (610, 615, and 620) of ASR services/devices each providing respective identified/recognized words in response to an input audio stream), each portion including an associated confidence score (see Wetjen [0108], which notes if mismatched words/portions occur in method 600, the portion having mismatched words/portions are provided to element 640 where the highest confidence words are selected;) and time stamp (see Wetjen [0108], which notes the words/portions may be correlated based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance); 
for one or more time stamps associated with the plurality of portions, (see Wetjen [0108], which notes the words/portions may be correlated/identified based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance), identifying two or more confidence scores for two or more of the plurality of portions of recognized speech (see Wetjen [0109], which notes words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words have the highest (in comparison with lower) confidence values and are correctly aligned in time/correlated with the matching words); 
selecting for the one or more time stamps, one of the two or more of the plurality of portions of recognized speech based on the two or more confidence scores for the two or more of the plurality of portions (see Wetjen [0109], which notes in the method 600 that words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words/portions have the highest confidence values and are correctly aligned in time with the matching words/portions and phrases to form a new complete recognition result containing the best selections from the results of all speech recognition services); and
generating a transcript using the one of the two or more of the plurality of portions of recognized speech selected for the respective one or more time stamps (see Wetjen [0109], which notes in method 600 a single, more accurate recognition result is obtained by combining elements selected from each of the speech recognition services, providing a highly accurate transcription of the speaker). 
Further, Wetjen teaches receiving an audio stream by multiple automatic speech recognition services 610, 615, and 620 (Fig. 6).  The identified words by the respective services are compared at 630. The words may be correlated based on time stamps and channel corresponding to a user to ensure each service is processing the same utterance.
Wetjen does not explicitly disclose individuals of the plurality of devices associated with different users and trained for their respective user's voice; and wherein each of the plurality of devices are separate from the computing device, and each portion of recognized speech received from each of the plurality of devices is obtained by each of the plurality of devices from audio separately recorded by each respective device.

Regarding claim 24, Wetjen discloses a method for performing collaborative automatic speech recognition (see Wetjen Abstract, which notes a method for processing multiple individual participant speech in a conference call with an audio speech recognition system), the method comprising:
means for receiving (see Wetjen [0156], which notes a block schematic diagram of a computer system 900 to implement one or more of the methods according to example embodiments) a plurality of portions of recognized speech from a plurality of devices (see Wetjen FIG. 6, which shows at 630 receiving transcribed text from ASR services/devices 610, 615, and 620; and see Wetjen [0108], which notes a plurality (610, 615, and 620) of ASR services/devices each providing respective identified/recognized words in response to an input audio stream), each portion including an associated confidence score (see Wetjen [0108], which notes if mismatched words/portions occur in method 600, the portion having mismatched words/portions are provided to element 640 where the highest confidence words are selected;) and time stamp (see Wetjen [0108], which notes the words/portions may be correlated based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance);
means for identifying (see Wetjen [0108], which notes the words/portions may be correlated/identified based on time stamps and channel corresponding to a user in one embodiment to ensure each service is processing the same utterance) two or more confidence scores for two or more of the plurality of portions of recognized speech for one or more time stamps associated with the plurality of portions (see Wetjen [0109], which notes words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words have the highest (in comparison with lower) confidence values and are correctly aligned in time/correlated with the matching words);
means for selecting for the one or more time stamps, one of the two or more of the plurality of portions of recognized speech based on the two or more confidence scores for the two or more of the plurality of portions (and see Wetjen [0109], which notes in the method 600 that words and phrases are then selected from among the non-matching words and phrases from each speech recognition service such that the selected words/portions have the highest confidence values and are correctly aligned in time with the matching words/portions and phrases to form a new complete recognition result containing the best selections from the results of all speech recognition services); and
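means for generating a transcript using the one of the two or more of the plurality of portions of recognized speech selected for the respective one or more time stamps (see Wetjen [0109], which notes in method 600 a single, more accurate recognition result is obtained by combining elements selected from each of the speech recognition services, providing a highly accurate transcription of the speaker).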
Further, Wetjen teaches receiving an audio stream by multiple automatic speech recognition services 610, 615, and 620 (Fig. 6).  The identified words by the respective services are compared at 630. The words may be correlated based on time stamps and channel corresponding to a user to ensure each service is processing the same utterance.
Wetjen does not explicitly disclose individuals of the plurality of devices associated with different users and trained for their respective user's voice; and wherein each of the plurality of devices are separate from the computing device, and each portion of recognized speech received from each of the plurality of devices is obtained by each of the plurality of devices from audio separately recorded by each respective device.
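Thomson, in the same field of endeavor, teaches at Fig. 13 and [0333]-[0340] a fuser 1324 that may be configured to merge portions of recognized speech received from individuals of the plurality of devices 1320a-n, the portions being obtained from audio recorded separately by individual devices of the plurality of devices.  The recognized portions generated by ASR systems 1320 are fused by fuser device 1324 to create a fused transcription ([0337]).  The plurality of ASR systems 1320 may be included in a single transcription unit, spread across multiple transcription units, or may be part of different API services, such as services provided by different vendors.  Each of the plurality of ASR systems 1320 is speaker-dependent ([0334]), wherein ASR1 and ASR2 are built or trained by different vendors with different sets of acoustic and/or text data ([0338]). Acoustic data is divided by speaker category or demographic such as accent or dialect, geographical region, gender, age (child, elderly, etc.) ([0339]).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Thomson’s feature of fusing recognized portions from a plurality of separate devices to create a fused transcription with the system of Wetjen, in order to improve the accuracy of ASR systems and transcription of audio of communication sessions.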
 As per claims 2, 16, 21, and 25, Wetjen teaches all of the limitations of claims 1, 15, 20, and 24 above.  
Wetjen does not explicitly disclose wherein the plurality of portions of recognized speech are recognized using a plurality of automatic speech recognition systems that are using differently trained models.
However, Thomson does teach wherein the plurality of portions of recognized speech are recognized using a plurality of automatic speech recognition systems that are using differently trained models (see Thomson [0337], which notes that FIG. 13 also illustrates a fuser 1324 that may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription; and see Thomson TABLE 3, which notes in outline point #2 that ASR1 and ASR2 may be configured or trained differently or use different models).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription).  The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 3, Wetjen teaches all of the limitations of claim 2 above. Wetjen does not explicitly disclose wherein the differently trained models include different parameters that are used by respective automatic speech recognition systems to recognize speech.
Thomson in the same field of endeavor does teach wherein the differently trained models include different parameters that are used by respective automatic speech recognition systems to recognize speech (see Thomson TABLE 3, which notes in outline point #3 that ASR2 may run in a reduced mode (having different parameters) or may be “crippled” or deliberately configured to deliver results with reduced accuracy, compared to ASR1: because ASR2 may tend to perform reasonably well with speech that is easy to understand, and therefore closely match the results of ASR1, the agreement rate between ASR1 and ASR2 may be used to predict the accuracy of ASR1, ASR2, and/or other ASR systems).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claims 4, 17, and 22, Wetjen teaches all of the limitations of claims 1, 15, and 20 above.  Wetjen does not explicitly disclose wherein a model for an automatic speech recognition system in one of the plurality of devices is trained using speech samples from a user.
Thomson in the same field of endeavor does teach wherein a model for an automatic speech recognition system in one of the plurality of devices is trained using speech samples from a user (see Thomson [0337], which notes [0116] the ASR systems described in this disclosure may be separated into one of two categories: speaker-dependent ASR systems and speaker-independent ASR systems, where a speaker-dependent ASR system may use a speaker-dependent speech model. A speaker-dependent speech model may be specific to a particular person or a group of people, where for example, a speaker-dependent ASR system configured to transcribe a communication session between the first user 110 and the second user 112 may include a speaker-dependent speech model that may be specifically trained using speech patterns for either or both the first user 110 and the second user 112.).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 5, Wetjen teaches all of the limitations of claim 4 above. Wetjen does not explicitly disclose wherein the model of the automatic speech recognition system is also trained using standardized speech samples from other users.
Thomson in the same field of endeavor does teach wherein the model of the automatic speech recognition system is also trained using standardized speech samples from other users (see Thomson [0338], which notes a speaker-independent ASR system may be trained on a speaker-independent speech model. A speaker-independent speech model may be trained for general speech and not specifically trained using speech patterns of the people for which the speech model is employed, where for example, a speaker-independent ASR system configured to transcribe a communication session between the first user 110 and the second user 112 may include a speaker-independent speech model that may not be specifically trained using speech patterns for the first user 110 or the second user 112. In these and other embodiments, the speaker-independent speech model may be trained using speech patterns of users of the transcription system 108 other than the first user 110 and the second user 112).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by 
As per claims 6, 18, and 23, Wetjen teaches all of the limitations of claim 1 above.  Wetjen does not explicitly disclose wherein a model for an automatic speech recognition system in a device in the plurality of devices is trained using standardized speech samples that are altered based on characteristics of speech samples from a user.
Thomson in the same field of endeavor does teach wherein a model for an automatic speech recognition system in a device in the plurality of devices is trained using standardized speech samples (see Thomson [0251], which notes the feature extractor 504 receives audio/speech samples and generates one or more features based on a feature model 505, where types of features may include LSFs (line spectral frequencies), cepstral features, and MFCCs (Mel Scale Cepstral Coefficients) as well as features derived from a video signal, such as a video of the speaker's lips or face) that are altered based on characteristics of speech samples from a user (see Thomson [0253], which notes the feature transformer 506 may be configured to convert/alter the extracted features, based on a transform model 507, into a transformed format 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 9, Wetjen teaches all of the limitations of claim 1 above.  Wetjen does not explicitly disclose wherein each of the plurality of devices communicates the plurality of portions of recognized speech to each other.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 10, the combination of Wetjen and Thomson teaches all of the limitations of claim 7 above.  Wetjen does not explicitly disclose wherein each of the plurality of devices generates the transcript.
Thomson in the same field of endeavor teaches wherein each of the plurality of devices generates the transcript (see Thomson [0192], which notes each transcription unit/device 214 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to 
As per claim 11, Wetjen teaches all of the limitations of claim 1 above. Wetjen does not explicitly disclose post-processing the transcript to alter the transcript.
However, Thomson in the same field of endeavor does teach further comprising: post-processing the transcript to alter the transcript (see Thomson [0135], which notes the text editor 126 may be configured to obtain transcriptions from the ASR systems 120 and/or the fuser, where for example, the text editor 126 may obtain the transcription from the second ASR system 120b. The text editor 126 may be configured to obtain edits/post-process a transcription).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the transcribing endpoints as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 12, Wetjen teaches all of the limitations of claim 1 above. Wetjen does not explicitly disclose adding an item to the transcript to alter the transcript.
Thomson in the same field of endeavor does teach further comprising: adding an item to the transcript to alter the transcript (see Thomson [0136], which notes an example where the text editor 126 may be configured to direct a display of a device associated with the CA client 122 to present a transcription for viewing by a person, such as the CA 118 or another CA, among others. The person may review the transcription and provide input/add an item through an input device regarding edits to the transcription).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the different ASR models as taught by Thomson in order to create a fused transcription, where the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription (see Thomson [0337], which notes the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription). The combination of Wetjen and Thomson includes predictable results, such as the creation of a fused transcription.
As per claim 14, Wetjen teaches all of the limitations of claim 1 above.  Wetjen does not explicitly disclose one of the plurality of portions of recognized speech is from speech samples from a user, each of the plurality of devices recognizes the one of the plurality of portions of 
Thomson in the same field of endeavor does teach further comprising: one of the plurality of portions of recognized speech is from speech samples from a user (see Thomson [0636], which notes an example with respect to predicting or estimating accuracy of a transcription, wherein one or more companion ASR systems may process substantially the same speech as a first transcription unit.), each of the plurality of devices recognizes the one of the plurality of portions of recognized speech from the speech samples from the user (see Thomson [0359], which notes in an example, a speaker may say “OK, let's meet at four.” During the transcription generation process 1402, three different ASR systems (e.g., ASR systems 1320 of FIG. 13) may each generate one of the below hypotheses: wherein a first hypothesis by a first ASR system is noted in [0360] 1. OK, let's meet more; wherein a second hypothesis by a second ASR system is noted in [0361], 2. OK, says meet at 4:00; wherein a third hypothesis by a third ASR system is noted in [0362], 3. OK, ha let's meet at far), and the one of the plurality of portions of recognized speech from each of the plurality of devices each includes a different confidence score (see Thomson [0371], which notes additional criteria may also include an ASR system including a relatively higher estimated accuracy for a segment (e.g., phrase, sentence, turn, series, or session) of words containing the token, and which notes yet another additional criterion might be analyzing a confidence score given to a token from the ASR system that generated the token.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by .
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wetjen in view of Thomson and further in view of Anuar (US 20110261149 A1).  
As per claim 8, Wetjen teaches all of the limitations of claim 1 above.  Wetjen in view of Thomson does not explicitly disclose initializing a meeting for the plurality of devices, wherein the computing device establishes a communication channel with each of the plurality of devices to receive the plurality of portions of recognized speech.
However, Anuar in the same field of endeavor teaches further comprising: initializing a meeting (see Anuar [0066], which notes in 602, a videoconference may be initiated or performed between a plurality of participants at respective participant locations, where the conference may be initiated between a first participant using a first endpoint (e.g., at a first participant location) and a plurality of other participants using other endpoints (e.g., at other participant locations). Thus, endpoints may be similar to those described above regarding FIGS. 1-4, although other variations are envisioned. The videoconference may be established according to any of a variety 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the transcribing endpoints as taught by Anuar in order to determine an endpoint having a best/highest quality connection to a recording server, where the quality of the determined endpoint to the recording server transcription may include a quality that is higher than what would be otherwise available from a connection between a fixed MCU to the .
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wetjen in view of Thomson and further in view of Diamant (US 20190341050 A1).  
As per claim 13, Wetjen teaches all of the limitations of claim 1 above.  Wetjen in view of Thomson does not explicitly disclose downloading presentation materials; and adding at least a portion of the transcript to the presentation materials.
Diamant in the same field of endeavor teaches downloading presentation materials (see instant application [0034], which notes at 308, any presentation materials are downloaded, where for example, master device 104 may download a presentation that will be presented during the meeting; see Diamant [0123], which notes creating the transcript at 211 may further include tracking shared digital information at 220, where such shared digital information may include any suitable digital content, e.g., word processor documents, presentation slides, multimedia files, computer programs, or any other files being reviewed by conference participants. For example, tracking shared digital information may include tracking a time at which one or more files were shared among conference participants. In some examples, tracking shared digital 
adding at least a portion of the transcript to the presentation materials (see Diamant [0123], which notes such tracking may enable reviewing presentation slides alongside transcribed conversation; and see Diamant FIG. 18, which shows the transcript entries 181 recording the downloaded presentation materials as event E4.  Additionally, the whiteboard depictions 184, 185, and 186 are presentation materials shown in expanded view 180 that are downloaded and displayed to remote participants as noted in Diamant [0124] in conjunction with contemporaneous transcript entries).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Wetjen with the transcription machine methods as taught by Diamant in order to extend transcriptions to a variety of scenarios that include sharing of whiteboard drawings and digital files during conferences (see Diamant [0004], which notes a transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant, where transcriptions can be extended to a variety of scenarios to coordinate the conference, facilitate communication among conference participants, record events of interest during the conference, track whiteboard drawings and digital files shared during the conference, and more generally create a robust record of multi-modal interactions among conference participants.). The combination of Wetjen .
Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELALI SERROU whose telephone number is (571)272-7638. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) 

/ABDELALI SERROU/
Primary Examiner, Art Unit 2659