Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2018-0113234, filed on 09/20/2018, and KR10-2019-0013855, filed on 02/01/2019.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 09/20/2019 and 03/10/2020 are being considered by the examiner.
Drawings
The drawing submitted on 09/20/2019 is being considered by the examiner.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/07/2021 has been entered.




Allowable Subject Matter
Claims 1-3, 5-14 and 16-22 are allowed.
The following is an examiner’s statement of reasons for allowance:
Regarding Claims 1, 7, 12, and 18, the prior art of record, Cutler et al. (US 2018/0218727 A1), teaches a method for providing data for training by a first electronic device, the method comprising:
obtaining (captured) first voice data for a voice uttered by a user (user 106a) through the first electronic device (user terminal 102a) ([0003] In one known alternative solution, a transmitting device captures voice information from a user, uses speech recognition to convert the voice information into text, and communicates packets encoding the voice information and the text to a receiving device at a remote location. [0030] By way of illustration, embodiments below may be described from the perspective of the first user 106a sending a message to the second user 106b, but it will be appreciated that similar teachings may also apply in the reverse direction and/or in relation to other combinations of users. [0031] The transmitted audio comprises speech spoken by the sending user 106a, captured from an internal or external microphone at transmitting user terminal 102a. The captured audio and optional video are encoded (i.e. compressed) by the sending client application 103a and transmitted in encoded form over the network 101 to the client 103b on the receiving terminal 102b.);
obtaining a first voice recognition result (text) comprising a first text, the first text corresponding to the first voice data (timestamped text data) to a first voice recognition model (voice model 209) ([0035] To accommodate such situations or similar, according to the present disclosure the sending client 103a is configured with a speech-to-text converter to convert the sending user's locally captured speech to the receiving user terminal 102b, e.g. in parallel with the audio but in separate packets. The receiving client 103b is configured with a text-to-speech converter comprising a model of the sending user's voice, arranged to convert the received text to synthesized speech at the receive end based on said model, and play this out to the receiving user 106b. [0036] The transmitting client 103a comprises an encoder 206 and a speech-to-text converter 208, the speech-to-text converter 208 comprises a voice model 209 specific to the voice of the transmitting user 103a. [0038] The speech-to-text converter 208 is arranged so as to also receive an instance of the captured audio, including the sending user's speech, and to convert this into a textual representation of the same speech content based on the model 209 of the sending user's voice as maintained at the transmitting user terminal 102a. [0039] The speech-to-text converter 208 is configured to supply the converted text to the encoder 206 to synchronize this converted text with the corresponding audio from which it was generated. The encoder 206 then sends the timestamped text data over the network 101 via the transmitting interface 210 as part of the session (e.g. call).); and
transmitting the first voice recognition result to a second electronic device (receiving terminal 102b) which obtained second voice data (synthesized speech) for the voice uttered by the user ([0031] The transmitted audio comprises speech spoken by the sending user 106a, captured from an internal or external microphone at transmitting user terminal 102a. The captured audio and optional video are encoded (i.e. compressed) by the sending client application 103a and transmitted in encoded form over the network 101 to the client 103b on the receiving terminal 102b. [0035] To accommodate such situations or similar, according to the present disclosure the sending client 103a is configured with a speech-to-text converter to convert the sending user's locally captured speech to the receiving user terminal 102b, e.g. in parallel with the audio but in separate packets. [0039] The speech-to-text converter 208 is configured to supply the converted text to the encoder 206 to synchronize this converted text with the corresponding audio from which it was generated. The encoder 206 then sends the timestamped text data over the network 101 via the transmitting interface 210 as part of the session (e.g. call). [0044] The transmitter 210, under control of the encoder 206, is arranged to send the converted text data 254, the encoded audio data 250 and (optionally) the encoded video 252 to the receiving user terminal 102b.),
wherein the first voice recognition result is used to train (update or adapt or train) a second voice recognition model (receive-side model 209′) corresponding to the second voice data (synthesized speech) ([0055] The receive-side model 209′ comprises one or more parameters of the transmitting user's voice, e.g. the fundamental pitch (i.e. fundamental frequency), the mean of the fundamental pitch, the variance of the fundamental pitch, and/or spectral information such as in the form of linear prediction coefficients or mel cepstral coefficients. In embodiments, values of one or more of these parameters may be provided to the receiving terminal 102b “up front”, either at some point in advance of the session such as when the transmitting and receiving users 106a, 106b first become contacts, or at least at the beginning of the call. For instance, the one or more values of the one or more parameters may be sent by the transmitting client 103a to the receiving client 103b based on the transmit-side model 209 maintained at the transmitting terminal 102a. [0056] Alternatively or additionally, in embodiments, the transmitting client 103a may be configured to send one or more values of one or more of the voice parameters to the receiving client 103b dynamically during the call, along with the audio stream 250. The receiving client 103b can then use these to dynamically adapt the receive-side voice model 209′. For instance, consider the case where, during the call or session, the transmitting user 106a makes one or more manual corrections to the converted text generated by the speech-to-text converter 208 based on the transmit-side model 209. This will result in the transmit-side model 209 being trained so as to update the values of one or more of the voice parameters in the transmit-side model 209. In response, the receiving client 103b updates the values of those parameters in the receive-side model 209′, again during the same session (e.g. call). Thus the receive-side voice model 209′ can be dynamically updated. [0059] Another alternative or additional possibility is that the text-to-speech converter 218 is configured to train the receive-side voice model 209′ dynamically based on the received audio 250 in conjunction with the received text 254 during periods of good network conditions. This embodiment works when text 254 and audio 250 is received for the same content, i.e. same periods of text, such as when the text 254 is sent continually throughout the whole call. Since the text-to-speech converter 218 has access to the audio 250 and text 254 for the same content, it is able to train the receive side model 209′ in much the same way as speech recognition software is trained based on the user reading out predetermined passages pre-known to the conversion software.), and
wherein a performance of the second electronic device is lower (poor) than a performance of the first electronic device ([0049] In embodiments the text data 254 may be transmitted throughout the entire session (e.g. call), continually in parallel with the corresponding audio, i.e. so the transmitted content always comprises both audio 250 and text 254 representing the same periods of the session (e.g. call). In this case the controller 216 is configured to control the text-to-speech converter to only play out the synthesized speech from certain selected portions of the received text during periods of the session (e.g. call), such as when conditions are classified as poor or when selected by the receiving user 106b. For instance, the controller 216 may be configured to feed back a report to the transmitting client 103a informing it when the network conditions experienced at the receive side are classified as poor.).
However, Cutler et al., alone or in combination with the other prior art of record, fails to teach, for claims 1, 7, 12, and 18, “second electronic device which obtained second voice data for the voice uttered by the user, wherein the first voice recognition result is used to train a second voice recognition model for the voice uttered by the user for obtaining a second voice recognition result comprising a second text, the second text corresponding to the second voice data”.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Deisher et al. (US 10325590 B2) teach: A language model is modified for a local speech recognition system using remote speech recognition sources. In one example, a speech utterance is received. The speech utterance is sent to at least one remote speech recognition system. Text results corresponding to the utterance are received from the remote speech recognition system. A local text result is generated using local vocabulary. The received text results and the generated text result are compared to determine words that are out of the local vocabulary and the local vocabulary is updated using the out of vocabulary words.
Park et al. (US 2018/0211665 A1) teach: [0156] As shown in the drawing, if user A 1400 speaks a voice input, only the first electronic device 1410, which is frequently or recently used by user A 1400, may be woken up. The second electronic device 1430, which is frequently or recently used by user B, may not wake up. For example, in a state where the first electronic device 1410 and the second electronic device 1430 are located within a specified distance (e.g., about 20 m), if user A 1400 is located between the first electronic device 1410 and the second electronic device 1430 and speaks a voice input including a specified word, the first electronic device 1410 and the second electronic device 1430 may receive the voice input with the same voice intensity at substantially and almost the same time. In this case, the first electronic device 1410 and the second electronic device 1430 may verify usage history information of user A 1400 to determine whether user A 1400 is a user who frequently use the first electronic device 1410 or the second electronic device 1430. If determining that user A 1400 is the user who frequently use the first electronic device 1410, the first electronic device 1410 may wake up.
Jeon et al. (US 2017/0103755 A1) teach: (Abstract) Disclosed are apparatuses and methods for processing a control command for an electronic device based on a voice agent. The apparatus includes a command tagger configured to receive at least one control command for the electronic device from at least one voice agent and to tag additional information to the at least one control command, and a command executor configured to, in response to the command tagger receiving a plurality of control commands, integrate the plurality of control commands based on additional information tagged to each of the plurality of control commands and to control the electronic device based on a result of the integration.
BAE et al. (US 2017/0076726 A1) teach: (Abstract) An electronic device, a method for driving the electronic device, a voice recognition device, a method for driving the voice recognition device, and a non-transitory computer readable recording medium are provided. A voice recognition system includes an electronic device configured to selectively transmit a voice signal for voice utterance given by a user to an outside; and a voice recognition device configured to determine, as a recognition result of the transmitted voice signal, the recognition result that satisfies a predetermined condition among recognition results that are obtained by performing parallel processing of the transmitted voice signal through a plurality of voice recognizers and to provide the determined recognition result to the electronic device. [0094] Then, the voice recognition processor 410 may determine whether to process the received voice utterance by itself or to request the recognition result from the voice recognition device 120 of FIG. 1. For this, the voice recognition processor 410 first confirms the utterance length of the voice utterance. If the time period that is determined as a start and an end of the voice utterance is within a predetermined time range, the voice recognition processor 410 may process audio data of the voice utterance using the internal voice recognizers. In contrast, if the time period deviates from the predetermined time range, the voice recognition processor 410 may transmit the audio data of the voice utterance to the voice recognition device 120 through the communication interface 400. [0105] For example, if voice utterance given by a user is received, the controller 520 executes the voice recognition executor 540 and then transfers the voice utterance. 
Then, the voice recognition executor 540 derives the optimum recognition result for the received voice utterance and provides the derived recognition result to the controller 520 through parallel processing of the received voice utterance using a plurality of voice recognizers. Then, the controller 520 performs various operations on the basis of the corresponding recognition result.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571) 270-5878. The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MOHAMMAD K ISLAM/
Primary Examiner, Art Unit 2656