DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 05/14/2020, 06/03/2020, 11/03/2020, 08/27/2021 and 01/13/2022. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: reference character 2022 in Fig. 20. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5, 13-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pfeffinger et al. (WO 2017/217978 A1) in view of Min et al. (US 2017/0018270 A1).

Regarding Claim 1:
Pfeffinger discloses a method, performed by an electronic device (¶0070), of providing a voice recognition service, the method comprising:
obtaining a user call keyword configured to activate the voice recognition service, based on a first user voice input; Pfeffinger discloses receiving a first wake-up word (call keyword) from a driver (first user) to activate a speech-enabled navigation application (voice recognition service) (Pfeffinger ¶0025).
obtaining a user-customized feature by inputting an audio signal of the user-customized voice DB to a wake-up recognition module; Pfeffinger discloses interpreting the acoustic signal at least in part by determining, using the information indicative of the speaker's identity and automated speech recognition, whether the utterance spoken by the speaker includes the at least one designated wake-up word; and interacting with the speaker based, at least in part, on results of the interpreting (Pfeffinger ¶0003 and 0040).

Pfeffinger does not explicitly disclose:
generating a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module;

However, in an analogous art, Min discloses:
generating a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module; Min teaches the creation of user training data (databases) by way of performing Text-To-Speech (TTS) based on the user’s real speech signals (Min ¶0075 and Fig. 3).
Therefore, it would have been obvious to one of ordinary skill in the art to incorporate the teaching of Min into that of Pfeffinger, because doing so would reduce the time and cost of training and improve the accuracy of the speech/acoustic models by implementing personalized or targeted acoustic models (Min ¶0018-0019 and 0023).

Regarding Claim 2:
Pfeffinger in view of Min further discloses obtaining a voice DB related to a call keyword stored before the user call keyword is obtained, to activate the voice recognition service; and generating the user-customized voice DB by inputting the obtained voice DB and the user call keyword to the TTS module (View Min ¶0031 and Pfeffinger ¶0018-0019).

Regarding Claim 3:
Pfeffinger in view of Min further discloses obtaining an audio signal based on a second user voice input; obtaining an output value of the wake-up recognition module from the wake-up recognition module by inputting the obtained audio signal to the wake-up recognition module; and activating the voice recognition service, based on a result of a comparison between the obtained output value of the wake-up recognition module and the user-customized feature (View Pfeffinger ¶0025 and 0045).
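For illustration only, the comparison step recited in claim 3 can be sketched as follows. This is a minimal sketch and is not drawn from Pfeffinger, Min, or the application: it assumes that the output value of the wake-up recognition module and the user-customized feature are fixed-length numeric vectors, and that "comparison" means cosine similarity against a threshold. The function names and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no direction; treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def should_activate(output_value, user_feature, threshold=0.8):
    """Return True when the module output is close enough to the stored feature.

    Assumption: both arguments are vectors produced by the same (hypothetical)
    wake-up recognition module; the threshold value is arbitrary.
    """
    return cosine_similarity(output_value, user_feature) >= threshold
```

Under these assumptions, a second user voice input would activate the service only when its vector points in nearly the same direction as the stored user-customized feature.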

Regarding Claim 4:
Pfeffinger in view of Min further discloses the method of claim 2, wherein the TTS module is configured to change audio signals of the user-customized voice DB, based on at least one acoustic feature for uttering the call keyword that is obtained from a speaker voice model including acoustic features of a plurality of speakers (View Min ¶0063).

Regarding Claim 5:
Pfeffinger in view of Min further discloses generating similar keywords that are similar to the user call keyword, by using a prestored language model; generating a similar voice DB by inputting the generated similar keyword to the TTS module; and refining layers within the wake-up recognition module and attention related to a connection strength between the layers, based on audio signals of the similar voice DB and the user-customized voice DB (View Pfeffinger ¶0039 and Min ¶0011 and 0014).

Regarding Claim 13:
Pfeffinger discloses an electronic device (¶0070) for providing a voice recognition service, the electronic device comprising:
a memory storing one or more instructions; Pfeffinger discloses that the processor 510 may execute one or more processor-executable instructions stored in one or more non-transitory computer-readable storage media (e.g., the memory 520), which may serve as non-transitory computer-readable storage media storing processor-executable instructions for execution by the processor 510 (Pfeffinger ¶0068).
obtain a user call keyword for activating the voice recognition service, based on a first user voice input; Pfeffinger discloses receiving a first wake-up word (call keyword) from a driver (first user) to activate a speech-enabled navigation application (voice recognition service) (Pfeffinger ¶0025).
obtain a user-customized feature by inputting an audio signal of the user-customized voice DB to a wake-up recognition module; Pfeffinger discloses interpreting the acoustic signal at least in part by determining, using the information indicative of the speaker's identity and automated speech recognition, whether the utterance spoken by the speaker includes the at least one designated wake-up word; and interacting with the speaker based, at least in part, on results of the interpreting (Pfeffinger ¶0003 and 0040).

Pfeffinger does not explicitly disclose:
generate a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module;

However, in an analogous art, Min discloses:
generate a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module; Min teaches the creation of user training data (databases) by way of performing Text-To-Speech (TTS) based on the user’s real speech signals (Min ¶0075 and Fig. 3).
Therefore, it would have been obvious to one of ordinary skill in the art to incorporate the teaching of Min into that of Pfeffinger, because doing so would reduce the time and cost of training and improve the accuracy of the speech/acoustic models by implementing personalized or targeted acoustic models (Min ¶0018-0019 and 0023).

Regarding Claim 14:
Pfeffinger in view of Min further discloses the electronic device of claim 13, wherein the processor is further configured to execute the one or more instructions to: obtain a voice DB related to a call keyword previously stored before the user call keyword is obtained, in order to activate the voice recognition service, and generate the user-customized voice DB by inputting the obtained voice DB and the user call keyword to the TTS module (View Min ¶0031 and Pfeffinger ¶0018-0019).

Regarding Claim 15:
Pfeffinger in view of Min further discloses the electronic device of claim 14, wherein the processor is further configured to execute the one or more instructions to: obtain an audio signal based on a second user voice input, obtain an output value of the wake-up recognition module from the wake-up recognition module by inputting the obtained audio signal to the wake-up recognition module, and activate the voice recognition service, based on a result of a comparison between the obtained output value of the wake-up recognition module and the user-customized feature (View Pfeffinger ¶0025 and 0045).

Regarding Claim 16:
Pfeffinger in view of Min further discloses the electronic device of claim 14, wherein the TTS module is configured to change audio signals of the user-customized voice DB, based on at least one acoustic feature for uttering the call keyword that is obtained from a speaker voice model including acoustic features of a plurality of speakers (View Min ¶0063).

Regarding Claim 17:
Pfeffinger in view of Min further discloses the electronic device of claim 13, wherein the processor is further configured to execute the one or more instructions to: generate similar keywords that are similar to the user call keyword, by using a prestored language model, generate a similar voice DB by inputting the generated similar keyword to the TTS module, and refine layers within the wake-up recognition module and attention related to a connection strength between the layers, based on audio signals of the similar voice DB and the user-customized voice DB (View Pfeffinger ¶0039 and Min ¶0011 and 0014).

Regarding Claim 20:
Pfeffinger discloses a non-transitory computer-readable recording medium having recorded thereon a program (¶0068) which, when executed by a computer system, causes the computer system to perform a method of providing a voice recognition service, the method comprising:
obtaining a user call keyword for activating the voice recognition service, based on a first user voice input; Pfeffinger discloses receiving a first wake-up word (call keyword) from a driver (first user) to activate a speech-enabled navigation application (voice recognition service) (Pfeffinger ¶0025).
obtaining a user-customized feature by inputting an audio signal of the user-customized voice DB to a wake-up recognition module; Pfeffinger discloses interpreting the acoustic signal at least in part by determining, using the information indicative of the speaker's identity and automated speech recognition, whether the utterance spoken by the speaker includes the at least one designated wake-up word; and interacting with the speaker based, at least in part, on results of the interpreting (Pfeffinger ¶0003 and 0040).

Pfeffinger does not explicitly disclose:
generating a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module;

However, in an analogous art, Min discloses:
generating a user-customized voice database (DB) by inputting the obtained user call keyword to a text to speech (TTS) module; Min teaches the creation of user training data (databases) by way of performing Text-To-Speech (TTS) based on the user’s real speech signals (Min ¶0075 and Fig. 3).
Therefore, it would have been obvious to one of ordinary skill in the art to incorporate the teaching of Min into that of Pfeffinger, because doing so would reduce the time and cost of training and improve the accuracy of the speech/acoustic models by implementing personalized or targeted acoustic models (Min ¶0018-0019 and 0023).

Claims 6, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Pfeffinger et al. (WO 2017/217978 A1) in view of Min et al. (US 2017/0018270 A1) and further in view of Parthasarathi (US 2017/0270919 A1).

Regarding Claim 6:
Pfeffinger in view of Min discloses the method of claim 3. However, Pfeffinger in view of Min fails to explicitly disclose wherein the obtaining the audio signal comprises:
determining a window length of a window for division in units of frames;
overlapping windows each having the determined window length at a certain window interval;
dividing the obtained audio signal into a plurality of frames by using the overlapped windows.

However, in an analogous art, Parthasarathi discloses:
determining a window length of a window for division in units of frames; Parthasarathi discloses the use of sliding windows for digitized audio data that is divided into audio frames and used to represent various time intervals (Parthasarathi ¶0049 and ¶0112).
overlapping windows each having the determined window length at a certain window interval; Parthasarathi discloses, for one configuration, each audio frame includes 25 ms of audio and the frames start at 10 ms intervals resulting in a sliding window where adjacent audio frames include 15 ms of overlapping audio (Parthasarathi ¶0049).
dividing the obtained audio signal into a plurality of frames by using the overlapped windows; Parthasarathi discloses the use of sliding windows for digitized audio data that is divided into audio frames and used to represent various time intervals of overlapping audio (Parthasarathi ¶0049).
Therefore, it would have been obvious to one of ordinary skill in the art to incorporate the teaching of Parthasarathi into that of Pfeffinger in view of Min, because doing so would reduce noise in the audio data, enhance the quality metrics and improve Automatic Speech Recognition (ASR) capabilities (Parthasarathi ¶0049).
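For illustration only, the framing configuration cited from Parthasarathi ¶0049 (25 ms frames starting at 10 ms intervals, leaving 15 ms of overlap between adjacent frames) can be sketched as follows. This is a minimal sketch, not the implementation of any cited reference or of the application: the function name, the 16 kHz sample rate in the usage note, and the choice to drop a trailing partial frame are assumptions.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, hop_ms=10.0):
    """Divide a sequence of samples into overlapping fixed-length frames.

    window_ms is the window length; hop_ms is the interval between the
    starts of consecutive windows. A trailing partial frame is dropped
    (an assumption made for this sketch).
    """
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window and hop must each span at least one sample")

    frames = []
    start = 0
    while start + window_len <= len(samples):
        frames.append(samples[start:start + window_len])
        start += hop_len
    return frames
```

At an assumed 16 kHz sample rate, the window is 400 samples and the hop is 160 samples, so adjacent frames share 240 samples (15 ms), and one second of audio yields 98 full frames.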

Regarding Claim 12:
Pfeffinger in view of Min and Parthasarathi further discloses the method of claim 1, further comprising: generating a user-customized voice model, based on the user-customized feature; and storing the generated user-customized voice model (View Parthasarathi ¶0146).

Regarding Claim 18:
Pfeffinger in view of Min discloses the electronic device of claim 13. However, Pfeffinger in view of Min fails to explicitly disclose wherein the processor is further configured to execute the one or more instructions to:
determine a window length of a window for division in units of frames,
overlap windows each having the determined window length at a certain window interval, and
divide the obtained audio signal into a plurality of frames by using the overlapped windows.

However, in an analogous art, Parthasarathi discloses:
determine a window length of a window for division in units of frames; Parthasarathi discloses the use of sliding windows for digitized audio data that is divided into audio frames and used to represent various time intervals (Parthasarathi ¶0049 and ¶0112).
overlap windows each having the determined window length at a certain window interval; Parthasarathi discloses, for one configuration, each audio frame includes 25 ms of audio and the frames start at 10 ms intervals resulting in a sliding window where adjacent audio frames include 15 ms of overlapping audio (Parthasarathi ¶0049).
divide the obtained audio signal into a plurality of frames by using the overlapped windows; Parthasarathi discloses the use of sliding windows for digitized audio data that is divided into audio frames and used to represent various time intervals of overlapping audio (Parthasarathi ¶0049).
Therefore, it would have been obvious to one of ordinary skill in the art to incorporate the teaching of Parthasarathi into that of Pfeffinger in view of Min, because doing so would reduce noise in the audio data, enhance the quality metrics and improve Automatic Speech Recognition (ASR) capabilities (Parthasarathi ¶0049).

Allowable Subject Matter

Claims 7-11 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al. (US 2016/0293168 A1) discloses the creation of a personal, customized wake-up word by text for the purpose of activating voice control, thus enabling an electronic device. Chen further discloses the use of a second microphone with the reception of different voices.
Prasad et al. (US 9,697,828 B1) discloses the utterance of a personalized keyword (“wake word”) derived from audio signals, contextual information or from a second user’s audio signal. Prasad further discloses the use of acoustic, environmental and contextual features and Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) results to detect words. Prasad further discloses training these detected words and data.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DERRICK SCOTT JEFFERIES whose telephone number is (571)272-0923. The examiner can normally be reached 7:30a-4:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DERRICK SCOTT JEFFERIES/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658