Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2019-0072337, filed on 06/18/2019.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/21/2020 is being considered by the examiner.
Drawings
The drawing submitted on 04/20/2020 is being considered by the examiner.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given by applicant’s representative, Mr. Norman Lee, on 09/23/2021 by voice message, in response to the examiner’s request of 09/20/2021.
The application has been amended as follows: Claims 10-12 have been cancelled.
Allowable Subject Matter
Claims 1-9 and 13-16 are allowed.
The following is an examiner’s statement of reasons for allowance: The pertinent art of record XU et al. (CN 111508475 A1) teach: (Abstract) The invention discloses a voice keyword recognition method and device for robot awakening and a storage medium, and the method comprises the steps: obtaining a voice sample, carrying out voice enhancement, and obtaining a voice enhancement sample; performing framing processing on the voice enhancement sample to obtain a voice frame; carrying out denoising processing on the voice frame to obtain voice sample features; inputting the voice sample features into a deep neural network model for training, wherein each neural unit of an output layer of the deep neural network model obtains a posterior probability; combining the posterior probabilities into a posterior probability sequence, comparing the posterior probability sequence with a preset threshold probability sequence, and determining keywords of the input voice. The problem that the keyword recognition robustness is low due to the fact that data are few or unbalanced and are affected by environmental noise is solved, the voice keyword recognition accuracy is improved, and therefore the work efficiency of waking up a robot is improved.
The prior art of record Wu et al. (US 2021/0141818 A1) teach: [0100] At 1050, the personality specific text such as the age and gender specific text, with the emotional labels and logic tunes attached, is sent to a TTS module to be transformed into a voice conforming to the style of the chatbot. For example, if the chatbot is designed as a 17 year old high school girl, the TTS module is trained with training voices read by similar style people. Therefore the voice transformed from the text may sounds like a teenager girl. The voices and/or the personality specific texts used to generate the voices may be stored in the voice database 290, where each voice and/or each text is stored as a knowledge message in association with a keyword list which may be used to trigger the verbal presentation of the knowledge message.
The prior art of record KIMURA (US 2017/0352351 A1) teaches: [0020] A TOF sensor 5 and the camera 7 are image system sensors. The TOF sensor 5 and the camera 7 can capture surroundings where the communication robot itself is placed, the position and facial expression of a person, and the like, individually or in cooperation with each other. For example, the TOF sensor 5 can easily acquire shape information and distance information of an object at the same time. The object includes, for example, a person such as adult, infant, and toddler. The communication robot may read facial expression of a person and a direction of his/her eyes using, for example, the camera 7 in addition to so-called face recognition. The communication robot may acquire, for example, an approaching distance at any time with the TOF sensor 5. The communication robot can calculate a moving speed from a difference in distance to an approaching person and an elapsed time. The difference in distance includes, for example, a difference between the distance measured last time and the distance measured this time. The elapsed time includes, for example, the elapsed time from the previous measurement to the current measurement. The communication robot can distinguish whether the approach is, for example, running up thereto, slow approaching, or approaching with caution based on the calculated moving speed. The communication robot can also store the result of distinction as the approach information in the storage 8. The communication robot may acquire, for example, an image and distance information from the camera 7. In this case, the distance information can be easily acquired by autofocus technology. The communication robot may monitor the front thereof by the camera 7 and the rear thereof by the TOF sensor 5. One or more of either the camera 7 or the TOF sensor 5 may be used for the communication robot.
[0021] The microphone 6 can collect surrounding sounds and sounds emitted by people. For example, the communication robot cannot predict from which direction a person talks to the communication robot. For example, the communication robot does not know in advance from which direction a person utters the voice with respect to the sound emitted by the communication robot. Therefore, the communication robot is preferably provided with the microphone 6 having wide directivity beforehand. Moreover, the communication robot may arrange the microphone 6 on the left and the right of the face (i.e., the display 10). For example, the communication robot cannot capture a person behind the communication robot with the image system sensor such as the camera 7 and the TOF sensor 5. However, the communication robot can recognize the direction of the person behind the robot by the sound image localization by microphones 6 provided on the left and the right thereof.
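For illustration only, the moving-speed calculation described in KIMURA [0020] (difference in distance divided by elapsed time, followed by a distinction among kinds of approach) may be sketched as below. The function names, the threshold values, and the "not approaching" label are hypothetical and are not taken from the reference.

```python
def approach_speed(prev_distance_m, curr_distance_m, elapsed_s):
    """Speed toward the robot: (previous distance - current distance) / elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (prev_distance_m - curr_distance_m) / elapsed_s


def classify_approach(speed_mps, run_threshold=2.0, slow_threshold=0.5):
    """Map a speed to an approach label; both thresholds are assumed values."""
    if speed_mps <= 0:
        return "not approaching"  # hypothetical label for a person standing still or moving away
    if speed_mps >= run_threshold:
        return "running up"
    if speed_mps >= slow_threshold:
        return "slow approach"
    return "approaching with caution"
```

Under these assumptions, a person measured at 3.0 m and then at 1.0 m one second later is moving at 2.0 m/s and would be labeled "running up".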
The pertinent art of record Raffle (US 2020/0092625 A1) teaches: [0034] In some implementations, the cover 200 may respond to a keyword that corresponds to its character such as, for example, a name, its facial features, and the like. In some implementations, the keyword for the cover 200 may be previously set. That is, in some implementations, the cover 200 may be provided to the user with the keyword already set, or with the cover 200 already named. In some implementations, the user may set, or reset, the keyword for the cover 200 based on user preferences. In setting the keyword, the user may, for example, speak the desired keyword, or name, for detection by one of the audio sensor(s) 271 of the cover 200, to train the audio sensor(s) 271 of the electronics module 270 to listen for and detect the keyword, or name of the cover 200 spoken by the user. In some situations, setting, or resetting, the keyword, or name, in this manner may allow the interaction to be personalized for a specific user. That is, setting/resetting the keyword/name in this manner may cause the cover 200 to respond (i.e., the electronics module 270 to control the motor(s) 274 and/or the light source(s) 276 to animate and/or illuminate the cover 200, and/or the audio output device(s) 278 to output audio content, as described above) only when the keyword/name is spoken by the specific user. In some implementations, the keyword/name may be set/reset in response to text entered by the user, and translated into the keyword/name to be detected by the electronics module 270 of the cover 200.
The prior art of record Yoon et al. (US 2020/0225344 A1) teach: [0028] The example embodiments relate to technology for reflection-aware sound source/sound localization, and more particularly, to technology for consecutively detecting an intermittent sound source occurring in a single frame and estimating, that is, localizing a three-dimensional (3D) position of a sound source considering indirect sound as well as direct sound. For example, the example embodiments relate to technology for estimating a 3D position of a sound source in a 3D scene representing an indoor space, that is, a specific scene captured from a 3D video, by tracing a propagation and reflection path of an audio signal received through a microphone array of a robot. [0066] According to example embodiments, a 3D position of a sound source, that is, an audio signal may be further accurately estimated in an indoor space by consecutively detecting an intermittent sound source occurring in a single frame based on direct sound output from the sound source and indirect sound reflected by an object, such as walls and ceiling. Also, by accurately estimating a 3D position of a corresponding sound source from sound, for example, speech and footstep, of an object, for example, a thing and a person, present around, for example, a robot, the robot may further accurately determine a specific person that is speaking when the specific person is conversing with the robot in an indoor environment in which a plurality of persons is present. In addition, in a region in which a visual sensor, for example, a camera, a red, green, blue (RGB)-D camera, and a laser scanner, of the robot does not properly function, a position of specific sound may be traced.
The prior art of record Cui et al. (US 2019/0206400 A1) teach: [0050] In some embodiments, an adaptive interactive cognitive reasoning engine (AICoRE) includes a human-like agent reasoner that functions as the human mind in a robot with the ability to perceive, reason, remember, and respond. The AICoRE utilizes learning, problem solving, and automated reasoning to interact with humans. An AICoRE server can be used for implementing speech communication as well as to understand and respond to visual information. For example, an AICoRE module can be implemented as a general conversation agent that can also use visual information. The AICoRE module can provide feedback to a robot in a variety of scenarios based on the input received. Example scenarios include providing feedback in the event a user talks to a robot in a retail environment, a user shows a photo to the robot in a retail environment, a user asks the robot questions, and a user commands the robot to perform an action, among others. An AICoRE module can provide analysis and feedback to support robot hardware status changes, robot movement in a retail environment including movement for creating a map of the environment, and robot scans the environment, among others.
The prior art of record Wang et al. (US 2020/0151503 A1) teach: [0016] Text recognition systems can be used in many different situations to recognize text in an image, such as autonomous driving (e.g., by recognizing street signs and guiding an automobile), robots (e.g., autonomous parking lot attendants), drones, aiding visually-impaired persons, keyword parsing of a document, advertisement recommendation for mobile clients, and the like. Text recognition systems are often trained with training datasets that contain large numbers of synthetically-generated text images, e.g., noisy images including text perturbed by one or more nuisance factors, such as compression artifacts, additive noise processes, geometric distortion, and the like. For instance, a training dataset may include hundreds of thousands of noisy images to train a convolutional neural network of a text recognition system with each training class corresponding to a word of the English language. Hence, training a text recognition system can require significant resources, such as manual resources to design and select a training dataset, and computer resources to process the images of the training dataset to train the text recognition system. Moreover, text recognition systems are often not robust to some nuisance factors, such as compression artifacts and geometric deformation of text, and adding additional training images to a training dataset for these nuisance factors exacerbates the amount of resources needed when using large training datasets.
The prior art of record, alone or in combination, fails to teach the following limitations of claims 1 and 13: “obtaining a first set of voice triggers from the input text via voice synthesis; obtaining a second set of voice triggers by applying a first filter in accordance with an environmental factor to the first set of voice triggers; obtaining a third set of voice triggers by applying a second filter in accordance with a mechanism characteristic of the robot to the second set of voice triggers; and applying the first set of voice triggers, the second set of voice triggers, and the third set of voice triggers to the trigger recognition model as learning data for the voice trigger”.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
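For illustration only, the sequence of steps recited in the quoted limitation may be sketched as below. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the synthesized utterances are taken as given (the voice synthesis step is not modeled), each filter is assumed to be a simple FIR impulse response, and the names fir_filter, build_trigger_training_data, environment_ir, and mechanism_ir are hypothetical.

```python
def fir_filter(signal, impulse_response):
    """Convolve a signal with an impulse response, truncated to the signal length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def build_trigger_training_data(synthesized, environment_ir, mechanism_ir):
    """Return the first, second, and third sets of voice triggers as one list."""
    # First set: utterances assumed to come from voice synthesis of the input text.
    first_set = [list(u) for u in synthesized]
    # Second set: first set passed through the (assumed) environmental filter.
    second_set = [fir_filter(u, environment_ir) for u in first_set]
    # Third set: second set passed through the (assumed) robot mechanism filter.
    third_set = [fir_filter(u, mechanism_ir) for u in second_set]
    # All three sets together form the learning data for the trigger recognition model.
    return first_set + second_set + third_set
```

With one synthesized utterance, the returned list holds three entries: the unfiltered utterance, its environment-filtered version, and the version filtered by both.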
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMMAD K ISLAM/ Primary Examiner, Art Unit 2656