Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is responsive to the application filed on 04/07/2021.
Claims 1-23 are pending.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 and 8-23 are rejected under 35 U.S.C. 103 as being unpatentable over Kramer et al. (US Patent 9,613,624 B1) in view of Zhang et al. (US Patent 10,381,017 B2).
Regarding Claim 1, Kramer teaches a computer-implemented method of inferring phoneme probabilities in speech audio (see Fig.3 and Col.4, Line 35-39), the method comprising:
receiving, at a computing device, a segment of speech (see Fig.3 (304) and Col.5, Line 3-5);
and inferring, using an acoustic model conditioned on the sound embedding, the phoneme probabilities related to the segment of speech (see Fig.3 (318,320), Fig.5 and Col.9, Line 17-37), wherein the sound embedding comprises information of non-phoneme features of the first segment of speech (see Fig.5 and Col.9, Line 28-37).
Kramer fails to teach generating, using an encoder model, a sound embedding from a first segment of speech.
Kramer, however, teaches using an encoder for encoding ASR data including feature vectors (see Fig.3 (322) and Col.7, Line 35-37).
Zhang teaches using a neural network model for obtaining feature vectors corresponding to audio data to be recognized (see Fig.2 (202,203) and Col.5, Line 3-13).
It would have been obvious to one skilled in the art, before the effective filing date of the application, to include in Kramer’s method the step of generating, using an encoder model, a sound embedding from a first segment of speech. The motivation would be to generate the feature vectors or sound embedding information used to train the acoustic model to recognize speech and non-speech sounds.
Regarding Claims 2 and 10, Kramer further teaches wherein the first segment of speech corresponds to a key phrase with known phonemes (see Fig.3 (320) and Col.9, Line 17-22).
Regarding Claim 3, Kramer further teaches storing the sound embedding in a memory device associated with the computing device (see Fig.3 (314) and Col.9, Line 17-27, storing during processing of the speech data).
Regarding Claims 4, 13 and 18, Kramer further teaches wherein the information of non-phoneme features includes noise (see Fig.5 and Col.9, Line 28-37).
Regarding Claims 5 and 14, Zhang further teaches wherein the acoustic model is trained on labeled samples of speech audio, each of the labeled samples having a corresponding sound embedding (see Fig.2 (103a) and Col.3, Line 59-67).
Regarding Claims 6 and 15, Zhang further teaches wherein the labeled samples include a multiplicity of voices mixed with a multiplicity of noise profiles, wherein the first segment and the second segment are mixed with the same noise profile for each sample (see Fig.2 (103a) and Col.3, Line 59-67).
Regarding Claims 8, 11 and 22, the rationale provided for the rejection of Claim 1 is incorporated herein.
Regarding Claim 9, Kramer teaches a computer-implemented method of inferring phoneme probabilities in speech audio (see Fig.3 and Col.4, Line 35-39), the method comprising:
receiving, at a computing device, a segment of speech (see Fig.3 (304) and Col.5, Line 3-5);
and inferring, using an acoustic model conditioned on sound embedding, the phoneme probabilities related to the segment of speech (see Fig.3 (318,320), Fig.5 and Col.9, Line 17-37), wherein the sound embedding includes non-phoneme features of the segment of speech (see Fig.5 and Col.9, Line 28-37).
Kramer fails to teach receiving a sound embedding for a first segment of speech at the acoustic model.
Zhang, however, teaches receiving, at an acoustic model, feature vectors corresponding to audio data to be recognized for a speech segment (see Fig.2 (202,203) and Col.5, Line 3-13).
It would have been obvious to one skilled in the art, before the effective filing date of the application, to include in Kramer’s method the step of receiving a sound embedding for a first segment of speech at the acoustic model. The motivation would be to train the acoustic model to recognize specific audio data such as speech and background noise.
Regarding Claim 16, the rationale provided for the rejection of Claim 1 is incorporated herein.
Regarding Claim 17, Kramer further teaches wherein the outputs of the acoustic model comprise phoneme probabilities of an utterance following the key phrase (see Fig.3 (318,320), Fig.5 and Col.9, Line 17-37).
Regarding Claim 20, Kramer further teaches storing the sound embedding in a memory device associated with the computing device (see Fig.3 (314) and Col.9, Line 17-27, storing during processing of the speech data); and transmitting the stored sound embedding and an audio signal of the utterance over a network to another device (see Fig.3 (307) and Col., Line 31-37).
Regarding Claim 21, the rationale provided for the rejection of Claim 1 is incorporated herein.
Regarding Claim 23, Zhang further teaches wherein the acoustic model is trained on labeled samples of speech audio, each of the labeled samples having a corresponding sound embedding (see Fig.2 (103a) and Col.3, Line 59-67); and wherein the labeled samples include a multiplicity of voices mixed with a multiplicity of noise profiles, wherein the first segment and the second segment are mixed with the same noise profile for each sample (see Fig.2 (103a) and Col.3, Line 59-67).
Claims 7, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kramer et al. (US Patent 9,613,624 B1) in view of Zhang et al. (US Patent 10,381,017 B2), and in further view of Li et al. (US Pub. 2020/0066271 A1).
Regarding Claims 7, 12 and 19, Kramer and Zhang teach the method of Claim 1 but fail to teach wherein the encoder model is jointly trained with the acoustic model.
Li, however, teaches jointly training two different models in a neural network system (see Fig.1 (114,116) and paragraph [0069]).
It would have been obvious to one skilled in the art, before the effective filing date of the application, to include in the method of Claim 1 the step of jointly training the encoder model and the acoustic model. The motivation would be to train the models in a single neural network system during the processing of the user speech input data.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VU B HANG whose telephone number is (571)272-0582.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mohamad H. Ghayour, can be reached at (571)272-3021. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VU B HANG/
Primary Examiner, Art Unit 2672