DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicants’ amendment filed on 04/07/21 has been entered. Claims 1, 9 and 17 have been amended. No claims have been canceled. No new claims have been added. Claims 1-20 are still pending in this application, with claims 1, 9 and 17 being independent.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-13 and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2019/0304470 to Ghaemmaghami et al. (“Ghaemmaghami”) in view of U.S. Patent Application Publication No. 2020/0090651 to Tran et al. (“Tran”).
As to claims 1, 9 and 17, Ghaemmaghami discloses a method, computer-readable instructions, and a system, the method comprising: receiving an audio signal [paragraphs 0009, 0086, 0095]; obtaining a plurality of multi-dimensional features based on the audio signal
Ghaemmaghami does not expressly disclose wherein the utterance-level embedding extraction mechanism includes one or more jump connections in a time direction of the utterance-level embedding extraction mechanism.   
In the same or similar field of invention, Tran discloses wherein the utterance-level embedding extraction mechanism includes one or more jump connections in a time direction of the utterance-level embedding extraction mechanism [Tran paragraphs 0062-0065]. Tran discloses systems which generate word embeddings of the words in the utterances received from a user. Based on the word embeddings, the disclosed systems can generate vector representations for received utterances [Tran paragraph 0007]. Further, the dialog response system generates hidden vector outputs associated with utterances at corresponding time steps 1, 2… M-1. Further, Tran discloses a residual connection (similar to a jump/skip connection) in a time direction (1, 2… M-1) of the utterance-level embedding [see Tran paragraphs 0062-0065].
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Ghaemmaghami so that the utterance-level embedding extraction mechanism includes one or more jump connections in a time direction of the utterance-level embedding extraction mechanism as taught by Tran. The suggestion/motivation would have been to utilize an independent gate context-dependent [Tran paragraph 0007].
As to claims 2, 10 and 18, Ghaemmaghami discloses wherein the audio signal includes speeches from a plurality of speakers with noise interference [Abstract, paragraphs 0008, 0024: “non-speech”, 0031, 0068, 0086, 0073, 0108, 0144: “for example non-speech characteristics may be background noises…”].
As to claims 3 and 11, Ghaemmaghami discloses wherein after receiving the audio signal, the method further comprises: dividing the audio signal into a plurality of frames using a predetermined window size and a predetermined window shift; and grouping the plurality of frames into a plurality of segments, a respective segment of the plurality of segments including a predetermined number of frames [paragraphs 0098-0099, Figs. 5a-5c].
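For illustration only, the framing and segmenting steps recited in claims 3 and 11 can be sketched as follows. This is a minimal sketch and is not taken from Ghaemmaghami or from the application; the function names, the sample-based units for window size and shift, and the choice to drop trailing partial frames and segments are all assumptions.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], window_size: int,
                      window_shift: int) -> List[List[float]]:
    """Slice a signal into frames of window_size samples, advancing by window_shift."""
    if window_size <= 0 or window_shift <= 0:
        raise ValueError("window_size and window_shift must be positive")
    frames = []
    start = 0
    # Assumption: only full frames are kept; a trailing partial frame is dropped.
    while start + window_size <= len(samples):
        frames.append(list(samples[start:start + window_size]))
        start += window_shift
    return frames


def group_into_segments(frames: List[List[float]],
                        frames_per_segment: int) -> List[List[List[float]]]:
    """Group consecutive frames into segments holding a fixed number of frames."""
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    # Assumption: leftover frames that do not fill a whole segment are dropped.
    usable = len(frames) - len(frames) % frames_per_segment
    return [frames[i:i + frames_per_segment]
            for i in range(0, usable, frames_per_segment)]


# Hypothetical toy signal: 10 samples, window of 4 samples, shift of 2 samples.
signal = [float(n) for n in range(10)]
frames = split_into_frames(signal, window_size=4, window_shift=2)
segments = group_into_segments(frames, frames_per_segment=2)
```

With these toy values the signal yields four overlapping frames, which are grouped into two segments of two frames each.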
As to claims 4 and 12, Ghaemmaghami discloses wherein after obtaining the plurality of multi-dimensional features based on the audio signal, the method further comprises normalizing the plurality of multi-dimensional features [paragraphs 0096, 0183].
As to claims 5 and 13, Ghaemmaghami discloses wherein obtaining the plurality of multi-dimensional features based on the audio signal includes converting the plurality of frames into a plurality of multi-dimensional filter bank features [paragraphs 0100-0103].
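For illustration only, one common way to normalize multi-dimensional features is per-dimension mean and variance normalization. Neither the claims nor the cited paragraphs are quoted here as specifying this particular scheme; the function name, the epsilon guard, and the choice of mean and variance normalization are assumptions made for this sketch.

```python
import math
from typing import List


def normalize_features(features: List[List[float]],
                       eps: float = 1e-8) -> List[List[float]]:
    """Shift each feature dimension to zero mean and scale it to unit variance."""
    if not features:
        return []
    count = len(features)
    dims = len(features[0])
    # Per-dimension mean over all feature vectors.
    means = [sum(row[d] for row in features) / count for d in range(dims)]
    # Per-dimension population standard deviation.
    stds = [math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / count)
            for d in range(dims)]
    # eps (an assumed guard) avoids division by zero for constant dimensions.
    return [[(row[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for row in features]


# Hypothetical toy input: two 2-dimensional feature vectors.
features = [[1.0, 10.0], [3.0, 10.0]]
normalized = normalize_features(features)
```

In this toy input the first dimension is mapped to approximately -1 and +1, while the constant second dimension is mapped to 0.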
As to claims 7 and 15, Tran discloses wherein the utterance-level embedding extraction mechanism includes a Recurrent Neural Network (RNN) [Tran paragraphs 0021, 0024-0025]. In addition, the same motivation applies as in the rejection of claims 1 and 9.
As to claims 8 and 16, Ghaemmaghami discloses wherein recognizing the speaker from the audio signal based on the utterance-level representation includes classifying the utterance- 
As to claim 19, Ghaemmaghami discloses wherein the data preparation module is further configured to: divide the audio signal into a plurality of frames using a predetermined window size and a predetermined window shift; group the plurality of frames into a plurality of segments, a respective segment of the plurality of segments including a predetermined number of frames [paragraphs 0098-0099, Figs. 5a-5c]; convert the plurality of frames into a plurality of multi-dimensional filter bank features [paragraphs 0100-0103]; and normalize the plurality of multi-dimensional filter bank features [paragraphs 0096, 0183].


Claims 6, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2019/0304470 to Ghaemmaghami et al. (“Ghaemmaghami”) and U.S. Patent Application Publication No. 2020/0090651 to Tran et al. (“Tran”) in further view of U.S. Patent Application Publication No. 2017/0125020 to Seo et al. (“Seo”).
As to claims 6 and 14, Ghaemmaghami and Tran disclose the method of claim 1 and the computer-readable storage medium of claim 9 [see rejection of claims 1 and 9].
Ghaemmaghami and Tran do not expressly disclose wherein obtaining the plurality of segment-level representations based on the plurality of multi-dimensional features is performed by a Convolutional Neural Network (CNN).   
In the same or similar field of invention, Seo discloses wherein obtaining the plurality of segment-level representations based on the plurality of multi-dimensional features is performed by a Convolutional Neural Network (CNN) [Seo paragraphs 0055, 0073, 0075, 0076].
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Ghaemmaghami and Tran so that obtaining the plurality of segment-level representations based on the plurality of multi-dimensional features is performed by a Convolutional Neural Network (CNN) as taught by Seo. The suggestion/motivation would have been to improve the voice recognition rate by controlling the learning rates of the first voice recognition unit and the second voice recognition unit [Seo paragraph 0095].
As to claim 20, Ghaemmaghami and Tran disclose the system of claim 17 [see rejection of claim 17]. Further, Tran discloses wherein the utterance-level embedding extraction module includes a Recurrent Neural Network (RNN) [Tran paragraphs 0021, 0024-0025].
Ghaemmaghami and Tran do not expressly disclose wherein the segment-level embedding extraction module includes a Convolutional Neural Network (CNN).
In the same or similar field of invention, Seo discloses wherein the segment-level embedding extraction module includes a Convolutional Neural Network (CNN) [Seo paragraphs 0055, 0073, 0075, 0076].
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Ghaemmaghami and Tran so that the segment-level embedding extraction module includes a Convolutional Neural Network (CNN) as taught by Seo. The suggestion/motivation would have been to improve the voice recognition rate by controlling the learning rates of the first voice recognition unit and the second voice recognition unit [Seo paragraph 0095].
	
	Conclusion

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANTIM G SHAH/Primary Examiner, Art Unit 2652