DETAILED ACTION

This communication is in response to the Application filed on 21 December 2022. 
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending and have been examined, and the Examiner determines that this action is in condition for Allowance. 


EXAMINER’S AMENDMENT
The applicant’s representative, Patrick Leonard, has agreed to the following claim amendments.

Please amend claim 1 as follows:
A speech conversion system, comprising:  
a processor; and
memory storing instructions executable by the processor, the instructions comprising, to:
	determine a first set of encoder vectors corresponding to human speech by inputting a spectrogram corresponding to human speech to a first encoder preprocessing neural network;
	using a first recurrent neural network (RNN) (GRU0) and the preprocessed encoder vectors as input to the first RNN, determine a first concatenated sequence;
using a second recurrent neural network (RNN) (GRU1) and the first set of encoder vectors derived from a spectrogram corresponding to human speech as input to the second RNN, determine a second concatenated sequence;
determine a second set of encoder vectors by doubling a stack height and halving a length of the second concatenated sequence;
using the second set of encoder vectors, determine a third set of encoder vectors; and
decode the third set of encoder vectors using an attention block.


Please amend claim 2 as follows:
The system of claim 1, wherein the instructions further comprise to, prior to determining the second concatenated sequence[[:]],

determine the first set of encoder vectors by doubling a stack height and halving a length of the first concatenated sequence.

Please amend claim 15 as follows:
A method of speech conversion, comprising: 
	determining a first set of encoder vectors corresponding to human speech by inputting a spectrogram corresponding to human speech to a first encoder preprocessing neural network;
using a first recurrent neural network (RNN) (GRU0) and the preprocessed encoder vectors as input to the first RNN, determining a first concatenated sequence;
using a second recurrent neural network (RNN) (GRU1) and the first set of encoder vectors derived from a spectrogram corresponding to human speech as input to the second RNN, determining a second concatenated sequence;
determining a second set of encoder vectors by doubling a stack height and halving a length of the second concatenated sequence;
using the second set of encoder vectors, determining a third set of encoder vectors; and
decoding the third set of encoder vectors using an attention block.

Please amend claim 16 as follows:
The method of claim 15, further comprising, prior to determining the second concatenated sequence[[:]],

determining the first set of encoder vectors by doubling a stack height and halving a length of the first concatenated sequence.


Reasons for Allowance
The following is a statement of reasons for the indication of allowable subject matter:

The closest prior art of record includes US 20200258496 (Yang et al.), US 20200250794 (Zimmer et al.), “Sequence-to-Sequence Acoustic Modeling for Voice Conversion” (Zhang et al.), and “Representation Learning for Speech Emotion Recognition” (Ghosh et al.). Yang et al. discloses using a second recurrent neural network (RNN) (GRU1) and a first set of encoder vectors derived from a spectrogram as input to the second RNN (Yang et al., para [0041] and para [0046]), determining a second concatenated sequence (Yang et al., para [0051]); using the second set of encoder vectors, determining a third set of encoder vectors (Yang et al., para [0007] and para [0075]); and decoding the third set of encoder vectors using an attention block (Yang et al., para [0041] and para [0070]).
Yang et al., though, does not disclose determining a second set of encoder vectors by doubling a stack height and halving a length of the second concatenated sequence. Zimmer et al. is cited to disclose determining a second set of encoder vectors by doubling a stack height and halving a length of the second concatenated sequence (Zimmer et al., para [0267]).
Lastly, Zhang et al. teaches encoding a speech spectrogram as part of the conversion process of a sequence-to-sequence voice conversion method (Zhang et al., fig. 1), while Ghosh et al. teaches encoding a speech spectrogram as performed by a stacked autoencoder and an RNN.
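For illustration only, the limitation “doubling a stack height and halving a length” of a concatenated sequence can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's or any cited reference's implementation: it assumes that “stack height” refers to the per-time-step vector size and that adjacent time steps are concatenated pairwise. The function name `double_stack_halve_length` and the handling of an odd trailing frame are hypothetical.

```python
from typing import List

Frame = List[float]


def double_stack_halve_length(sequence: List[Frame]) -> List[Frame]:
    """Concatenate each pair of adjacent frames.

    A sequence of T frames of size D becomes T // 2 frames of size 2 * D.
    Assumption: "stack height" is read here as the per-frame vector size.
    """
    if len(sequence) % 2 != 0:
        # Assumption: an odd trailing frame is dropped (padding is another option).
        sequence = sequence[:-1]
    return [sequence[i] + sequence[i + 1] for i in range(0, len(sequence), 2)]


# Four frames of size 2 become two frames of size 4.
seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = double_stack_halve_length(seq)
print(len(out), len(out[0]))  # prints "2 4"
```

Under these assumptions, the total number of values in the sequence is unchanged for an even-length input; only the arrangement of the values across time steps changes.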




Conclusion
The prior art made of record and not relied upon is considered pertinent to the Applicant’s disclosure. See attached PTO 892 from this and previous office action(s).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571)272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.