DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Ehara et al. (US 2020/0294512 A1), hereinafter referred to as Ehara, in view of Tzinis et al. (Tzinis, E., Venkataramani, S., Wang, Z., Subakan, C., & Smaragdis, P. (2020, May). Two-step sound source separation: Training on learned latent targets. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 31-35). IEEE.), hereinafter referred to as Tzinis, and further in view of Fotopoulou et al. (US 2021/0104249 A1), hereinafter referred to as Fotopoulou.

Regarding claim 1, Ehara teaches:
A method of encoding a speech signal, the method comprising:
determining a number of bits used for quantization of each of the plurality of sound source signals according to a type of each of the plurality of sound sources (Fig. 13 element 703, para [0122], where bit allocation is performed for the separated signals); 
quantizing each of the plurality of sound source signals based on the determined number of bits (Fig. 2 element 105, para [0049], where quantization is performed); and 
generating a bitstream by combining the plurality of quantized sound source signals (para [0051], where the bitstreams are multiplexed and transmitted).  
Ehara does not teach:
identifying an input signal for a plurality of sound sources; 
generating a latent signal by encoding the input signal; 
obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources;
Tzinis teaches:
identifying an input signal for a plurality of sound sources (page 31 section 2, where the identified input is a mixture consisting of N sources); 
generating a latent signal by encoding the input signal (pages 31-32 section 2.1, where an encoder obtains a latent representation of the input signal); 
obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources (pages 31-32 section 2.1, where the masks and latent representation are multiplied to determine an estimate for each source);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara by applying the source separation of Ehara (Ehara para [0045]) to a latent signal as taught by Tzinis (Tzinis pages 31-32 section 2.1), in order to achieve a consistent performance improvement (Tzinis page 31 last paragraph of section 1).
Ehara in view of Tzinis does not explicitly teach that quantization is performed on the near field signal source, but does teach in Ehara para [0047] that an existing coding method may be used for coding an acoustic signal component x.
Fotopoulou para [0071] teaches quantization of each signal being performed using an assigned bit budget.
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (coding method) with other components (quantization); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.
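The combination applied to claim 1 (a number of bits chosen per sound source type, quantization with that number of bits, and a combined bitstream) can be pictured with the following minimal sketch. It is an illustration under stated assumptions only: the bit table `BITS_BY_TYPE`, the uniform quantizer, the [-1, 1] signal range, and the function names are hypothetical and are not drawn from Ehara, Fotopoulou, or the application.

```python
# Hypothetical bit budget per source type (assumed values, for illustration).
BITS_BY_TYPE = {"speech": 8, "noise": 4}


def quantize(samples, n_bits, lo=-1.0, hi=1.0):
    """Uniformly quantize samples in [lo, hi] to integer codes of n_bits."""
    levels = 2 ** n_bits - 1
    step = (hi - lo) / levels
    # Clamp so that out-of-range samples still map to a valid code.
    return [min(levels, max(0, round((x - lo) / step))) for x in samples]


def dequantize(codes, n_bits, lo=-1.0, hi=1.0):
    """Map integer codes back to sample values."""
    step = (hi - lo) / (2 ** n_bits - 1)
    return [lo + c * step for c in codes]


def encode(sources):
    """Combine per-source quantized signals into one list-based 'bitstream'.

    sources: list of (source_type, samples) pairs.
    """
    stream = []
    for source_type, samples in sources:
        n_bits = BITS_BY_TYPE[source_type]  # bits chosen by source type
        stream.append((source_type, n_bits, quantize(samples, n_bits)))
    return stream


inputs = [("speech", [0.1, -0.7, 0.9]), ("noise", [0.05, -0.02, 0.0])]
stream = encode(inputs)
```

In this sketch the source labelled "speech" is given more bits than the source labelled "noise", so its reconstruction error after `dequantize` is bounded by a smaller quantization step.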

Regarding claim 2, Ehara in view of Tzinis and Fotopoulou teaches:
The method of claim 1, wherein the obtaining of the plurality of sound source signals comprises: 
determining a masking vector for each of the plurality of sound sources (Tzinis pages 31-32 section 2.1, where a masking vector is determined for each source); and 
determining the plurality of sound source signals from the latent signal using the masking vector (Tzinis pages 31-32 section 2.1, where the masks and latent representation are multiplied to determine an estimate for each source).
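The mask-and-multiply operation relied upon for claim 2 can be sketched as follows. This is a minimal illustration, assuming a toy one-dimensional latent vector and made-up per-source scores; the names `softmax_masks` and `separate` are hypothetical and do not reproduce the network described in Tzinis.

```python
import math


def softmax_masks(scores):
    """Turn per-source scores into masks that sum to 1 at each latent index.

    scores[i][k] is the (assumed) score of source i at latent index k.
    """
    n_sources = len(scores)
    length = len(scores[0])
    masks = [[0.0] * length for _ in range(n_sources)]
    for k in range(length):
        column = [scores[i][k] for i in range(n_sources)]
        peak = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(c - peak) for c in column]
        total = sum(exps)
        for i in range(n_sources):
            masks[i][k] = exps[i] / total
    return masks


def separate(latent, masks):
    """Estimate each source by element-wise multiplying its mask with the latent."""
    return [[m * v for m, v in zip(mask, latent)] for mask in masks]


latent = [0.5, -1.0, 2.0, 0.25]
scores = [[2.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
masks = softmax_masks(scores)
sources = separate(latent, masks)
```

Because the masks sum to one at every index, the per-source estimates in this sketch add back up to the original latent vector.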

Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Ehara in view of Tzinis and Fotopoulou, and further in view of Rickard (Rickard, S. (2006, September). Sparse sources are separated sources. In 2006 14th European signal processing conference (pp. 1-5). IEEE.).

Regarding claim 3, Ehara in view of Tzinis and Fotopoulou teaches:
The method of claim 2,
Ehara in view of Tzinis and Fotopoulou does not teach:
wherein the determining of the plurality of sound source signals comprises separating the latent signal so that the plurality of sound source signals are orthogonal to each other, using the masking vector.
Rickard teaches:
wherein the determining of the plurality of sound source signals comprises separating the latent signal so that the plurality of sound source signals are orthogonal to each other, using the masking vector (Page 2 section 2, where the sources are orthogonal and determined using the binary masking vector).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara in view of Tzinis and Fotopoulou by using the binary, orthogonal masking vectors of Rickard (Rickard page 2 section 2) as the masking vectors of Ehara in view of Tzinis and Fotopoulou (Tzinis pages 31-32 section 2.1), in order to obtain an optimal mask for demixing (Rickard page 3 first full paragraph).

Regarding claim 4, Ehara in view of Tzinis and Fotopoulou teaches:
The method of claim 2, wherein the masking vector is a vector determined based on probabilities for each of the plurality of sound sources (Tzinis pages 31-32 section 2.1, where a masking vector is determined using softmax, which uses probabilities).
Ehara in view of Tzinis and Fotopoulou does not teach:
wherein the masking vector is a binary vector;
Rickard teaches:
wherein the masking vector is a binary vector (Page 2 section 2 equation 9, where the masking vector is binary);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara in view of Tzinis and Fotopoulou by using the binary, orthogonal masking vectors of Rickard (Rickard page 2 section 2) as the masking vectors of Ehara in view of Tzinis and Fotopoulou (Tzinis pages 31-32 section 2.1), in order to obtain an optimal mask for demixing (Rickard page 3 first full paragraph).
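The relationship between binary masks and orthogonal separated signals discussed for claims 3 and 4 can be sketched as follows. This is a minimal illustration, assuming made-up per-source probabilities and a toy latent vector; assigning each index to the most probable source is one simple way to obtain a binary mask and is an assumption here, not a restatement of Rickard equation 9. All function names are hypothetical.

```python
def binary_masks(probs):
    """Build one-hot masks: each latent index is assigned to its most probable source.

    probs[i][k] is the (assumed) probability of source i at latent index k.
    """
    n_sources = len(probs)
    length = len(probs[0])
    masks = [[0] * length for _ in range(n_sources)]
    for k in range(length):
        winner = max(range(n_sources), key=lambda i: probs[i][k])
        masks[winner][k] = 1
    return masks


def apply_masks(latent, masks):
    """Element-wise multiply each binary mask with the latent vector."""
    return [[m * v for m, v in zip(mask, latent)] for mask in masks]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


probs = [[0.9, 0.2, 0.6, 0.1],
         [0.1, 0.8, 0.4, 0.9]]
latent = [0.5, -1.0, 2.0, 0.25]
masks = binary_masks(probs)
sources = apply_masks(latent, masks)
inner = dot(sources[0], sources[1])
```

Since each index belongs to exactly one mask, the separated signals in this sketch have disjoint support and their inner product is zero.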

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Ehara in view of Tzinis and Fotopoulou, and further in view of Henderson (US 2021/0141798 A1).

Regarding claim 5, Ehara in view of Tzinis and Fotopoulou teaches:
The method of claim 1,
Ehara in view of Tzinis and Fotopoulou does not teach:
wherein the quantizing of each of the plurality of sound source signals comprises quantizing each of the plurality of sound source signals using softmax.  
Henderson teaches:
wherein the quantizing of each of the plurality of sound source signals comprises quantizing each of the plurality of sound source signals using softmax (para [0219], where the quantization uses softmax).  
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (quantization) with other components (quantization using softmax); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Claims 6 and 8-11 are rejected under 35 U.S.C. 103 as being unpatentable over Ehara, in view of Rothberg et al. (US 10,956,787 B2), hereinafter referred to as Rothberg, and further in view of Fotopoulou.

Regarding claim 6, Ehara teaches:
A method of encoding a speech signal, the method comprising: 
identifying an input signal for a plurality of sound sources (Fig. 2 element 102, para [0042], where an acoustic signal is received from a microphone array); 
obtaining a plurality of quantized sound source signals by inputting the input signal to an encoding model (Fig. 2 elements 103-105, para [0047], [0049], where the object and noise signals are encoded); and 
generating a bitstream by combining the plurality of quantized sound source signals (para [0051], where the bitstreams are multiplexed and transmitted),
wherein the encoding model obtains a plurality of sound source signals by separating a latent signal of the input signal for each of the plurality of sound sources, and to quantize each of the plurality of sound source signals according to a type of each of the plurality of sound sources (Fig. 2 element 101, para [0045], where the sound source estimation unit receives the input acoustic signal and produces a latent signal to input to element 102).  
Ehara does not teach:
wherein the encoding model is trained;
Rothberg teaches:
wherein the encoding model is trained (col. 21 lines 47-61, where parameters are updated);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara by performing training, as taught by Rothberg (Rothberg col. 21 lines 47-61), on the model of Ehara (Ehara Fig. 2) by using backpropagation, so that parameters may be updated during the execution of a process (Rothberg col. 20 lines 61-67).
Ehara in view of Rothberg does not explicitly teach that quantization is performed on the near field signal source, but does teach in Ehara para [0047] that an existing coding method may be used for coding an acoustic signal component x.
Fotopoulou para [0071] teaches quantization of each signal being performed using an assigned bit budget.
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (coding method) with other components (quantization); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Regarding claim 8, Ehara in view of Rothberg and Fotopoulou teaches:
The method of claim 6, wherein the encoding model is trained based on a difference between the input signal and an output signal reconstructed from the quantized sound source signals and a difference between entropy of the input signal and entropies of the quantized sound source signals (Rothberg col. 21 lines 47-61, where cross entropy, KL divergence, and L1 and L2 distances are used as loss functions in back propagation to update the parameters).  

Regarding claim 9, Ehara teaches:
A method of decoding a speech signal, the method comprising: 
identifying a bitstream generated by an encoder (Fig. 14 element 402, para [0129], where a bitstream is received from an encoder); 
generating output signals for a plurality of sound sources by inputting the bitstream to a decoding model (Fig. 3 elements 201, 203, para [0054], [0056], where the object and noise signals are decoded); and 
obtaining a final output signal by combining the output signals for the plurality of sound sources (Fig. 3 element 207, para [0055], where the signals are added together for final output),
wherein the decoding model extracts sound source signals quantized for each of the plurality of sound sources from the bitstream and generates the final output signal by decoding the quantized sound source signals (Fig. 13 element 105, para [0122], where the noise is quantized, and Fig. 3 elements 201, 203, para [0054], [0056], where the object and noise signals are decoded).
Ehara does not teach:
wherein the decoding model is trained;
Rothberg teaches:
wherein the decoding model is trained (Rothberg col. 21 lines 47-61, where parameters are updated);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara by performing training, as taught by Rothberg (Rothberg col. 21 lines 47-61), on the model of Ehara (Ehara Fig. 2) by using backpropagation, so that parameters may be updated during the execution of a process (Rothberg col. 20 lines 61-67).
Ehara in view of Rothberg does not explicitly teach that quantization is performed on the near field signal source, but does teach in Ehara para [0047] that an existing coding method may be used for coding an acoustic signal component x.
Fotopoulou para [0071] teaches quantization of each signal being performed using an assigned bit budget.
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (coding method) with other components (quantization); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Regarding claim 10, Ehara in view of Rothberg and Fotopoulou teaches:
The method of claim 9, wherein the decoding model is configured to inversely quantize each of the quantized sound source signals, to generate output signals for each of the plurality of sound sources by decoding each of the inversely quantized sound source signals, and to obtain the final output signal by combining the output sources for each of the plurality of sound sources (Fotopoulou Fig. 15a element 710, para [0160], where dequantization and decoding are performed, and Ehara Fig. 3 element 207, para [0055], where the signals are added together for final output).

Regarding claim 11, Ehara in view of Rothberg and Fotopoulou teaches:
The method of claim 9, wherein the decoding model is trained based on a difference between an input signal and the final output signal and a difference between entropy of the input signal and entropies of the quantized sound source signals (Rothberg col. 21 lines 47-61, where cross entropy, KL divergence, and L1 and L2 distances are used as loss functions in back propagation to update the parameters).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ehara in view of Rothberg and Fotopoulou, and further in view of Tzinis.

Regarding claim 7, Ehara in view of Rothberg and Fotopoulou teaches:
The method of claim 6,
to determine a number of bits used for quantization of each of the plurality of sound source signals according to the type of each of the plurality of sound sources (Ehara Fig. 13 element 703, para [0122], where bit allocation is performed for the separated signals), and to quantize each of the plurality of sound source signals based on the determined number of bits (Ehara Fig. 2 element 105, para [0049], where quantization is performed).  
Fotopoulou para [0071] teaches quantization of each signal being performed using an assigned bit budget.
Ehara in view of Rothberg and Fotopoulou does not teach:
wherein the encoding model is configured to generate the latent signal by encoding the input signal, to obtain the plurality of sound source signals by separating the latent signal for each of the plurality of sound sources,
Tzinis teaches:
wherein the encoding model is configured to generate the latent signal by encoding the input signal (Tzinis pages 31-32 section 2.1, where an encoder obtains a latent representation of the input signal), to obtain the plurality of sound source signals by separating the latent signal for each of the plurality of sound sources (Tzinis pages 31-32 section 2.1, where the masks and latent representation are multiplied to determine an estimate for each source),
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ehara in view of Rothberg and Fotopoulou by applying the source separation of Ehara in view of Rothberg and Fotopoulou (Ehara para [0045]) to a latent signal as taught by Tzinis (Tzinis pages 31-32 section 2.1), in order to achieve a consistent performance improvement (Tzinis page 31 last paragraph of section 1).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 6,098,039 A Abstract, Fig. 1 teaches encoding audio by splitting an audio signal into bands, allocating bits to each band and quantizing the bands.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658