DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In response to this Office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and for any new claims. That is, please indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Response to Applicant’s Reply
The Applicant’s reply filed on September 27, 2022, to the Election/Restriction Requirement set forth in the previous Office action mailed on July 27, 2022, is acknowledged. The Applicant has elected Group I, claims 1-15, and claims 16-20 are withdrawn from further consideration on the merits pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention, there being no allowable generic or linking claim. A complete reply to a future final Office action must include cancellation of nonelected claims or other appropriate action (37 CFR 1.144). See MPEP § 821.01. Because claim 7 is essentially similar to withdrawn claim 16, claim 7 is also withdrawn, and the following Office action is based on this correction, as set forth below.
Because the Applicant, in the Remarks of the Reply filed on September 27, 2022, did not distinctly and specifically point out the supposed errors in the restriction requirement, the election has been treated as an election without traverse (MPEP § 818.03(a)).
The Applicant alleges that “the Examiner has failed to establish a prima facie case for restriction requirement as required by USPTO procedure” and that “there is no serious search burden” because “Each of the Groups include similar concepts and even share the same or similar words, making a search for one Group relevant for the other Group. Accordingly, it cannot reasonably be argued that a serious burden exists in searching for all the Groups”. The Applicant further asserts that the “claims are closely related the art for one Group is very likely to be relevant for another Group” and that, where “the search and examination of the claims can be made without serious burden, the examiner must examine the claim groups on the merits, even if they include claims to independent or distinct inventions” (Remarks filed on September 27, 2022, page 10, paragraphs 3-5, and page 11, paragraphs 1-2).
In response, the Office respectfully disagrees for the following reasons. (1) The Restriction/Election Requirement clearly set forth a prima facie case for restriction, including “different classification”, “recognized divergent subject matter” (e.g., the different utilities described on page 3 of the Restriction/Election Requirement), and “different field search” due to the different classification (e.g., Group I is classified in G06N3/0454, etc., while Group II is classified in G10L21/0208, etc.), on page 4 of the Restriction/Election Requirement. The Applicant is silent regarding these findings, and thus the argument that the Examiner “failed to establish a prima facie case” is not persuasive. (2) Because Group I and Group II are classified in different fields, searching and examining both Groups would impose a serious burden; the Applicant alleges “no burden” without providing any evidence, which is not persuasive. (3) Because a serious search and examination burden exists, as analyzed and evidenced in the previous Office action, the further argument that “the examiner must examine the claim groups on the merits, even if they include claims to independent or distinct inventions” is not persuasive and is respectfully denied under MPEP § 803.
Therefore, based on the evidence and analyses above, the Restriction/Election Requirement set forth in the previous Office action is maintained, with the correction that claim 7, which is drawn to Group II, is also withdrawn, as set forth above. Note: the Applicant improperly uses the same terms to represent different entities in withdrawn claims 7 and 16, which would provisionally render claims 7 and 16 indefinite under 35 U.S.C. 112(b). Although the claims are alleged to be “similar” in their wording, they are directed to different fields: one to training with given sample sets, and the other to application of the trained models to unknown sample sets. Using the same terms in both makes the claims appear to share the same utility despite their different fields, which makes the scope of each limitation confusing.

Claim Objections
Claims 1-16 are objected to because of the following informalities: 
Claim 1 recites “each clean-noisy audio pair comprises …” and “a clean audio of content by …”, which should read -- each of the one or more clean-noisy audio pairs comprises … -- and -- a clean audio of a content by … --, respectively, for clarity. Claims 2- are objected to due to their dependency on claim 1.
Claim 14 is objected to for at least a similar reason as described for claim 12 above, since claim 14 recites similar deficient features as recited in claim 12. Claims ? are objected to due to their dependency on claim 1.
Since there are many deficiencies similar to those listed above in the Applicant’s claims, it would be a burden on the Examiner to list them all. Therefore, the Applicant’s cooperation in thoroughly revising the claims would be highly desirable for expediting the processing of this application.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-16 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 1 recites “each clean-noisy audio pair comprises a clean audio of content … and a noisy audio of the content by the speaker” and then recites “for each clean audio, generating one or more continuous latent representations for the clean audio using the first encoder”. This is confusing because it is unclear whether the “generating” is performed “for each clean audio” or “for the clean audio”, and it is unclear whether “the clean audio” refers back to “each clean audio” or to “a clean audio”, which renders the claim indefinite. Claim 1 similarly recites “for each noisy audio, generating … for the noisy audio …”, which is further confusing because it is unclear whether the “generating” is performed “for each noisy audio” or “for the noisy audio”, and it is unclear whether “the noisy audio” refers back to “each noisy audio” or to “a noisy audio of the content”, which further renders the claim indefinite. Claim 1 further recites “for each continuous latent representation of clean audio” and “for each continuous latent representation of noisy audio”, and then recites “for each clean-noisy audio pair, inputting …, the clean audio, …”. This is further confusing because it is unclear whether “the clean audio” refers back to “clean audio” in “for each continuous latent representation of clean audio”, to “each clean audio” in “for each clean audio”, or to “a clean audio” in “each clean-noisy audio pair comprises a clean audio of content”, which further renders the claim indefinite. Claim 1 further recites “using the quantizer”, which is further confusing because it is unclear whether “the quantizer” refers back to “a quantizer” in “a denoising system comprising a first encoder, a second encoder, a quantizer, and …” or to “a quantizer” in “using a quantizer”, both as recited in claim 1, which further renders the claim indefinite. Claims 2-8 are rejected due to their dependency on claim 1.
Claim 9 is rejected for at least similar reasons as described for claim 1 above, since claim 9 recites similar deficient features as recited in claim 1. Claims 10-15 are rejected, and claim 16 is provisionally rejected, due to their dependency on claim 9.
Claim 3 is further rejected for at least a similar reason as described for claim 1 above because claim 3 recites a similar deficient feature as recited in claim 1; for example, claim 3 recites “the clean audio”. Claim 3 further recites “the continuous latent representation of the time step”, which lacks sufficient antecedent basis and causes further confusion because it is unclear what “the continuous latent representation of the time step” is, how the “L2 distance between” is well-defined, and how the “distance” is included in “the distance”, which further renders the claim indefinite.
Claims 4-5 are further rejected for at least a similar reason as described for claim 1 above because claims 4-5 recite a similar deficient feature as recited in claim 1; for example, claims 4-5 recite “the quantizer”.
Claim 6 further recites “repeating the steps of claim 1”, and parent claim 1 recites the steps of “generating … using the first encoder”, “generating … using the second encoder”, “generating … using a quantizer”, “generating … using the quantizer”, “inputting … into the decoder to generate an audio sequence prediction”, “computing a loss for …”, and “updating the denoising system using the loss”, etc. It is unclear whether “the steps of claim 1” refers to some combination of the steps of claim 1 or to all of the steps of claim 1, which further renders the claim indefinite.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 8-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Sriram et al. (US 20190130903 A1, hereinafter Sriram) in view of Engel et al. (US 10068557 B1, hereinafter Engel).
Claim 1: Sriram teaches a computer-implemented method (title and abstract, ln 1-15; a system in figs. 1-2 and method steps in figs. 3-4, executed by a laptop, a tablet computer, a PDA, a smartphone, a smartwatch, a server, etc., para 63) for training a denoising system (a new robust training approach in denoising, abstract, para 22), comprising:
given a denoising system (a system in figs. 1-2) comprising a first encoder (an encoder 115 receiving a clean audio input 105, para 33), a second encoder (an encoder 115 receiving an audio input 110, i.e., audio input 105 augmented with noise, para 33), and a decoder (a decoder 150), and given a set of one or more clean-noisy audio pairs (a pair [x, ẋ] in the audio input sequence), in which each clean-noisy audio pair comprises a clean audio of content by a speaker (x, 105, as clean audio input from a speaker placed in a variety of configurations, etc., para 46) and a noisy audio of the content by the speaker (the clean audio input augmented with noise, para 33):
for each clean audio (x, 105), generating one or more continuous latent representations for the clean audio using the first encoder (Z = g(x), 125, where g is the encoder function in figs. 1-2, para 30);
for each noisy audio (ẋ, 110, i.e., x augmented with noise, para 33), generating one or more continuous latent representations for the noisy audio using the second encoder (Ẑ = g(ẋ), 130, where g is the encoder function in figs. 1-2, para 30);
for each clean-noisy audio pair, inputting a speaker embedding that represents the speaker of the clean-noisy audio pair into the decoder (embeddings generated by the encoder and output from Z 125 as an input to the decoder 150 in figs. 1-2, para 36; the embeddings represent the speaker of the clean-noisy audio pair, para 33, 36-37) to generate an audio sequence prediction (a prediction 160 generated and output by the decoder 150 in figs. 1-2);
computing a loss for the denoising system (via a discriminator 140 in fig. 1, similar to the EM distance 240 in fig. 2), in which the loss comprises a latent representation matching loss term (e.g., the difference between the encoded outputs represented by g(x) and g(ẋ) at input sequence x in equation 1, para 31-32) that, for a time step in which the discrete clean audio representation and the discrete noisy audio representation for that time step differ (a discriminator loss L1 in equation 1, para 30-32), is based upon a distance measure between the continuous latent representation of the clean audio and the continuous latent representation of the noisy audio for that time step (the L1 distance includes a calculation of the difference, i.e., the claimed distance measure, between g(x) and g(ẋ) at input sequence x in equation 1, output from the encoders and input to the loss discriminator in figs. 1-2); and
updating the denoising system using the loss (the L1 distance with the cross entropy (CE) loss calculated using clean speech samples for model training, para 32; the model is trained using both the CE losses and the EM distances to update the weights of the model and the discriminator at step 335 in fig. 3, which is included in an iteration of the training process at 430 in fig. 4).
However, Sriram does not explicitly teach a quantizer that quantizes the outputs of the encoders, and does not explicitly teach inputting the discrete clean audio representations and the clean audio into the disclosed decoder to generate the disclosed audio sequence prediction.
Engel teaches an analogous field of endeavor by disclosing a computer-implemented method for training a denoising system (title and abstract, ln 1-17; a method in fig. 12; a system in figs. 1A-1B; and training through a model trainer 122 in fig. 11), wherein, for each audio input (input audio waveform 16 in fig. 1A), the discrete clean audio representations (including embedding 20 from the encoder 12, where the input audio waveform is quantized via an 8-bit μ-law quantizer, col 8, ln 64-66), a clean audio (a portion of the input audio waveform 18 in fig. 1A), and a speaker embedding that represents the speaker of the input audio (pitch 22, when used for human speech, col 23, ln 13-18) are input into a decoder (decoder neural network 14 in fig. 1A; details in fig. 1B) to generate an audio sequence prediction (a next sequential audio sample 24 in fig. 1A or 158 in fig. 1B), and a loss is computed for the system (via loss function 28 in fig. 1A, wherein training is performed via backpropagation through the decoder neural network 14 and the encoder neural network 12, col 6, ln 62-67, col 7, ln 1-2). Engel does so for the benefits of achieving a more accurate and perceptual reconstruction of audio waveforms by training a generative model at the individual sample level (abstract, col 14, ln 4-10), with the model learned directly from data with intuitive and controllable parameters (col 3, ln 51-55), in a human speech application (col 23, ln 13-18). Engel further teaches generating a discrete audio representation by using a quantizer (a μ-law transformation implemented by the computer, col 8, ln 64-67, col 9, ln 1-7) for the benefits of reducing computational complexity and saving memory for the computation (col 8, ln 64-67, col 9, ln 1-7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied, as taught by Engel, first, inputting the discrete clean audio representations and the clean audio into the decoder to generate an audio sequence prediction for computing the loss for the system, and second, the quantizer of the audio input, respectively to, as taught by Sriram, first, the feature wherein, for each clean-noisy audio pair, the speaker embedding that represents the speaker of the clean-noisy audio pair is input into the decoder to generate the audio sequence prediction, and second, the generation of the corresponding clean audio representation for each continuous latent representation of clean audio and the corresponding noisy audio representation for each continuous latent representation of noisy audio, in the computer-implemented method for training the denoising system, for the benefits discussed above.
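For illustration only, the 8-bit μ-law quantization relied upon in Engel (col 8, ln 64-67) may be sketched as below. The sketch assumes the standard μ-law companding formula with μ = 255 (256 levels) and input samples normalized to [-1, 1]; the function names are hypothetical, and the sketch is not asserted to be Engel's implementation.

```python
import math

MU = 255  # assumed 8-bit mu-law: integer codes 0..255


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Map a sample in [-1, 1] to an integer code in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    # Logarithmic companding: small amplitudes receive finer resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest integer level.
    return int(round((compressed + 1.0) / 2.0 * mu))


def mu_law_decode(code: int, mu: int = MU) -> float:
    """Approximate inverse: map an integer code back to a sample in [-1, 1]."""
    compressed = 2.0 * code / mu - 1.0
    return math.copysign((math.pow(1.0 + mu, abs(compressed)) - 1.0) / mu, compressed)


print(mu_law_encode(1.0), mu_law_encode(-1.0))  # prints: 255 0
```

Each continuous sample is thus reduced to one of 256 discrete values, which is the sense in which the quantized waveform is a discrete audio representation.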
Claim 9 has been analyzed and rejected according to claim 1 above, and the combination of Sriram and Engel further teaches a system comprising one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, cause the method of claim 1 to be implemented (Sriram, storage devices 608 with storage controller 607, connected to the GPU and CPU through a bus 616 and storing programs of instructions for the operating system and applications in fig. 6, para 66; and Engel, a computing system including memory 114 with instructions executed by processors 112, with encoder NN 132, decoder NN 134, model trainer 122, etc., in fig. 11).
Claim 2: the combination of Sriram and Engel further teaches, according to claim 1 above, wherein the latent representation matching loss term further comprises: an annealing term that increases during training from zero or near zero to one or near one (Sriram, a clip weight w is updated based on a clip value of w, -c, para 42).
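For reference only, the annealing term recited in claim 2 (increasing during training from zero or near zero to one or near one) may be illustrated with a simple linear schedule. The linear form, the warmup_steps parameter, and the function names are assumptions for illustration; claim 2 does not recite a particular schedule.

```python
def annealing_weight(step: int, warmup_steps: int) -> float:
    """Hypothetical linear schedule: 0.0 at step 0, rising to 1.0 at warmup_steps, then held."""
    if warmup_steps <= 0:
        return 1.0  # no warm-up: apply the full weight immediately
    return min(1.0, max(0.0, step / warmup_steps))


def annealed_matching_loss(matching_loss: float, step: int, warmup_steps: int) -> float:
    """Scale a latent matching loss value by the annealing weight for this step."""
    return annealing_weight(step, warmup_steps) * matching_loss


print([annealing_weight(s, 100) for s in (0, 50, 100, 150)])  # prints: [0.0, 0.5, 1.0, 1.0]
```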
Claim 3: the combination of Sriram and Engel further teaches, according to claim 1 above, wherein the distance measure between the continuous latent representation of the clean audio and the continuous latent representation of the noisy audio comprises: an l2 distance between the continuous latent representation of the clean audio and the continuous latent representation of the time step (Sriram, the EM distance between Ex[ ] and Eẋ[ ] in equation 2, para 38-39).
Claim 4: the combination of Sriram and Engel further teaches, according to claim 1 above, wherein the loss comprises: a decoder term related to loss for the decoder (Sriram, g(ẋ) and the CE loss are related to the decoder in fig. 2; CE() in line 15 of para 42); and a quantizer term related to loss for the quantizer (Engel, loss function 28 taking the quantized input waveform to encoder neural network 12, col 9, ln 1-16; thus, the loss function 28 is inherently related to the loss for the quantizer).
Claim 6: the combination of Sriram and Engel further teaches, according to claim 1 above, the computer-implemented method further comprising: repeating the steps of claim 1 with one or more additional sets of clean-noisy audio pairs (Sriram, the steps are repeated at step 430, para 44; and Engel, iteration is performed, col 11, ln 2-10); and responsive to a stop condition being reached, outputting a trained denoising system comprising a trained second encoder, a trained quantizer, and a trained decoder (Sriram, the critic iterations are reached to set the critic weights at 430 in fig. 4, para 44; and Engel, iterating until a number of iterations is reached, col 11, ln 2-10, e.g., iteration 1000, col 10, ln 59-63).
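For illustration only, and under one possible reading of claims 1 and 3 (the continuous latent representations of the clean audio and of the noisy audio for the same time step are compared), a latent representation matching term that applies an l2 distance only at time steps where the discrete clean and noisy representations differ may be sketched as follows. Summation over time steps, the list-based data layout, and all function names are assumptions not recited in the claims, and the sketch is not asserted to be the Applicant's implementation.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (l2) distance between two equal-length latent vectors."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def latent_matching_loss(
    clean_latents: List[Sequence[float]],
    noisy_latents: List[Sequence[float]],
    clean_codes: List[int],
    noisy_codes: List[int],
) -> float:
    """Sum l2 distances over the time steps whose discrete codes differ."""
    if not (len(clean_latents) == len(noisy_latents) == len(clean_codes) == len(noisy_codes)):
        raise ValueError("all inputs must cover the same number of time steps")
    total = 0.0
    for z_clean, z_noisy, q_clean, q_noisy in zip(clean_latents, noisy_latents, clean_codes, noisy_codes):
        if q_clean != q_noisy:  # discrete clean and noisy representations differ
            total += l2_distance(z_clean, z_noisy)
    return total


# Two time steps; only step 0 has differing discrete codes (0 vs. 2).
print(latent_matching_loss([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]], [0, 1], [2, 1]))  # prints: 5.0
```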
Claim 8: the combination of Sriram and Engel further teaches, according to claim 1 above, wherein the decoder is an autoregressive generative model (Sriram, the decoder 150 in fig. 1, para 48, and Engel, autoregressive decoder model, abstract).
Claim 10 has been analyzed and rejected according to claims 9, 2 above.
Claim 11 has been analyzed and rejected according to claims 9, 3 above.
Claim 12 has been analyzed and rejected according to claims 9, 4 above.
Claim 14 has been analyzed and rejected according to claims 9, 6 above.
Claim 15: the combination of Sriram and Engel further teaches, according to claim 14 above, wherein none of the first encoder, the second encoder, the quantizer, and the decoder of the denoising system are pre-trained (Sriram, the two encoders 115 and the decoder 150 are not pre-trained in the training procedure in fig. 1; and Engel, the encoder 12, the decoder 14, and the μ-law quantization are not pre-trained in fig. 1A).

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sriram (above) in view of Engel (above), and further in view of Lee et al. (US 20200211580 A1, hereinafter Lee).
Claim 5: the combination of Sriram and Engel further teaches, according to claim 1 above, the quantizer (Engel, quantizing the input audio signal to the neural network using μ-law, as discussed for claim 1 above), wherein the quantizer comprises one or more quantized variational autoencoders (Sriram, two encoders for the augmented sample and for the noisy sample X in fig. 1; and Engel, quantized audio input X, as discussed for claim 1 above) that convert the one or more continuous latent representations for clean audio to the corresponding one or more discrete clean audio representations (Sriram, X 105 to Z 125 behind the decoder 150 in fig. 1) and that convert the one or more continuous latent representations for noisy audio to the one or more corresponding discrete noisy audio representations (Sriram, from Ẋ to Ẑ in fig. 1), except that the quantized variational autoencoders are vector-quantized.
Lee teaches an analogous field of endeavor by disclosing a computer-implemented method (title and abstract, ln 1-13, and figs. 4-5), wherein vector-quantized variational autoencoders are disclosed (a generative adversarial network (GAN) and an autoencoder (AE), para 75, with a vector quantization algorithm, para 146), for the benefits of improving speech recognition by effectively cancelling ambient noise via a deep training neural network (para 4-7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the vector-quantized variational autoencoders, as taught by Lee, to the quantized variational autoencoders in the computer-implemented method, as taught by the combination of Sriram and Engel, for the benefits discussed above.
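For illustration only, vector quantization of the kind referred to above, in which each continuous latent vector is replaced by the index of the nearest entry in a codebook (as in vector-quantized variational autoencoders), may be sketched as follows. The codebook values and function names are hypothetical, codebook learning is omitted, and the sketch is not asserted to be Lee's implementation.

```python
import math
from typing import List, Sequence, Tuple


def vector_quantize(z: Sequence[float], codebook: List[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return (index, codeword) of the codebook entry nearest to z in l2 distance."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    best = min(range(len(codebook)), key=lambda k: math.dist(z, codebook[k]))
    return best, codebook[best]


def quantize_sequence(latents: List[Sequence[float]], codebook: List[Sequence[float]]) -> List[int]:
    """Map a sequence of continuous latent vectors to discrete code indices."""
    return [vector_quantize(z, codebook)[0] for z in latents]


# Hypothetical 3-entry, 2-dimensional codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(quantize_sequence([[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]], codebook))  # prints: [1, 2, 0]
```

Unlike the scalar μ-law quantizer, which discretizes each sample independently, this approach quantizes a whole latent vector at once by selecting a codeword.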
Claim 13 has been analyzed and rejected according to claims 9, 5 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached on 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654