Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/27/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 2 is objected to because of the following informalities: 
 Claim 2, “the neural network learns is trained …”, appears to contain a typographical/grammatical error, and the examiner suggests that it read as follows: “the neural network is trained …”.
 Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites the term “the embedding space”. There is insufficient antecedent basis for this limitation in the claims as there is no prior mention of the cited terms within the respective independent claim set. To overcome this rejection, it is advised to change the cited terms in the claim set as follows: “an embedding space”.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1 and 4-9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The independent claim 1 is directed towards the abstract idea of:
“A method for encoding audio signal, comprising: generating a current latent vector by reducing a dimension of a current frame of an audio signal; generating a concatenation vector by concatenating a previous latent vector generated by reducing a dimension of a previous frame of the audio signal with the current latent vector; and encoding and quantizing the concatenation vector to output a bit stream.”
	The limitations of “generating …”, “generating …”, and “encoding …”, under the broadest reasonable interpretation, as drafted, cover a human performing mental processes and applying concepts using pen and paper. A person of ordinary skill in the art would be able to generate a latent vector by reducing the dimension of a current frame of an audio signal. For example, the audio frames can be downsampled or compressed manually, using pen and paper, with relevant mathematical algorithms such as convolution. Similarly, a latent vector for the previous frame can also be found by an individual using pen and paper. As for the concatenation vector limitation, this limitation can be performed by simply joining the two latent vectors into a single vector. Lastly, the limitation of encoding and quantizing is nothing more than the use of mathematical operations. For example, an individual skilled in the art would be able to take a vector, divide its value range into a finite number of intervals, assign bits to each interval, and generate a bit stream in binary-coded format, thus covering both the quantization and the encoding. Therefore, all of these limitations can be performed by a human using pen and paper by applying relevant concepts.
The Judicial exception is not integrated into a practical application since claim 1 does not present any additional elements and fails to amount to significantly more than the judicial exception. As discussed above, the claim is directed to an abstract idea, making it ineligible for patent.
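For illustration only, the three steps discussed above for claim 1 (dimension reduction, concatenation, and quantization with binary coding) can be sketched in a few lines. The function names, the averaging-based downsampling, the value range of -1.0 to 1.0, and the 2-bit uniform quantizer are all assumptions chosen for this sketch; they are not taken from the claims or from the cited references.

```python
# Illustrative sketch only. Every name and parameter here is hypothetical.

def reduce_dimension(frame, factor=2):
    """Downsample a frame by averaging each group of `factor` samples."""
    reduced = []
    for start in range(0, len(frame), factor):
        chunk = frame[start:start + factor]
        reduced.append(sum(chunk) / len(chunk))
    return reduced

def concatenate(previous_latent, current_latent):
    """Join the previous and current latent vectors into one vector."""
    return previous_latent + current_latent

def quantize_and_encode(vector, bits=2, lo=-1.0, hi=1.0):
    """Uniformly quantize each value and emit fixed-length binary codes."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    codes = []
    for value in vector:
        index = int((value - lo) / step)
        index = max(0, min(levels - 1, index))  # clamp to the valid range
        codes.append(format(index, "0{}b".format(bits)))
    return "".join(codes)

previous_frame = [0.5, 0.5, -0.5, -0.5]
current_frame = [1.0, 0.0, -1.0, 0.0]

previous_latent = reduce_dimension(previous_frame)
current_latent = reduce_dimension(current_frame)
concatenation = concatenate(previous_latent, current_latent)
bit_stream = quantize_and_encode(concatenation)
print(bit_stream)  # prints 11011101
```

Each value in this toy input falls exactly on an interval boundary, so the arithmetic can also be checked by hand.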
The Independent claim 4 is directed towards the abstract idea of:
“A method for encoding audio signal, comprising: generating a condition vector for a current frame of an audio signal using a previous latent vector generated by reducing a dimension of a previous frame of the audio signal; generating a current latent vector by reducing a dimension of the current frame of the audio signal in a neural network to which the condition vector is applied; and encoding and quantizing the current latent vector to output a bit stream”
	The limitations of “generating …”, “generating …”, and “encoding …”, under the broadest reasonable interpretation, as drafted, cover a human performing mental processes and applying concepts using pen and paper. A person of ordinary skill in the art would be able to generate a condition vector using a latent vector. For example, an individual can take the previous vector as a condition parameter to create and calculate, using pen and paper, a conditional probability vector corresponding to the previous frame. As for the limitation of reducing the dimension of the current and previous audio frames, it is nothing more than downsampling or compressing audio frames using a relevant mathematical algorithm, such as convolution, with pen and paper. The limitation of the condition vector being applied to the neural network to find the current latent vector can be performed by using the condition vector as a parameter or training parameter during the neural network calculation, all of which can be performed manually by a human using pen and paper. Lastly, the limitation of encoding and quantizing the latent vector to bit format is nothing more than the use of mathematical operations that can be performed by a human. For example, an individual skilled in the art would be able to take a vector, divide its value range into a finite number of intervals, and assign bits to each interval in binary-coded format, thus covering both the quantization and the encoding.
	The Judicial exception is not integrated into a practical application since claim 4 does not present any additional elements and fails to amount to significantly more than the judicial exception. As discussed above, the claim is directed to an abstract idea, making it ineligible for patent.
	The dependent claim 5 is aimed towards the abstract idea of generating a condition vector by projecting the embedding space of the previous latent vector to another dimension. Projecting a vector embedding space to another dimension can be as simple as padding the vector space or adding two vector spaces to get a new dimensionality, all of which can be performed manually by applying relevant mathematical concepts.
	The dependent claim 6 is aimed towards the abstract idea of generating a condition vector using the previous latent vector in a different neural network. The condition vector can be found manually by a human in a manner similar to that described above for claim 4, and furthermore it can be placed in a separate neural network format such as an entropy or probabilistic network. The neural network is considered a mathematical concept which can be computed by a human if the network does not require substantial pre-training.
	Dependent claims 5-6 do not integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.
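For illustration only, the two simple ways of moving a vector to another dimension mentioned above for claim 5 (padding, or a linear mapping that combines components) can be sketched as follows. The helper names and the 3-by-2 matrix are assumptions made for this sketch and do not come from the claims or the cited references.

```python
# Illustrative sketch only. Helper names and the matrix are hypothetical.

def pad_to_dimension(vector, target_dim):
    """Zero-pad a vector so that it has `target_dim` components."""
    if target_dim < len(vector):
        raise ValueError("target_dim must be at least len(vector)")
    return vector + [0.0] * (target_dim - len(vector))

def project(vector, matrix):
    """Multiply `matrix` (a list of rows) by `vector`."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

previous_latent = [0.5, -0.5]

padded = pad_to_dimension(previous_latent, 4)

# A 3x2 matrix maps the 2-component vector to 3 components.
matrix = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
projected = project(previous_latent, matrix)

print(padded)     # prints [0.5, -0.5, 0.0, 0.0]
print(projected)  # prints [0.5, -0.5, 0.0]
```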
	The Independent claim 7 is directed towards the abstract idea of:
“A method for decoding audio signal, comprising: generating a condition vector for a current frame of an audio signal using a previous latent vector generated by reducing dimension of a previous frame of a bit stream; generating a current latent vector by reducing dimension of the current frame of the audio signal in a neural network to which the condition vector is applied; and decoding the current latent vector to restore the audio signal”
	The limitations of “generating …”, “generating …”, and “decoding …”, under the broadest reasonable interpretation, as drafted, cover a human performing mental processes and applying concepts using pen and paper. A person of ordinary skill in the art would be able to generate a condition vector using a latent vector. For example, an individual can take the previous vector as a condition parameter to create and calculate, using pen and paper, a conditional probability vector corresponding to the previous frame. As for the limitation of reducing the dimension of the current and previous audio frames, it is nothing more than downsampling or compressing audio frames using a relevant mathematical algorithm, such as convolution, with pen and paper. The limitation of the condition vector being applied to the neural network to find the current latent vector can be performed by using the condition vector as a parameter or training parameter during the neural network calculation, all of which can be performed manually by a human using pen and paper. Lastly, the limitation of decoding to restore the audio signal is nothing more than the use of mathematical operations that can be performed by a human. For example, an individual skilled in the art would be able to take the bit stream in coded format and decode each audio frame waveform vector by hand using a relevant codebook decoding reference.
	The Judicial exception is not integrated into a practical application since claim 7 does not present any additional elements and fails to amount to significantly more than the judicial exception. As discussed above, the claim is directed to an abstract idea, making it ineligible for patent.
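For illustration only, the codebook-based decoding discussed above for claim 7 amounts to a table lookup. The 2-bit codebook and its entries below are assumptions made for this sketch; they do not come from the claims or the cited references.

```python
# Illustrative sketch only. The codebook contents are hypothetical.

CODEBOOK = {
    "00": [-0.75, -0.75],
    "01": [-0.25, 0.25],
    "10": [0.25, -0.25],
    "11": [0.75, 0.75],
}

def decode(bit_stream, codebook, bits=2):
    """Split the bit stream into fixed-length codes and look each one up."""
    if len(bit_stream) % bits != 0:
        raise ValueError("bit stream length must be a multiple of `bits`")
    samples = []
    for start in range(0, len(bit_stream), bits):
        code = bit_stream[start:start + bits]
        samples.extend(codebook[code])
    return samples

restored = decode("1101", CODEBOOK)
print(restored)  # prints [0.75, 0.75, -0.25, 0.25]
```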
	The dependent claim 8 is aimed towards the abstract idea of generating a condition vector by projecting the embedding space of the previous latent vector to another dimension. Projecting a vector embedding space to another dimension can be as simple as padding the vector space or adding two vector spaces to get a new dimensionality, all of which can be performed manually by applying relevant mathematical concepts.
	The dependent claim 9 is aimed towards the abstract idea of generating a condition vector using the previous latent vector in a different neural network. The condition vector can be found manually by a human in a manner similar to that described above for claim 4, and furthermore it can be placed in a separate neural network format such as an entropy or probabilistic network. The neural network is considered a mathematical concept which can be computed by a human if the network does not require substantial pre-training.
Dependent claims 8-9 do not integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Chorowski (Document ID: “Unsupervised speech representation learning using wavenet autoencoders”) in view of Garbacea (Document ID: US-20200234725-A1).
Regarding claim 1, Chorowski teaches a method for encoding audio signal, comprising: generating a current latent vector by reducing a dimension of a current frame of an audio signal (Fig 1 and the corresponding description show the dimensionality reduction being performed at the encoder to VQ-VAE stage, for example, 768 going to 64); encoding and quantizing a vector to output a bit stream (Fig 1 and the corresponding description, and Page 2, Col 2, Paragraph 4, lines 1-4, show the vector quantization process being performed on the encoded latent vector; also see Fig 2, Table I, and Page 10, Col 1, Paragraph 4-6, where the VQ-VAE used for quantization generates bits/token).
	Even though Chorowski does mention past samples being used during the encoding process (Page 4, Col 1, lines 9-14; also see the description of Fig 1), it fails to specifically mention a concatenation vector being generated, and therefore fails to cover the claimed limitation of “generating a concatenation vector by concatenating a previous latent vector generated by reducing a dimension of a previous frame of the audio signal with the current latent vector”.
	Garbacea does teach the claimed limitation of generating a concatenation vector using the current and previous samples (Fig 1A and Paragraph 0043-0044, where a discrete latent representation is generated using the current input audio and previous input audio). It would have been inherent to one skilled in the art to have used the discrete latent representation mentioned by Garbacea to present a formal representation and implementation of the past samples that Chorowski mentions as being considered during the encoding process (Chorowski, Page 4, Col 1, lines 9-14). Garbacea is considered analogous to the claimed invention because it is also aimed towards audio coding and reconstruction. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Chorowski to incorporate the discrete latent representation as taught by Garbacea to improve the fidelity of reconstructed speech (Garbacea, Paragraph 007).
	Regarding claim 2, Chorowski in view of Garbacea teaches the method of claim 1, wherein the generating the current latent vector reduces the dimension of the current frame of the audio signal using a neural network (Garbacea, Fig 1A and Paragraph 0042, shows an encoder neural network being used to get an encoder output of the current input audio. Here, a mean pooling process over the time dimension is also mentioned, which will inherently result in a reduced frame dimension), wherein the neural network learns is trained according to a loss function of the current latent vector calculated by setting the previous latent vector as a conditional probability (Garbacea, Fig 1A-B and Paragraph 0097-0099, mentions a system determining a reconstruction loss to update the decoder and encoder network parameters, which includes finding the probability between the input audio and the decoder input, which is the discrete latent representation shown in Fig 1A. Here, the discrete latent representation, as mentioned and cited earlier, consists of the previous input audio as well as the current input audio). Garbacea is considered analogous to the claimed invention because it is also aimed towards audio coding and reconstruction. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Chorowski to incorporate the loss calculation as taught by Garbacea to improve the fidelity of reconstructed speech (Garbacea, Paragraph 007).
Regarding claim 3, Chorowski in view of Garbacea teaches the method of claim 1, wherein the generating the current latent vector reduces the dimension of the current frame of the audio signal using a neural network (Garbacea, Fig 1A and Paragraph 0042, shows an encoder neural network being used to get an encoder output of the current input audio. Here, a mean pooling process over the time dimension is also mentioned, which will inherently result in a reduced dimension), wherein the neural network is trained according to an entropy of the current latent vector calculated by setting the previous latent vector as a conditional probability (Garbacea, Fig 1A-B and Paragraph 0097-0099, mentions a system determining a reconstruction loss to update the decoder and encoder network parameters, which includes finding the probability between the input audio and the decoder input, which is the discrete latent representation shown in Fig 1A. Here, the discrete latent representation, as mentioned and cited earlier, consists of the previous input audio as well as the current input audio). The loss found to update the parameters of the encoder network can be equated to the entropy mentioned in the claim, as the two are commonly used interchangeably in the field. Furthermore, the loss found by Garbacea can be seen as performing the same conditional probability function as mentioned in the claim. Garbacea is considered analogous to the claimed invention because it is also aimed towards audio coding and reconstruction. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Chorowski to incorporate the loss calculation as taught by Garbacea to improve the fidelity of reconstructed speech (Garbacea, Paragraph 007).
Regarding claim 4, Garbacea teaches a method for encoding audio signal, comprising: generating a condition vector for a current frame of an audio signal using a previous latent vector (Fig 1A-B and Paragraph 0043-0044, where discrete latent representation 122 is generated using the current input audio and previous input audio. Here, the loss parameters (Paragraph 0097-0099) found using the discrete latent representation can be equated to the condition vector; also see Fig 1B and Paragraph 0010, lines 1-6); generating a current latent vector of the current audio signal in a neural network to which the condition vector is applied (Fig 1A-B show the audio signal going into the encoder neural network, which outputs an encoder output. Here, according to Paragraph 0034, the encoder neural network processes input audio using “encoder network parameter”. These parameters are found using the discrete latent representation as shown in Paragraph 0097-0099).
Even though Garbacea does not specifically mention dimension reduction, it does mention performing mean pooling over the time dimension (Paragraph 0042-0043), which can be equated to dimension reduction. Furthermore, Garbacea fails to mention the claimed limitation of “encoding and quantizing the current latent vector to output a bit stream.”
Chorowski does teach the claimed limitation of encoding and quantizing the current latent vector to output a bit stream (Fig 1 and the corresponding description, and Page 2, Col 2, Paragraph 4, lines 1-4, show the vector quantization process being performed on the encoded latent vector; also see Fig 2, Table I, and Page 10, Col 1, Paragraph 4-6, where the VQ-VAE used for quantization has bits/token). Chorowski is considered analogous to the claimed invention because it is also aimed towards audio coding. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Garbacea to incorporate the current audio encoding and quantization as taught by Chorowski to improve the degree to which the latent representation can be mapped to the phonetic content (Chorowski, Page 1, Col 2, Paragraph 2, lines 19-25).
Regarding claim 5, Garbacea in view of Chorowski teaches the method of claim 4, wherein the generating the condition vector generates the condition vector by projecting the embedding space of the previous latent vector to another dimension (Garbacea, Fig 1B, shows the embedding space 130 being projected into another dimension for training using the parameters shown in Paragraph 0097-0099). Here, the embedding space can be equated to the previous input audio (Garbacea, Fig 1A and Paragraph 0043); the condition vector can be equated to the parameter found by calculating the loss (Garbacea, Paragraph 0097-0099).
Regarding claim 6, Garbacea in view of Chorowski teaches the method of claim 4, wherein the generating the condition vector generates the condition vector by transforming and compressing the previous latent vector in another neural network different from the neural network (Garbacea, as seen in Fig 1A and Paragraph 0059, a separate training system where a neural network is provided to calculate the parameters/condition vector).
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Garbacea (Document ID: US-20200234725-A1).
Regarding claim 7, Garbacea teaches a method for decoding audio signal, comprising: generating a condition vector for a current frame of an audio signal using a previous latent vector (Fig 1A-B and Paragraph 0043-0044, where discrete latent representation 122 is generated using the current input audio and previous input audio. Here, the loss parameters (Paragraph 0097-0099) found using the discrete latent representation can be equated to the condition vector; also see Fig 1B and Paragraph 0010, lines 1-6); generating a current latent vector of the current audio signal in a neural network to which the condition vector is applied (Fig 1A-B show the audio signal going into the encoder neural network, which outputs an encoder output. Here, according to Paragraph 0034, the encoder neural network processes input audio using “encoder network parameter”. These parameters are found using the discrete latent representation as shown in Paragraph 0097-0099).
Even though Garbacea does not specifically mention dimension reduction, it does mention performing mean pooling over the time dimension (Paragraph 0042-0043), which can be equated to dimension reduction. Furthermore, Garbacea does teach the claimed limitation of decoding the current latent vector to restore the audio signal (Fig 1A-B, where the input audio is shown to be reconstructed).
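For illustration only, mean pooling over the time dimension, as referred to above, can be sketched as follows. This is a generic sketch and is not asserted to be Garbacea's implementation; the function name, the window size, and the toy input are assumptions.

```python
# Illustrative sketch only. Not taken from the cited reference.

def mean_pool_time(frames, window):
    """Average consecutive groups of `window` time steps, feature by feature.

    `frames` is a list of time steps, each a list of feature values.
    """
    pooled = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        # zip(*chunk) groups the values of each feature across the chunk.
        pooled.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return pooled

# Four time steps with two features each.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled = mean_pool_time(frames, 2)
print(pooled)  # prints [[2.0, 3.0], [6.0, 7.0]]
```

In this sketch the number of time steps drops from four to two, while the number of features per time step is unchanged.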
Regarding claim 8, Garbacea teaches the method of claim 7, wherein the generating the condition vector generates the condition vector by projecting an embedding space of the previous latent vector to another dimension (Garbacea, Fig 1B, shows the embedding space 130 being projected into another dimension for training using the parameters shown in Paragraph 0097-0099). Here, the embedding space can be equated to the previous input audio (Garbacea, Fig 1A and Paragraph 0043); the condition vector can be equated to the parameter found by calculating the loss (Garbacea, Paragraph 0097-0099).
Regarding claim 9, Garbacea teaches the method of claim 7, wherein the generating the condition vector generates the condition vector by transforming and compressing the previous latent vector with another neural network different from the neural network (as seen in Fig 1A and Paragraph 0059, a separate training system where a neural network is provided to calculate the parameters/condition vector).
Conclusion
The analogous prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Klejsa (Document ID: “High-quality speech coding with sample RNN”) teaches audio coding, with encoding and decoding performed using a conditional sample RNN network; quantization and entropy encoding are also mentioned in this NPL.
Kim (Document ID: US-20210142812-A1) teaches the autoencoder network for speech coding where training is performed using residual signal.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEEL P. KARELIA whose telephone number is (571)272-4377. The examiner can normally be reached Monday-Friday 6:30 am - 4:00 pm (every other Friday off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NEEL PIYUSHKUMAR KARELIA/Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659