Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2.	The information disclosure statement (IDS) filed complies with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Document 1 of the NPL documents has not been considered because it is missing a publication date.
Claim Objections
3.	Claim 11 is objected to because of the following informalities:
Computer-readable medium claim 11 improperly depends on method claim 1. Claim 11 should be rewritten in independent form reciting all the limitations of claim 1.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
4.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 12-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 12 recites the limitation "in response to the sample being selected from the first training set, training an encoding layer and a shared decoding layer that are included in the artificial neural network based on the sample".
There is insufficient antecedent basis for "the sample" in the claim. The limitation is interpreted as "in response to a sample being selected from the first training set, training an encoding layer and a shared decoding layer that are included in the artificial neural network based on the sample."
Claims 13-15 are rejected for being dependent on claim 12.
 
Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 11-24, and 26-30 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 2016/0163310) in view of Jauhar (US 2020/0125944).
As per claim 1, Lee teaches a decoding method in an artificial neural network for speech recognition, the decoding method comprising: 
performing a first decoding task of decoding a feature including speech information and at least one token recognized up to a current point in time, using a shared decoding layer included in the artificial neural network (Figs. 7, 15, [0112]-[0119], [0163]-[0164], performing a first decoding task on speech data, corresponding to a received sequence of words, by segmenting, extracting continuous feature values, and generating phoneme values, vector values, and probability values. See also Figs. 5-7, 13 and corresponding description);
performing a second decoding task of decoding the at least one token using the shared decoding layer (Figs. 7, 15, [0119], [0163]-[0164], performing a second decoding task using the second neural network language model. See also Figs. 5-7, 13 and corresponding description, and see [0059], wherein the language model may provide a probability value for a word to follow an input word using a recognition function of a neural network); and
determining an output token to be recognized subsequent to the at least one token based on a result of the first decoding task and a result of the second decoding task (Fig. 7, [0119], determining a final recognition result with respect to the speech data based on the first and second decoding task).
Lee teaches a trained neural network having a first language model and a second language model for performing the first and second decoding tasks (Fig. 15, [0163]-[0164]). Lee does not explicitly disclose that the first and second decoding tasks use a shared layer.
Jauhar, in the same field of endeavor, teaches using a deep neural network for speech recognition wherein higher layers of the network are shared between different language models performing different tasks ([0021]). Therefore, it would have been obvious at the time the application was filed to use Jauhar’s above feature with the system of Lee, in order to save on memory and provide faster speech recognizers.
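For illustration only, the arrangement discussed above (two decoding tasks run through one set of shared decoder weights) can be sketched as follows. This is a toy sketch and is not code from Lee, Jauhar, or the application: the vocabulary, the weight values, the zeroed speech input used for the second task, and the simple averaging of the two results are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical three-word vocabulary (assumption for illustration).
VOCAB = ["<s>", "hello", "world"]

# Hypothetical shared weights: one row per vocabulary entry,
# 2 speech-feature dimensions followed by 3 token dimensions.
SHARED_WEIGHTS = [
    [0.1, 0.0, 0.2, 0.0, 0.1],
    [0.9, 0.1, 0.5, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.7, 0.0],
]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(token):
    """Encode a token as a one-hot vector over VOCAB."""
    return [1.0 if t == token else 0.0 for t in VOCAB]


def shared_decoding_layer(feature, token_vec):
    """The same weights serve both tasks; only the input differs."""
    x = feature + token_vec  # list concatenation
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in SHARED_WEIGHTS])


def decode_step(speech_feature, last_token):
    token_vec = one_hot(last_token)
    # First task: speech feature plus previously recognized token.
    first = shared_decoding_layer(speech_feature, token_vec)
    # Second task: token only (speech input zeroed, an assumption).
    second = shared_decoding_layer([0.0] * len(speech_feature), token_vec)
    # Combine the two results by simple averaging (an assumption).
    combined = [(p + q) / 2 for p, q in zip(first, second)]
    return VOCAB[combined.index(max(combined))], first, second
```

Calling `decode_step([1.0, 0.0], "<s>")` runs both tasks through `SHARED_WEIGHTS` and returns the selected token together with the two per-task probability lists.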
As per claim 2, Lee teaches wherein performing the first decoding task comprises: adjusting, to be a first value, a weight of a synapse connecting the shared decoding layer and a neuron corresponding to the feature, and the performing of the second decoding task comprises: adjusting the weight of the synapse to be a second value ([0087], [0098], [0102]-[0105], [0145], adjusting the edge connection weights in the neural network based on the obtained results from the first and second decoding tasks to determine a final result that matches the expected value).
As per claim 3, Lee teaches wherein performing the first decoding task comprises: inputting, to the shared decoding layer, the feature and the at least one token recognized up to the current point, and the performing of the second decoding task comprises: inputting, to the shared decoding layer, the at least one token recognized up to the current point (necessarily disclosed within the process of continuous speech recognition of words within received speech sentences, and input of the speech data and corresponding result to the first and second language models of the neural network.  See Fig. 7, [0112], [0152] for processing word sequences).
As per claim 4, Lee teaches wherein performing the first decoding task comprises: inputting, to a first pre-decoding layer, the feature and the at least one token recognized up to the current point; and inputting an output of the first pre-decoding layer to the shared decoding layer to perform the first decoding task, and performing the second decoding task comprises: inputting, to a second pre-decoding layer, the at least one token recognized up to the current point; and inputting an output of the second pre-decoding layer to the shared decoding layer to perform the second decoding task ([0095]-[0097], wherein received continuous words are converted into continuous vector values and input into a projection layer, the output of the projection layer is input into the hidden layers, and the output of the hidden layers is input into the output layer that determines a probability of a word following the input three words).
As per claim 5, Lee teaches wherein performing the first decoding task comprises: inputting, to the shared decoding layer, the feature and the at least one token recognized up to the current point; and inputting an output of the shared decoding layer to a first post-decoding layer to perform the first decoding task, and performing the second decoding task comprises: inputting, to the shared decoding layer, the at least one token recognized up to the current point; and inputting an output of the shared decoding layer to a second post-decoding layer to perform the second decoding task ([0095]-[0097], wherein received continuous words are converted into continuous vector values and input into a projection layer, the output of the projection layer is input into the hidden layers, and the output of the hidden layers is input into the output layer that determines a probability of a word following the input three words).
As per claim 6, Lee teaches wherein the result of the first decoding task includes first probabilities of candidates for the output token, and the result of the second decoding task includes second probabilities of the candidates for the output token (Figs. 5, 15 and [0059], [0102], and [0105], wherein each of the first and second language models used during the first and second decoding tasks provides a first and second probability value, respectively, for a word to follow an input word using a recognition function of a neural network).
As per claim 7, Lee teaches wherein determining the output token comprises: calculating a weighted sum of the first probabilities and the second probabilities; and determining, to be the output token, a candidate corresponding to a maximum weighted sum among the candidates ([0092]-[0093], selecting solutions with maximum probability).
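As a hedged illustration of the limitation recited in claim 7, the weighted sum of the first and second probabilities and the selection of the candidate with the maximum weighted sum can be sketched as below. The function name `pick_output_token`, the candidate words, the probability values, and the default weight are hypothetical and do not come from Lee or from the application.

```python
def pick_output_token(first_probs, second_probs, weight=0.7):
    """Return the candidate with the largest weighted sum of probabilities.

    first_probs / second_probs: dicts mapping each candidate token to the
    probability produced by the first / second decoding task.
    weight: share given to the first task; (1 - weight) goes to the second.
    """
    scores = {
        candidate: weight * first_probs[candidate]
        + (1.0 - weight) * second_probs.get(candidate, 0.0)
        for candidate in first_probs
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical probabilities for two acoustically similar candidates.
first = {"cat": 0.6, "cap": 0.4}
second = {"cat": 0.2, "cap": 0.8}
best, scores = pick_output_token(first, second, weight=0.7)
```

With these assumed numbers the second task's preference outweighs the first at `weight=0.7`, while a larger weight such as 0.9 lets the first task's result dominate.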
As per claim 8, Lee teaches determining the output token to be a subsequent input token ([0063], subsequent word during a recognition process).
As per claim 11, Lee teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the decoding method of claim 1 ([0021], computer-readable medium.  The rest is rejected for the same reason as set with regard to claim 1).
As per claims 16 and 18-24, apparatus claims 16 and 18-24 and method claims 1-8 are related as an apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claims 16 and 18-24 are similarly rejected under the same rationale as applied above with respect to method claims 1-8.
As per claim 17, Lee teaches a speech preprocessor configured to extract a speech feature vector from the speech information, wherein the encoder is configured to generate the feature based on the speech feature vector ([0095]-[0097], extracting continuous feature values, generating phoneme values, and vector values of received words).
As per claim 12, Lee teaches selecting at least one sample from a batch including a first training set including a pair of a speech and a text corresponding to the speech and a second training set including a text ([0060]-[0065], selecting training data based on a set of speech and corresponding text and a set of words that are phonetically similar);
in response to the sample being selected from the first training set, training an encoding layer and a shared decoding layer that are included in the artificial neural network based on the sample ([0060]-[0071], [0078]-[0084], and [0092], wherein training data is input and used for training the neural network encoding layer); and
in response to the sample being selected from the second training set, training the shared decoding layer based on the sample ([0060]-[0071], [0078]-[0084], and [0094], wherein the decoding layer is trained based on the sample).
Lee does not explicitly disclose using a shared layer. Jauhar, in the same field of endeavor, teaches using a deep neural network for speech recognition wherein higher layers of the network are shared between different language models performing different tasks ([0021]). Therefore, it would have been obvious at the time the application was filed to use Jauhar’s above feature with the system of Lee, in order to save on memory and provide faster processing to improve the speech recognition technology.
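For illustration only, the conditional training recited in claim 12 (a sample from the speech/text set trains both the encoding layer and the shared decoding layer, while a text-only sample trains only the shared decoding layer) can be sketched as a routing decision. The dictionary layout of a sample and the helper names `select_sample` and `layers_to_train` are assumptions made for this sketch; no actual training is performed.

```python
import random


def select_sample(batch, rng):
    """Pick one sample from a batch that mixes both training sets."""
    return rng.choice(batch)


def layers_to_train(sample):
    """Decide which layers a sample updates.

    A sample with a "speech" entry is treated as coming from the first
    training set (speech/text pair); a sample with only "text" is treated
    as coming from the second training set. This layout is an assumption.
    """
    if "speech" in sample:
        return ["encoding layer", "shared decoding layer"]
    return ["shared decoding layer"]


# Hypothetical batch mixing the two training sets.
batch = [
    {"speech": "utterance-001.wav", "text": "hello world"},  # first set
    {"text": "good morning"},                                # second set
]

rng = random.Random(0)  # seeded for repeatability
sample = select_sample(batch, rng)
updated = layers_to_train(sample)
```

Under this sketch the shared decoding layer is updated by every sample, which is the property the text-only second training set relies on.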
As per claim 13, Lee teaches extracting, using the encoding layer, a feature from a speech included in the sample ([0027], [0055], extracting feature vectors from speech data); estimating, using the shared decoding layer, an output token to be recognized subsequent to at least one token, based on the extracted feature and the at least one token; and training the encoding layer and the shared decoding layer based on the estimated output token and at least a portion of a text corresponding to the speech included in the sample ([0056]-[0059], [0084]-[0093], wherein iterative training is performed to output a token subsequent to other tokens based on extracted features from the speech sample).
As per claim 14, Lee teaches estimating, using the shared decoding layer, an output token to be recognized subsequent to at least one token, based on the at least one token; and training the shared decoding layer based on the estimated output token and at least a portion of a text included in the sample (Fig. 7, [0119], determining a final recognition result with respect to the speech data based on the first and second decoding task).
As per claim 15, Lee teaches extracting, using the encoding layer, a feature from a speech included in the sample; estimating, using the shared decoding layer, first probabilities of candidates for an output token to be recognized subsequent to at least one token based on the extracted feature and the at least one token ([0059], [0102], and [0105], wherein a first and second probability value are provided for a word to follow an input word using a recognition function of a neural network);
estimating, using the shared decoding layer, second probabilities of the candidates for the output token to be recognized subsequent to the at least one token based on the at least one token (estimating a first and second probability value, respectively, for a word to follow an input word using a recognition function of a neural network as evidenced by [0059], [0102], and [0105]);
estimating the output token based on a weight between the first probabilities and the second probabilities; and learning the weight based on at least a portion of a text corresponding to the speech included in the sample ([0059], [0093], [0102], [0105], and [0119], wherein each of the first and second language models used during the first and second decoding tasks provides a first and second probability value, respectively, for a word to follow an input word using a recognition function of a neural network, and updating the weight based on the calculated probabilities and providing a recognition result having a greatest probability value).
As per claim 26, Lee teaches a speech recognition apparatus (Figs. 5-7, 13) comprising: an encoder configured to receive a speech feature vector corresponding to speech information and output a context vector ([0100]-[0111], receiving and processing feature vectors corresponding to speech data); and a decoder configured to receive the context vector; decode the context vector and a most recently recognized token using a shared decoding layer included in an artificial neural network to output a first result; decode only the most recently recognized token using the shared decoding layer to output a second result; and output a current token based on the first result and the second result (necessarily disclosed within the process of continuous speech recognition of words within received speech sentences, and input of the speech data and corresponding result to the first and second language models of the neural network. See [0100]-[0112], [0152] for processing word sequences).
Lee does not explicitly disclose using a shared layer. Jauhar, in the same field of endeavor, teaches using a deep neural network for speech recognition wherein higher layers of the network are shared between different language models performing different tasks ([0021]). Therefore, it would have been obvious at the time the application was filed to use Jauhar’s above feature with the system of Lee, in order to save on memory and provide faster processing to improve the speech recognition technology.
As per claim 27, Lee teaches wherein the decoder is configured to decode the context vector and the most recently recognized token using a speech recognition task of the shared decoding layer (necessarily disclosed within the process of iteratively processing speech recognition on received speech, [0069], [0077], [0084]).
As per claim 28, Lee teaches wherein the decoder is configured to decode only the most recently recognized token using a language model task of the shared decoding layer ([0104], determining a final recognition result with respect to the speech data, among the candidate recognition results, based on the second language model, which has better recognition performance than the first language model).
As per claim 29, Lee teaches wherein the decoder is entirely configured by the shared decoding layer ([0163]).
As per claim 30, Lee teaches wherein only a portion of the decoder is configured by the shared decoding layer ([0113]-[0118]).
Claims 9-10 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 2016/0163310) in view of Jauhar (US 2020/0125944), and further in view of Watanabe (US 2018/0261225).
As per claims 9 and 25, Lee in view of Jauhar does not explicitly disclose wherein the feature is determined using an attention network based on sequence vectors associated with a progress of the speech information. Watanabe, in the same field of endeavor, teaches an attention-based encoder-decoder network designed to perform speech recognition ([0042]). Therefore, it would have been obvious at the time the application was filed to use Watanabe’s attention network with the system of Lee in view of Jauhar, in order to provide an end-to-end framework that achieves good performance in noisy environments rather than merely focusing on clean speech.
As per claim 10, Lee in view of Jauhar does not explicitly disclose generating the feature by encoding the speech information using an encoding layer included in the artificial neural network. Watanabe, in the same field of endeavor, teaches generating the feature by encoding the speech information using an encoding layer included in the artificial neural network ([0042]). Therefore, it would have been obvious at the time the application was filed to use Watanabe’s encoding feature with the system of Lee in view of Jauhar, in order to provide better recognition results.

Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELALI SERROU whose telephone number is (571) 272-7638. The examiner can normally be reached M-F, 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ABDELALI SERROU/
Primary Examiner, Art Unit 2659