Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 08/06/2019 and 11/18/2020 have been considered by the examiner.
Drawings
The drawing submitted on 05/16/2019 has been considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-7, 9-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shankar et al. (2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15-20 April 2018; added to IEEE Xplore 13 September 2018; Publisher: IEEE; Electronic ISSN: 2379-190X; INSPEC Accession Number: 18097004; DOI: 10.1109/ICASSP.2018.8461628).

Regarding Claims 1, 9, and 17, Shankar et al. teach: A universal modeling system (Fig. 1: recurrent adaptive mixture model (RADMM) based neural language model), comprising: a plurality of domain expert models (Fig. 1, LSTM expert (1) . . . LSTM expert (k) . . . LSTM expert (K)) to each receive raw input data (Fig. 1, input layer, word embedding) and provide a domain expert output based on the raw input data (see Fig. 1, where each domain expert output is shown by an arrow); a neural mixture component (Fig. 1, mixer LSTM) to generate a weight (Fig. 1, mixture weights, the scalars g_t(k)) corresponding to each domain expert model based on information created by the plurality of domain expert models; and an output layer (Fig. 1, output layer) to provide a universal modeling system output (Fig. 1, output layer of the final mixture model, i.e., the single domain robust model) based on each domain expert output after being multiplied by the corresponding weight (see Fig. 1, where each mixture weight is multiplied with the output of the corresponding LSTM expert (1) through LSTM expert (K)) for that domain expert model (Fig. 1; Section 2.1, Model Description: "The architecture of the recurrent adaptive mixture model (RADMM) based language model is shown in Fig. 1. The building blocks of the model are: one word embedding layer shared across experts, multiple layers of parallel LSTM domain experts, the mixer LSTM network and the single softmax output layer… Such a vector is fed to each domain expert LSTM_k for a domain id k ∈ 1, ..., K where K is the number of pre-defined domains... The same input word vector x_t is also fed to the mixer LSTM function:... which is followed by a fully connected layer with the softmax activation function to generate the mixture weights over K domains:… The scalars g_t(k) are then used as the relevance weights to combine the K LSTM expert features by linear interpolation: … which is used as the final feature to generate the output word distribution:…"; Section 2.2.1, Requirements: "The role of the mixer is to generate the context dependent relevance weight for each expert…"; Section 4, Related Work: "We focused on building a single domain robust language model in the spirit of the Bayesian interpolation [10] for the ngram count LMs. We achieved this goal by using an adaptive state dependent mixture weights based on the LSTM…"; Section 5, Conclusion: "We designed a neural network architecture motivated by data diversity. Our proposed model combines domain adaptation with an LSTM based mixture of experts in a single domain robust model…").
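For illustration only, the interpolation step quoted above from Section 2.1 can be sketched as follows. This is a minimal sketch under stated assumptions, not Shankar et al.'s implementation: no expert LSTM or mixer LSTM is run, their outputs are replaced by hypothetical inputs (expert_features, mixer_logits), and the function names are hypothetical. The mixer output is passed through a softmax to obtain the scalar weights g_t(k), which then weight each expert feature in a linear interpolation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_expert_features(expert_features: Sequence[Sequence[float]],
                        mixer_logits: Sequence[float]) -> List[float]:
    """Combine K expert feature vectors using scalar mixture weights g_t(k).

    Hypothetical inputs: expert_features[k] stands in for the hidden feature of
    LSTM expert k at time t, and mixer_logits stands in for the output of the
    fully connected layer that follows the mixer LSTM. No LSTM is run here.
    """
    if len(expert_features) != len(mixer_logits):
        raise ValueError("expected one mixer logit per expert")
    g = softmax(mixer_logits)  # weights over the K domains; they sum to 1
    dim = len(expert_features[0])
    # Linear interpolation of expert features: h_t = sum_k g_t(k) * h_t^(k)
    return [sum(g[k] * expert_features[k][i] for k in range(len(g)))
            for i in range(dim)]


# Usage: two experts, equal mixer logits -> equal weights of 0.5 each.
print(mix_expert_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

Because the softmax weights are non-negative and sum to 1, the combined feature in this sketch always lies within the range spanned by the expert features in each dimension.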

Regarding Claims 2 and 10, Shankar et al. teach: The system of claim 1, wherein the raw input data is a stream of audio frames containing speech utterances and the universal modeling system output is associated with automatic speech recognition (Abstract: "The proposed model is designed to benefit from the scenario where the training data are available in diverse domains as is the case for YouTube speech recognition. The two core components of our model are an ensemble of parallel long short-term memory (LSTM) expert layers for each domain and another LSTM based network which generates state dependent mixture weights for combining expert LSTM states by linear interpolation. The resulting model is a recurrent adaptive mixture model (RADMM) of domain experts. We train our model on 4.4B words from YouTube speech recognition data. We report results on the YouTube speech recognition test set."; Section 1, Introduction: "Motivated by this data diversity, we design a neural network architecture which integrates the diversity of the data into a single neural language model (LM). We present such a model together with a multi-stage training strategy. We evaluate our model on the YouTube speech recognition test set containing various domains, without using any domain information at the evaluation time.").

Regarding Claims 3, 11, and Shankar et al. teach: The system of claim 1, wherein the information created by the plurality of domain expert models comprises hidden features (See rejection of claim 1 and Section 3.3 Neural language model training setups: All LSTMs used in this work have tied input and forget gate, as well as the recurrent projection as in Sak et al.’s work [14]. These setups are the same as those used in Kumar et al. [12]. All our implementations of the neural language models are based on the TF-Slim library of Tensorflow [15]. In all models, we use the input word embedding size of 1024. The background model is a 2-layer LSTM with 2048 units per layer with 514 recurrent projection units.).

Regarding Claims 4, 12, and 19, Shankar et al. teach: The system of claim 3, wherein the neural mixture component includes a Long Short-Term Memory ("LSTM") element (See Fig.1 and rejection of claim 1).

Regarding Claims 5, 13, and 20, Shankar et al. teach: The system of claim 1, wherein the weights comprise at least one of: (i) constrained scalar numbers, (ii) unconstrained scalar numbers, (iii) vectors, and (iv) matrices (see rejection of claim 1 and Section 2.1, Model Description, for the calculation of the scalar weights g_t(k)).

Regarding Claims 6 and 14, Shankar et al. teach: The system of claim 1, wherein the information created by the plurality of domain expert models is associated with row convolution (see rejection of claim 1 and Section 3.3, Neural language model training setups: "All LSTMs used in this work have tied input and forget gate, as well as the recurrent projection as in Sak et al.'s work [14]. These setups are the same as those used in Kumar et al. [12]. All our implementations of the neural language models are based on the TF-Slim library of Tensorflow [15]." Row convolution is inherent in TensorFlow.).

Regarding Claims 7 and 15, Shankar et al. teach: The system of claim 1, wherein the neural mixture component is associated with learned, unconstrained vector weights generated based on hidden features, the vector weights (the scalar weights g_t(k)) being distributed among each of the plurality of domain expert models to provide learned hidden interpolation (see rejection of claim 1 and Sections 1.1 through 3.3).
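The distinction drawn in Claims 5 and 7 between scalar weights and vector weights can be illustrated with the following sketch. It is an illustration under stated assumptions, not the applicant's disclosed method or Shankar et al.'s code; the function names and inputs are hypothetical. A scalar weight scales an expert's whole feature vector, whereas a vector weight scales each hidden dimension separately (here left unconstrained, with no softmax normalization). Scalar weighting is the special case in which every entry of an expert's weight vector is equal.

```python
from typing import List, Sequence


def scalar_interpolation(features: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """One scalar weight per expert scales that expert's entire feature vector."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def vector_interpolation(features: Sequence[Sequence[float]],
                         weight_vectors: Sequence[Sequence[float]]) -> List[float]:
    """One weight vector per expert, applied elementwise to each hidden dimension.

    The weights are unconstrained here: no softmax is applied, so they may be
    negative and need not sum to 1.
    """
    dim = len(features[0])
    return [sum(w[i] * f[i] for w, f in zip(weight_vectors, features))
            for i in range(dim)]


features = [[1.0, 2.0], [3.0, 4.0]]
# Equal entries in each weight vector reproduce scalar weighting.
print(scalar_interpolation(features, [0.5, 0.5]))                # [2.0, 3.0]
print(vector_interpolation(features, [[0.5, 0.5], [0.5, 0.5]]))  # [2.0, 3.0]
# Per-dimension weights can favor different experts in different dimensions.
print(vector_interpolation(features, [[1.0, 0.0], [0.0, 1.0]]))  # [1.0, 4.0]
```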
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Shankar et al.
Regarding Claims 8 and 16, Shankar et al. do not teach: wherein a single layer Deep Neural Network ("DNN") is applied to each of the plurality of domain expert models to provide a hybrid attention mixture model.
However, applying a single layer DNN to each domain expert model to provide a hybrid attention mixture model would have been a predictable variation and a matter of design choice, since hybrid neural networks are complementary in their modeling capabilities, as evidenced by Tara et al. (Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks, publication date 08/06/2015), which teaches (Abstract): "Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space. In this paper, we take advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture."
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Shankar et al. to include a single layer Deep Neural Network ("DNN") applied to each of the plurality of domain expert models to provide a hybrid attention mixture model, as the use of a known technique to improve similar devices (methods or products) in the same way.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Goel et al. (US 2018/0308487 A1) teaches: The alternate LSTM-RNN based age and emotion identification subsystem comprises a single end to end DNN classifier trained directly using the raw speech waveforms; said end to end classifier has two convolutional layer followed by two Network-in-Network (NIN) layers which performs the role of feature extraction from raw waveforms; the end to end DNN classifiers also has 2 LSTM layers after the feature extraction layers followed by a soft-max layer.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656