DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of prior-filed U.S. Provisional Application No. 62/836542, filed on 04/19/2019, is acknowledged.
Drawings
The drawings were received on 06/17/2019. These drawings are acceptable.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/08/2020 and 08/15/2019 have been considered by the examiner. 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.




Claims 1-3 and 8-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang et al. (US Pub. No. 2014/0257805, hereinafter ‘Ting’).

Regarding independent claim 1 limitations, Ting teaches: a method performed on a computing device, the method comprising: 
providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers; (the claimed model is provided for a recognition system, as depicted in Fig. 1 and in 0005-0008: Described herein are various technologies pertaining to automatic speech recognition (ASR) systems that are trained using multilingual training data. With more specificity, an ASR system can include a deep neural network (DNN) [claimed providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers], wherein the DNN includes an input layer that receives a feature vector extracted from a captured utterance in a first language. The DNN also includes a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes … The hidden layers [claimed providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers] have several parameters associated therewith, such as weights between nodes in separate layers, wherein the weights represent the synaptic strength, as well as weight biases… In an exemplary embodiment, the DNN can include non-hierarchical multiple softmax layers [claimed providing a multi-task machine learning model having … two or more task-specific layers], one softmax layer for each language that is desirably subject to recognition by the ASR system… By sharing the hidden layers [claimed providing a multi-task machine learning model having one or more shared layers …] in the DNN and using the joint training strategy described above, recognition accuracy across all languages decodable by the DNN can be improved over monolingual ASR systems trained using the acoustic (training) data from each of the individual languages alone.)
performing a pretraining stage on the one or more shared layers using one or more unsupervised prediction tasks; (in 0039: Further, the MDNN 300 can be pre-trained through utilization of either a supervised or unsupervised learning process. In an exemplary embodiment, an unsupervised pre-training procedure can be employed [claimed performing a pretraining stage on the one or more shared layers using one or more unsupervised prediction tasks], as such pre-training may not involve language-specific softmax layers,…)
and performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives. (in 0039-0040: … Fine-tuning of the MDNN 300 can be undertaken through employment of a back propagation (BP) algorithm. Since, in the multilingual DNN 300, however, a different softmax layer is used for each language, the BP algorithm can be slightly adjusted. For instance, when a training sample is presented for updating the MDNN 300, only the shared hidden layers 312-318 and the language-specific softmax layer (the softmax layer for a language of the training sample) are updated,… After the training phase has been completed, the MDNN 300 can be employed to recognize speech in any target language represented by one of the plurality of softmax layers 352-354 [claimed performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives]. It is also to be understood that the plurality of hidden layers 312-318 of the MDNN 300 can be considered as an intelligent feature extraction module, jointly trained with data from multiple source languages.; and the task-specific layers are trained jointly as claimed, in 0035-0036: The MDNN 300 also comprises a plurality of softmax layers 352-354, wherein each softmax layer in the plurality of softmax layers 352-354 corresponds to a different respective language… Similarly, the Nth softmax layer 354 includes a plurality of modeling units 364-370 that are representative of phonetic elements employed in an Nth language. In the architecture depicted in FIG. 3, the input layer 302 and the plurality of hidden layers 312-318 can be shared across all of the softmax layers 352-354 [claimed the two or more task-specific layers using respective task-specific objectives], and thus can be shared across all languages with respect to which spoken words can be recognized through utilization of the MDNN 300…; and in 0026: …As will be described in greater detail below, at least a portion of the MDNN 106 may be trained through utilization of multilingual training data [claimed performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives], wherein languages in the multilingual training data are referred to herein as "source languages." [claimed the two or more task-specific layers using respective task-specific objectives] … It can thus be ascertained that a language, in some embodiments, may be both a source language and a target language. The MDNN 106 includes an input layer 108 that receives the feature vector extracted from the at least one frame of the input signal by the extractor component 104. In an exemplary embodiment, the MDNN 106 may be a context-dependent MDNN, wherein the input layer 108 is configured to receive feature vectors for numerous frames, thus providing context for a particular frame of interest.)
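For illustration only, and under assumptions not found in Ting, the architecture quoted above (shared hidden layers feeding one softmax layer per language, with a back-propagation step that updates only the shared layers and the softmax layer for the training sample's language) can be sketched in Python as follows. The layer sizes, the ReLU activation, the random initialization, and the names `SharedTrunkModel`, `forward`, and `params_updated_for` are hypothetical. `params_updated_for` only records which parameter groups such a step would touch; it does not perform the update.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedTrunkModel:
    """Hypothetical multi-task model: shared hidden layers plus one softmax head per task."""

    def __init__(self, input_dim, hidden_dim, head_sizes, seed=0):
        rng = random.Random(seed)
        # Shared hidden layers (cf. hidden layers 312-318 in Ting).
        self.shared = [random_matrix(hidden_dim, input_dim, rng),
                       random_matrix(hidden_dim, hidden_dim, rng)]
        # One task-specific softmax layer per language (cf. softmax layers 352-354).
        self.heads = {task: random_matrix(n_out, hidden_dim, rng)
                      for task, n_out in head_sizes.items()}

    def features(self, x):
        """Run the shared layers; ReLU is an assumed activation."""
        h = x
        for layer in self.shared:
            h = [max(0.0, v) for v in matvec(layer, h)]
        return h

    def forward(self, x, task):
        """Posterior probabilities from the softmax head selected by `task`."""
        return softmax(matvec(self.heads[task], self.features(x)))

    def params_updated_for(self, task):
        """Parameter groups one back-propagation step updates for a sample of `task`."""
        if task not in self.heads:
            raise KeyError(task)
        return ["shared", "head:" + task]


model = SharedTrunkModel(input_dim=4, hidden_dim=8, head_sizes={"lang_a": 3, "lang_b": 5})
probs = model.forward([0.1, -0.2, 0.3, 0.4], "lang_b")
updated = model.params_updated_for("lang_a")
```

In this sketch a sample labeled for `lang_a` leaves the `lang_b` head untouched, which mirrors the adjusted back-propagation described in 0039-0040.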
	
Regarding claim 2, the rejection of claim 1 is incorporated and Ting further teaches the method of claim 1, further comprising: after the tuning stage, processing input data using the multi-task machine learning model to obtain a task-specific result from an individual task-specific layer. (in 0035-0036: The MDNN 300 also comprises a plurality of softmax layers 352-354, wherein each softmax layer in the plurality of softmax layers 352-354 corresponds to a different respective language… Similarly, the Nth softmax layer 354 includes a plurality of modeling units 364-370 that are representative of phonetic elements employed in an Nth language. In the architecture depicted in FIG. 3, the input layer 302 and the plurality of hidden layers 312-318 can be shared across all of the softmax layers 352-354, and thus can be shared across all languages with respect to which spoken words can be recognized [spoken words in any of the trained languages are input to and processed by the MDNN multi-task learning model, as in claimed after the tuning stage, processing input data using the multi-task machine learning model to obtain a task-specific result from an individual task-specific layer] through utilization of the MDNN 300… The plurality of softmax layers 352-354, however, are not shared, as each language has its own softmax layer that outputs respective posterior probabilities of the phonetic elements that are specific to a language [the posterior probabilities output by each softmax layer are task-specific results, as in claimed after the tuning stage, processing input data using the multi-task machine learning model to obtain a task-specific result from an individual task-specific layer].)

	
	Regarding claim 3, the rejection of claim 1 is incorporated and Ting further teaches the method of claim 1, wherein the two or more task-specific layers perform different natural language processing tasks. (in 0039: …The plurality of hidden layers 312-318 act as a structural regularization to the multilingual DNN 300, and the entire multilingual DNN 300 can be considered as an example of multitask learning. After the training phase has been completed, the MDNN 300 can be employed to recognize speech in any target language represented by one of the plurality of softmax layers 352-354 [claimed wherein the two or more task-specific layers perform different natural language processing tasks].)

Regarding claim 8, the rejection of claim 1 is incorporated and Ting further teaches the method of claim 1, further comprising: after the tuning stage, performing a domain adaptation process to adapt the multi-task machine learning model for an additional task, the domain adaptation process comprising: adding a new task-specific layer to the multi-task machine learning model; (the claimed tuning stage corresponds to the back-propagation fine-tuning of the shared hidden layers and the language-specific softmax layers, the softmax layers being the claimed task-specific layers, in 0039-0040: … Fine-tuning of the MDNN 300 can be undertaken through employment of a back propagation (BP) algorithm. Since, in the multilingual DNN 300, however, a different softmax layer is used for each language, the BP algorithm can be slightly adjusted. For instance, when a training sample is presented for updating the MDNN 300, only the shared hidden layers 312-318 and the language-specific softmax layer (the softmax layer for a language of the training sample) are updated,… After the training phase has been completed, the MDNN 300 can be employed to recognize speech in any target language represented by one of the plurality of softmax layers 352-354. It is also to be understood that the plurality of hidden layers 312-318 of the MDNN 300 can be considered as an intelligent feature extraction module, jointly trained with data from multiple source languages.; and the claimed adding of a new task-specific layer after training corresponds to adding a new softmax layer that is then trained, in 0040-0041: … It can, therefore, be ascertained that knowledge learned in the multiple hidden layers 312-318 based upon training data in multiple source languages can be employed to distinguish phones in the new target language (e.g., cross-lingual model transfer can be employed).
Cross-lingual model transfer can be undertaken as follows: the shared hidden layers 312-318 can be extracted from the MDNN 300, and a new softmax layer [claimed after the tuning stage, … adding a new task-specific layer to the multi-task machine learning model] for the new target language can be added on top of the plurality of hidden layers 312-318. The output nodes of the softmax layer for the new target language correspond to senones utilized in the new target language… If a relatively large amount of training data for the new target language is available, parameter values in the plurality of hidden layers 312-318 can be further tuned [claimed after the tuning stage, performing a domain adaptation process to adapt the multi-task machine learning model for an additional task, the domain adaptation process comprising: adding a new task-specific layer to the multi-task machine learning model] based upon such training data…)
and training the new task-specific layer and the one or more shared layers using training data for the additional task. (in 0041: Cross-lingual model transfer can be undertaken as follows: the shared hidden layers 312-318 can be extracted from the MDNN 300, and a new softmax layer for the new target language can be added on top of the plurality of hidden layers 312-318. The output nodes of the softmax layer for the new target language correspond to senones utilized in the new target language. Parameter values for the hidden layers 312-318 may be fixed, and the softmax layer can be trained using training data for the new target language. If a relatively large amount of training data for the new target language is available, parameter values in the plurality of hidden layers 312-318 can be further tuned based upon such training data [claimed training the new task-specific layer and the one or more shared layers using training data for the additional task]…)
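A minimal Python sketch of the cross-lingual model transfer described in 0041 (relevant to claims 8 and 9) follows, under assumptions not taken from Ting: the frozen shared hidden layers are stood in for by a hypothetical fixed function `shared_features`, the new softmax layer is trained by plain gradient descent on cross-entropy, and the toy data, learning rate, and head names are invented for illustration. Ting also permits further tuning of the hidden layers when ample target-language data exist; the sketch keeps them fixed for brevity.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_features(x):
    """Stand-in for the frozen shared hidden layers (hypothetical fixed transform)."""
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias feature


def head_forward(head, x):
    """Posteriors from one softmax head placed on top of the shared features."""
    h = shared_features(x)
    return softmax([sum(w * hi for w, hi in zip(row, h)) for row in head])


def train_new_head(data, n_classes, lr=0.5, epochs=200):
    """Train only a new softmax head; the shared features are never changed.

    Uses the cross-entropy gradient (p_k - onehot_k) * h for each output row k.
    """
    dim = len(shared_features(data[0][0]))
    head = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in data:
            h = shared_features(x)
            p = head_forward(head, x)
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    head[k][j] -= lr * grad * h[j]
    return head


# Toy data for the "new target language": class 1 when x[0] > x[1].
new_task_data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.8, 0.1], 1), ([0.2, 0.9], 0)]
heads = {
    "lang_a": [[0.5, -0.5, 0.0, 0.0], [-0.5, 0.5, 0.0, 0.0]],  # pre-existing heads
    "lang_b": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
}
heads["lang_new"] = train_new_head(new_task_data, n_classes=2)

# After adaptation, one input yields a result from each of the three heads.
x = [0.9, 0.2]
results = {task: head_forward(head, x) for task, head in heads.items()}
```

The final dictionary comprehension produces one result per softmax layer, which corresponds to the first, second, and third task-specific results recited in claim 9.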

Regarding claim 9, the rejection of claim 8 is incorporated and Ting further teaches the method of claim 8, further comprising: after the domain adaptation process, processing input data using the multi-task machine learning model to produce a first task-specific result using a first task-specific layer, a second task-specific result using a second task-specific layer, and a third task-specific result using the new task-specific layer. (in 0041-0042: Cross-lingual model transfer can be undertaken as follows: the shared hidden layers 312-318 can be extracted from the MDNN 300, and a new softmax layer [claimed new task-specific layer] for the new target language can be added on top of the plurality of hidden layers 312-318 [claimed after the domain adaptation process, processing input data using the multi-task machine learning model to produce a first task-specific result using a first task-specific layer, a second task-specific result using a second task-specific layer, and a third task-specific result using the new task-specific layer]. The output nodes of the softmax layer for the new target language correspond to senones utilized in the new target language. Parameter values for the hidden layers 312-318 may be fixed, and the softmax layer can be trained using training data for the new target language. If a relatively large amount of training data for the new target language is available, parameter values in the plurality of hidden layers 312-318 can be further tuned based upon such training data.
Experimental results have indicated that, with respect to a target language, an ASR system that includes the MDNN 300 exhibits improved recognition accuracy for the target language relative to a recognition system that includes a DNN trained solely based upon the target language [the new target language produces results corresponding to the claimed third task-specific result using the new task-specific layer]… Pursuant to an example, an ASR system that includes the MDNN 400 can be configured to recognize words spoken in a first target language [claimed after the domain adaptation process, processing input data using the multi-task machine learning model to produce a first task-specific result using a first task-specific layer,…] and words spoken in a second target language [claimed after the domain adaptation process, processing input data using the multi-task machine learning model to produce … , a second task-specific result using a second task-specific layer]… The MDNN 400 includes the input layer 302 and the plurality of hidden layers 314-318. Rather than including a plurality of softmax layers, the MDNN 400 includes a single softmax layer 402, which comprises a plurality of modeling units 404-410 that represent phonetic elements utilized across multiple target languages [including claimed after the domain adaptation process, processing input data using the multi-task machine learning model to produce a first task-specific result using a first task-specific layer, a second task-specific result using a second task-specific layer, and a third task-specific result using the new task-specific layer; the new task-specific layer corresponds to the softmax layer added and trained for the new target language, as noted above].)

Claims 15-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Elfeky et al. (NPL: “Towards acoustic model unification across dialects”, hereinafter ‘Elf’).

Regarding independent claim 15 limitations, Elf teaches: a method performed on a computing device, the method comprising: 
evaluating candidate teacher instances of a multi-task machine learning model having one or more shared layers, a first task-specific layer that performs a first task, and a second task-specific layer that performs a second task; based at least on the evaluating, selecting one or more first teacher instances for the first task and one or more second teacher instances for the second task; (the dialect-specific acoustic models correspond to the claimed candidate teacher instances, in Sec. 3.1: …We propose to use MTL for building a unified acoustic model for a multi-dialectal language in a similar way to the one described above. In DMTL, we train a single DNN that predicts the CD states as the primary task and the dialect as the secondary task [claimed evaluating candidate teacher instances of a multi-task machine learning model having one or more shared layers, a first task-specific layer that performs a first task, and a second task-specific layer that performs a second task]. This is achieved by adding a secondary output layer to the DNN while sharing the input and all the hidden layers [claimed evaluating candidate teacher instances of a multi-task machine learning model having one or more shared layers, …]. This secondary output layer is a softmax layer whose target output is a binary vector that represents the dialect of the input utterance. For a language with n dialects, the secondary target output is a binary vector of size n with zeros in all but the i-th position to indicate the i-th dialect. Fig. 2 depicts how a DMTL network looks like. The training process of DMTL-DNN [the training process corresponds to the claimed evaluating and selecting of instances of the DMTL-DNN ensemble] is identical to that of a normal DNN [13] using cross-entropy, while also taking into account the gradients of the secondary targets…; and the selecting of instances of the DMTL-DNN ensemble is depicted in Fig. 1: 

[Image: Elf, Fig. 1]

and Algorithm 1:

[Image: Elf, Algorithm 1]

)
and training a student instance of the multi-task machine learning model using first outputs of the one or more first teacher instances to train the first task-specific layer of the student instance and using second outputs of the one or more second teacher instances to train the second task-specific layer of the student instance. (the outputs of the ensemble of dialect-specific models, as depicted in Fig. 1, correspond to the claimed first and second outputs, and Algorithm 1 describes training a student model, which corresponds to the claimed student instance of the multi-task machine learning model:

[Image: excerpt reproduced from Elf]

In Sec. 2.1: … Each dialect-specific acoustic model is trained solely with the corresponding dialectal data. However, to create an ensemble of these dialect-specific models, ... Consequently, the ensemble dialect-specific models are trained to produce that unified CD states. Then, a linear combination function (e.g., weighted average) is used to combine the ensemble models output, which provides the training data for the student model [claimed training a student instance of the multi-task machine learning model using first outputs of the one or more first teacher instances to train the first task-specific layer of the student instance and using second outputs of the one or more second teacher instances to train the second task-specific layer of the student instance]. Our proposed technique is outlined in Algorithm 1, and illustrated in Fig. 1.)
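For illustration, the linear combination step quoted from Sec. 2.1 can be sketched as a per-frame weighted average of teacher posteriors over a shared CD-state inventory. This Python sketch is not taken from Elf: the function name `combine_teachers`, the three-state inventory, the weights, and the posterior values are hypothetical, and the dialect labels merely echo the Egyptian, Levantine, and Gulf example given in Elf.

```python
def combine_teachers(teacher_posteriors, weights):
    """Weighted average of one frame's posteriors from several teacher models.

    teacher_posteriors: one probability vector per teacher, all over the same
    unified CD-state inventory. weights: one non-negative weight per teacher.
    """
    total = sum(weights)
    n_states = len(teacher_posteriors[0])
    return [sum(w * p[k] for w, p in zip(weights, teacher_posteriors)) / total
            for k in range(n_states)]


# Hypothetical posteriors for one frame over a three-state inventory.
egyptian = [0.7, 0.2, 0.1]
levantine = [0.5, 0.4, 0.1]
gulf = [0.6, 0.1, 0.3]
soft_target = combine_teachers([egyptian, levantine, gulf], weights=[0.5, 0.3, 0.2])
```

With these numbers the soft target is approximately [0.62, 0.24, 0.14], which still sums to 1 and serves as the student's training target for that frame.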

Regarding claim 16, the rejection of claim 15 is incorporated and Elf further teaches the method of claim 15, further comprising: training the student instance using an objective function defined based at least on the first outputs and the second outputs. (the ensemble produces the claimed first and second outputs, and the claimed training is depicted in Fig. 1 and Algorithm 1 and described in Sec. 5: … In DKD, we used the idea of knowledge distillation to train a single deployable acoustic model on the outputs of an ensemble of dialect-specific acoustic models [claimed training the student instance using an objective function defined based at least on the first outputs and the second outputs]. In our second technique, DMTL, we utilized the idea of multitask learning to create a dialect aware acoustic model….; and the combination function corresponds to the claimed objective function, in Sec. 2 and Sec. 2.1: … knowledge distillation (KD) refers to training a student model on the output class probabilities of a teacher model [6]. KD’s goal is to transfer the knowledge of a potentially very large teacher model to a smaller student model, which is more suitable for deployment… In the next section, we propose how to use the same idea to create a student acoustic model that unifies dialectal acoustic models… Then, a linear combination function (e.g., weighted average) is used to combine the ensemble models output, which provides the training data for the student model [claimed training the student instance using an objective function defined based at least on the first outputs and the second outputs]. Our proposed technique is outlined in Algorithm 1, and illustrated in Fig. 1.)
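An objective function defined on the combined teacher outputs can be sketched as the mean per-frame cross-entropy between those soft targets and the student's posteriors. Elf states only that the student is trained on the teacher's output class probabilities; the cross-entropy form, the `eps` floor, the function name `distillation_loss`, and the toy values are assumptions of this sketch.

```python
import math


def distillation_loss(soft_targets, student_probs, eps=1e-12):
    """Mean per-frame cross-entropy H(t, s) = -sum_k t_k * log(s_k).

    soft_targets: per-frame teacher distributions (e.g., weighted averages).
    student_probs: per-frame student posteriors over the same classes.
    eps guards against log(0) when the student assigns zero probability.
    """
    total = 0.0
    for t, s in zip(soft_targets, student_probs):
        total -= sum(tk * math.log(max(sk, eps)) for tk, sk in zip(t, s))
    return total / len(soft_targets)


targets = [[0.62, 0.24, 0.14], [0.10, 0.80, 0.10]]
uniform = [[1 / 3] * 3, [1 / 3] * 3]
loss_if_matched = distillation_loss(targets, targets)
loss_if_uniform = distillation_loss(targets, uniform)
```

For fixed targets this loss is smallest when the student's posteriors equal the targets (Gibbs' inequality), so minimizing it trains the student to reproduce the combined teacher outputs.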

Regarding claim 17, the rejection of claim 16 is incorporated and Elf further teaches the method of claim 16, the objective function being further defined based at least on labels associated with labeled data processed to obtain the first outputs and the second outputs. (the dialectal training data corresponds to the claimed labeled data, and the combination function corresponds to the claimed objective function, as depicted in Fig. 1 and in Sec. 2.1: The intuition behind using knowledge distillation to build a unified acoustic model for a multi-dialectal language stems from the fact that dialects that belong to the same language share a significant number of acoustic features... Each dialect-specific acoustic model is trained solely with the corresponding dialectal data. However, to create an ensemble of these dialect-specific models [claimed the objective function being further defined based at least on labels associated with labeled data processed to obtain the first outputs and the second outputs], it is required that either all of them have the same context-dependent (CD) state inventory output, or there exists a mapping from each model’s output to a unified one… Then, a linear combination function (e.g., weighted average) is used to combine the ensemble models output, which provides the training data for the student model. Our proposed technique is outlined in Algorithm 1, and illustrated in Fig. 1.)

Regarding claim 18, the rejection of claim 15 is incorporated and Elf further teaches the method of claim 15, wherein the selecting comprises: training the multiple teacher instances to perform the first task and the second task; selecting the one or more first teacher instances based at least on accuracy of the one or more first teacher instances at performing the first task; and selecting the one or more second teacher instances based at least on accuracy of the one or more second teacher instances at performing the second task. (the ensemble instances of the models depicted in Fig. 1 are selected based on the accuracy captured for each respective model, in Algorithm 1, items 2 and 3: 2. For each dialect, train a dialect-specific acoustic model [claimed wherein the selecting comprises: training the multiple teacher instances to perform the first task and the second task] using the dialect-specific training data, which outputs the posterior probabilities on the unified CD states. 3. Determine the optimal weights to combine the frame-level predictions of all dialect-specific models (created in the previous step). The optimal weights are found by performing a grid search over all possible weight combinations and choosing the one that leads to the best performance (word error rate) of the ensemble on a test set comprising the union of all dialectal test sets [claimed selecting the one or more first teacher instances based at least on accuracy of the one or more first teacher instances at performing the first task; and selecting the one or more second teacher instances based at least on accuracy of the one or more second teacher instances at performing the second task]; and the dialect-specific acoustic models correspond to the claimed multiple teacher instances trained to perform the first task and the second task, in Sec. 3.1: …We propose to use MTL for building a unified acoustic model for a multi-dialectal language in a similar way to the one described above. 
In DMTL, we train a single DNN that predicts the CD states as the primary task and the dialect as the secondary task [claimed training the multiple teacher instances to perform the first task and the second task]. This is achieved by adding a secondary output layer to the DNN while sharing the input and all the hidden layers. This secondary output layer is a softmax layer whose target output is a binary vector that represents the dialect of the input utterance…)
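Item 3 of Algorithm 1 (a grid search over weight combinations that keeps the combination giving the best ensemble performance) can be sketched as below. Elf scores combinations by word error rate; this sketch substitutes frame-level classification error as a simpler stand-in, and the step size, the toy development set, and all function names are hypothetical. In the sketch, a weight of 0 removes a teacher from the ensemble.

```python
import itertools


def combine(posteriors, weights):
    """Weighted sum of per-teacher posteriors for one frame."""
    return [sum(w * p[k] for w, p in zip(weights, posteriors))
            for k in range(len(posteriors[0]))]


def frame_error_rate(dev_set, weights):
    """Fraction of frames whose combined arg-max differs from the reference label."""
    errors = 0
    for posteriors, label in dev_set:
        combined = combine(posteriors, weights)
        predicted = max(range(len(combined)), key=combined.__getitem__)
        errors += predicted != label
    return errors / len(dev_set)


def grid_search_weights(dev_set, n_teachers, step=0.1):
    """Try every weight vector on a `step` grid that sums to 1; keep the lowest error."""
    steps = int(round(1 / step))
    best = None
    for combo in itertools.product(range(steps + 1), repeat=n_teachers):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        err = frame_error_rate(dev_set, weights)
        if best is None or err < best[0]:
            best = (err, weights)
    return best


# Hypothetical development frames: (per-teacher posteriors, reference label).
dev_set = [
    ([[0.9, 0.1], [0.0, 1.0]], 0),  # teacher 0 is right, teacher 1 is wrong
    ([[0.2, 0.8], [1.0, 0.0]], 1),
]
best_error, best_weights = grid_search_weights(dev_set, n_teachers=2)
```

On this toy set the search returns weights [0.7, 0.3] with zero error, because 0.7 is the smallest grid weight on teacher 0 at which both frames are classified correctly.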

Regarding claim 19, the rejection of claim 18 is incorporated and Elf further teaches the method of claim 18, wherein the training comprises: averaging first outputs of multiple first teacher instances to obtain first averages; and training the student instance to reproduce the first averages. (the combination function computes the claimed averages of the first outputs, in Sec. 2.1: The intuition behind using knowledge distillation to build a unified acoustic model for a multi-dialectal language stems from the fact that dialects that belong to the same language share a significant number of acoustic features. One can think of the ensemble of dialect-specific acoustic models as the teacher model that models the entire language. Using KD, the knowledge of this teacher model can be distilled [claimed training the student instance to reproduce the first averages] into a student model that ideally would benefit from the similarities in the acoustic features of the language’s dialects. For example, starting from an ensemble of Egyptian, Levantine, and Gulf Arabic acoustic models that were not necessarily built using the same architecture, we can use the same knowledge distillation technique used in [7] to build a unified Arabic acoustic model [claimed training the student instance to reproduce the first averages]... Then, a linear combination function (e.g., weighted average [claimed wherein the training comprises: averaging first outputs of multiple first teacher instances to obtain first averages]) is used to combine the ensemble models output, which provides the training data for the student model [claimed training the student instance to reproduce the first averages]; and, as depicted in Fig. 1, the averaged output determines the soft labels used to train the student model, corresponding to the claimed training of the student instance to reproduce the first averages.)

Regarding claim 20, the rejection of claim 19 is incorporated and Elf further teaches the method of claim 19, wherein the student instance is trained to reproduce at least two different probabilities for a given category on a given instance of training data. (the student instance is trained using the knowledge distillation process depicted in Fig. 1 and described in Sec. 2 and Sec. 2.1: In a classification problem, knowledge distillation (KD) refers to training a student model on the output class probabilities [claimed wherein the student instance is trained to reproduce at least two different probabilities for a given category on a given instance of training data] of a teacher model [6]. KD’s goal is to transfer the knowledge of a potentially very large teacher model to a smaller student model [claimed wherein the student instance is trained to reproduce at least two different probabilities for a given category on a given instance of training data], which is more suitable for deployment… The intuition behind using knowledge distillation to build a unified acoustic model for a multi-dialectal language stems from the fact that dialects that belong to the same language share a significant number of acoustic features… Using KD, the knowledge of this teacher model can be distilled into a student model that ideally would benefit from the similarities in the acoustic features of the language’s dialects…. Our proposed technique is outlined in Algorithm 1, and illustrated in Fig. 1.; and the claimed probabilities for a given instance of training data correspond to the posterior probabilities computed for a given training utterance in the Arabic dialects category, as depicted in Fig. 1 and in Algorithm 1: …2. For each dialect, train a dialect-specific acoustic model using the dialect-specific training data, which outputs the posterior probabilities [claimed at least two different probabilities for a given category on a given instance of training data] on the unified CD states. 3. Determine the optimal weights to combine the frame-level predictions of all dialect-specific models (created in the previous step). The optimal weights are found by performing a grid search over all possible weight combinations and choosing the one that leads to the best performance (word error rate) of the ensemble on a test set comprising the union of all dialectal test sets. 4. Train a student model using the ensemble of dialect-specific models as the teacher model. Output: The student model, now representing [claimed wherein the student instance is trained to reproduce at least two different probabilities for a given category on a given instance of training data] the unified dialect-independent model for the language in hand.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 4-5, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (US Pub. No. 2014/0257805, hereinafter ‘Ting’) in view of Chaturvedi et al. (US Pub. No. 2020/0311341, hereinafter ‘Chat’).

	Regarding claim 4, the rejection of claim 3 is incorporated and Ting further teaches the method of claim 3, wherein the one or more shared layers comprise a … encoder. (the shared hidden layers of the DNN transform the feature vectors extracted from the voice signal and thereby act as an encoder, in 0005: … With more specificity, an ASR system can include a deep neural network (DNN), wherein the DNN includes an input layer that receives a feature vector extracted from a captured utterance in a first language. The DNN also includes a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes. Each node in a hidden layer is configured to perform a linear or nonlinear transformation [claimed wherein the one or more shared layers comprise a … encoder] on its respective input, wherein the input is based upon output of nodes in a layer immediately beneath the hidden layer.)
 	The Ting reference does not expressly disclose the limitation: the one or more shared layers comprise a lexicon encoder.
	Chat expressly teaches the claim limitation: the one or more shared layers comprise a lexicon encoder. (the Context Encoder as the claimed lexicon encoder, as depicted in Fig. 3B and in 0067: The functions depicted in FIG. 3A are implemented as the neural network system depicted in FIG. 3B. FIG. 3B is a block diagram that illustrates an example overall neural network architecture 302 … These models share an identical overall encoder-decoder based architecture shown in FIG. 3B. They adopt a dual encoding approach where two separate but architecturally similar encoders 312 and 314 are used for the context (Context Encoder) [claimed the one or more shared layers comprise a lexicon encoder] and the cue phrase (Cue Encoder), respectively.
[Image: Chat, FIG. 3B]
)
The Ting and Chat references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing techniques using neural network structures for multi-task machine learning tasks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a deep neural network including encoder and decoder components for natural language processing as disclosed by Chat with the method for natural language processing using deep neural networks for multi-task learning as disclosed by Ting.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to process natural language text from word embeddings as token representations to be processed as vectors with neural networks (Chat, 0003-0004). Doing so allows natural language context to be processed using neural networks that encode text features as tokens and assign weights to specific tokens in an input sequence based on how semantically related the tokens are to the word being encoded as part of the natural language processing tasks (Chat, 0048-0049).

Regarding claim 5, the rejection of claim 4 is incorporated. Ting does not expressly teach the claim 5 limitation. Chat expressly teaches the claim 5 limitation, wherein the one or more shared layers comprise a transformer encoder. (the encoder of the encoder-decoder transformer as part of the DNN shared layers, as depicted in Fig. 3B and in 0083: … Following previous work (Vaswani et al., 2017), a 6-layer encoder-decoder transformer [claimed wherein the one or more shared layers comprise a transformer encoder] was trained with self-attention heads (512 dimensional states and 8 attention heads). The example networks contained a 2-layer encoder for encoding cue phrase (all other specifications are the same)…)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ting and Chat for the same reasons disclosed above.
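For illustration only, the self-attention on which Chat's transformer encoder relies (Vaswani et al., 2017, as cited in Chat, 0083) can be sketched as single-head scaled dot-product attention in NumPy. This is a simplified sketch and does not reproduce Chat's configuration. It omits the multiple heads, residual connections, layer normalization, and feed-forward sublayers of a 6-layer, 512-dimensional, 8-head encoder. The random projection matrices are placeholders for learned weights, and the function name is hypothetical.

```python
import numpy as np


def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # token-to-token similarity, scaled by sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row is a distribution over tokens
    return weights @ v, weights


# Placeholder inputs: 4 tokens with 8-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` gives the weights that one token assigns to every token in the sequence. This corresponds to the token weighting that Chat, 0048-0049, describes in terms of semantic relatedness.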

Regarding independent claim 10 limitations, Ting teaches: a system comprising: a hardware processing unit; and a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to: (in 0054-0058: … the computing device 1000 can be used in a system that comprises an ASR system that comprises an MDNN. The computing device 1000 includes at least one processor 1002 that executes instructions that are stored in a memory 1004… Various functions described herein can be implemented in hardware, software, or any combination thereof. If implemented in software, the functions can be stored on or transmitted over as one or more instructions or code on a computer-readable medium… By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer…)
provide a multi-task natural language processing model having shared layers and task-specific layers, (claimed provided model for a recognition system, as depicted in Fig. 1, in 0005-0008: Described herein are various technologies pertaining to automatic speech recognition (ASR) systems that are trained using multilingual training data. With more specificity, an ASR system can include a deep neural network (DNN) [claimed provide a multi-task natural language processing model having shared layers and task-specific layers], wherein the DNN includes an input layer that receives a feature vector extracted from a captured utterance in a first language. The DNN also includes a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes … The hidden layers [claimed provide a multi-task natural language processing model having shared layers and task-specific layers] have several parameters associated therewith, such as weights between nodes in separate layers, wherein the weights represent the synaptic strength, as well as weight biases… In an exemplary embodiment, the DNN can include non-hierarchical multiple softmax layers [claimed provide a multi-task natural language processing model … and task-specific layers], one softmax layer for each language that is desirably subject to recognition by the ASR system… By sharing the hidden layers [claimed provide a multi-task natural language processing model having shared layers and task-specific layers] in the DNN and using the joint training strategy described above, recognition accuracy across all languages decodable by the DNN can be improved over monolingual ASR systems trained using the acoustic (training) data from each of the individual languages alone.)
the shared layers comprising a … encoder…; (an encoder, as part of the shared-layer DNN, that transforms the feature vectors extracted from the voice signal, in 0005: … With more specificity, an ASR system can include a deep neural network (DNN), wherein the DNN includes an input layer that receives a feature vector extracted from a captured utterance in a first language. The DNN also includes a plurality of hidden layers, wherein each hidden layer in the plurality of hidden layers comprises a respective plurality of nodes. Each node in a hidden layer is configured to perform a linear or nonlinear transformation [the shared layers comprising a … encoder…] on its respective input, wherein the input is based upon output of nodes in a layer immediately beneath the hidden layer.)
receive one or more input words; (speech as input words, in 0024: … the recognition system 100 can be configured to recognize words in multiple languages, wherein the multiple languages include a target language. The recognition system 100 comprises a receiver component 102 that receives an input signal (an acoustic signal), wherein the input signal comprises a spoken utterance, the spoken utterance including a word set forth in the target language.)
provide the one or more input words to the multi-task natural language processing model; (input words provided as source language, in 0026: … The recognition system 100 additionally comprises a multilingual deep neural network (MDNN) 106. As will be described in greater detail below, at least a portion of the MDNN 106 may be trained through utilization of multilingual training data, wherein languages in the multilingual training data are referred to herein as "source languages." [claimed provide the one or more input words to the multi-task natural language processing model] Thus, a "target language" is a language where words spoken therein are desirably recognized by the recognition system 100, and a "source" language is a language included in training data that is used to train the MDNN 106. It can thus be ascertained that a language, in some embodiments, may be both a source language and a target language. The MDNN 106 includes an input layer 108 that receives the feature vector extracted from the at least one frame of the input signal [claimed provide the one or more input words to the multi-task natural language processing model] by the extractor component 104…)
obtain a task-specific result produced by an individual task-specific layer of the multi-task natural language processing model; (in 0035-0036: The MDNN 300 also comprises a plurality of softmax layers 352-354, wherein each softmax layer in the plurality of softmax layers 352-354 corresponds to a different respective language… Similarly, the Nth softmax layer 354 includes a plurality of modeling units 364-370 that are representative of phonetic elements employed in an Nth language. In the architecture depicted in FIG. 3, the input layer 302 and the plurality of hidden layers 312-318 can be shared across all of the softmax layers 352-354, and thus can be shared across all languages with respect to which spoken words can be recognized [spoken words across the training languages are input to the MDNN multi-task learning model for processing, as claimed obtain a task-specific result produced by an individual task-specific layer of the multi-task natural language processing model] through utilization of the MDNN 300… The plurality of softmax layers 352-354, however, are not shared, as each language has its own softmax layer that outputs respective posterior probabilities of the phonetic elements that are specific to a language [softmax layer probabilities as the task-specific results in claimed obtain a task-specific result produced by an individual task-specific layer of the multi-task natural language processing model].)
and use the task-specific result to perform a natural language processing operation. (using the softmax layer output as the claimed task-specific result and the classification of words in the target language as the claimed natural language processing operation, in 0029-0030: The MDNN 106 additionally includes a softmax layer 112 that comprises a plurality of output units. Output units in the softmax layer 112 are modeling units that are representative of phonetic elements used in the target language. For example, the modeling units in the softmax layer 112 can be representative of senones (tied triphone or quinphone states) used in speech of the target language [and use the task-specific result to perform a natural language processing operation]. For example, the modeling units can be Hidden Markov Models (HMMs) or other suitable modeling units. The softmax layer 112 includes parameters with values associated therewith, wherein the values can be learned during a training phase based upon training data in the target language. With respect to the input signal, the output of the softmax layer 112 is a probability distribution [and use the task-specific result to perform a natural language processing operation] over the phonetic elements (senones) used in the target language that are modeled in the softmax layer 112 [and use the task-specific result to perform a natural language processing operation]… When the recognition system 100 is an ASR system, the classification can be the identification [and use the task-specific result to perform a natural language processing operation], in the target language, of a word or words in the input signal.)
Ting does not expressly teach the claim 10 limitation: the shared layers comprising a lexicon encoder and a transformer encoder.
	Chat expressly teaches the claim limitation: the shared layers comprising a lexicon encoder and a transformer encoder (the Context Encoder as the claimed lexicon encoder, and the transformer encoder, as depicted in Fig. 3B and in 0067: The functions depicted in FIG. 3A are implemented as the neural network system depicted in FIG. 3B. FIG. 3B is a block diagram that illustrates an example overall neural network architecture 302 … These models share an identical overall encoder-decoder based architecture shown in FIG. 3B. They adopt a dual encoding approach where two separate but architecturally similar encoders 312 and 314 are used for the context (Context Encoder) [claimed the shared layers comprising a lexicon encoder] and the cue phrase (Cue Encoder), respectively. And the encoder of the encoder-decoder transformer as part of the DNN shared layers, as depicted in Fig. 3B and in 0083: … Following previous work (Vaswani et al., 2017), a 6-layer encoder-decoder transformer [claimed the shared layers comprising a … transformer encoder] was trained with self-attention heads (512 dimensional states and 8 attention heads). The example networks contained a 2-layer encoder for encoding cue phrase (all other specifications are the same)…
[Image: Chat, FIG. 3B]
)
The Ting and Chat references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing systems and methods using neural network structures for multi-task machine learning tasks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a deep neural network including encoder and decoder components for natural language processing of a document corpus of semantic text and for speech recognition tasks as disclosed by Chat with the method for natural language processing using deep neural networks for multi-task learning as disclosed by Ting.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to process natural language text from word embeddings as token representations to be processed as vectors with neural networks (Chat, 0003-0004). Doing so allows natural language context to be processed using neural networks that encode text features as tokens and assign weights to specific tokens in an input sequence based on how semantically related the tokens are to the word being encoded as part of the natural language processing tasks (Chat, 0048-0049).
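For illustration only, the MDNN structure quoted from Ting for claim 10 can be sketched in NumPy. In that structure, hidden layers shared across all languages feed one non-shared softmax layer per language. The class name, the language keys, the layer sizes, the sigmoid hidden units, and the random initialization are assumptions made for the sketch. Training is not shown.

```python
from __future__ import annotations

import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SharedHiddenMultiSoftmaxNet:
    """Hidden layers shared by all languages, plus one softmax output layer per language."""

    def __init__(self, input_dim: int, hidden_dims: list[int], head_sizes: dict[str, int], seed: int = 0):
        rng = np.random.default_rng(seed)
        dims = [input_dim, *hidden_dims]
        # Shared hidden layers: every language's head reads their output.
        self.shared = [(rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out))
                       for d_in, d_out in zip(dims[:-1], dims[1:])]
        # Task-specific softmax layers: one per language, not shared.
        self.heads = {lang: (rng.normal(0.0, 0.1, (dims[-1], n)), np.zeros(n))
                      for lang, n in head_sizes.items()}

    def forward(self, x: np.ndarray, language: str) -> np.ndarray:
        h = x
        for w, b in self.shared:
            h = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid hidden units, shared across languages
        w, b = self.heads[language]  # select the language-specific softmax layer
        return softmax(h @ w + b)  # posteriors over that language's output units
```

Calling `forward` with different language keys reuses the same shared-layer computation and changes only the output layer. This mirrors the quoted statement that the hidden layers are shared while each language has its own softmax layer.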

Regarding claim 12, the rejection of claim 10 is incorporated and Ting further teaches the system of claim 10, wherein the one or more input words comprise a pair of sentences and the task-specific result characterizes a semantic similarity of the pair of sentences. (speech as input words comprising a sentence, in 0024: … the recognition system 100 can be configured to recognize words in multiple languages [semantic similarity in the recognized language], wherein the multiple languages include a target language. The recognition system 100 comprises a receiver component 102 that receives an input signal (an acoustic signal), wherein the input signal comprises a spoken utterance [claimed wherein the one or more input words comprise a pair of sentences], the spoken utterance including a word set forth in the target language. And in 0038-0042: During a training phase for the MDNN 300, values for parameters of the MDNN 300 (e.g., weights of synapses and weight biases) can be learned using multilingual (multiple source language) training data simultaneously [claimed wherein the one or more input words comprise a pair of sentences]; that is, the MDNN 300 is not trained first using training data in a first source language, and then updated using training data in a second source language, and so forth. Rather, to avoid tuning the MDNN 300 to a particular source language, training data for multiple source languages can be utilized simultaneously to learn parameter values of the MDNN 300. For example, when batch training algorithms, such as L-BFGS or the Hessian-free algorithm, are used to learn parameter values for the MDNN 300, simultaneous use of training data for multiple source languages [claimed wherein the one or more input words comprise a pair of sentences] is relatively straightforward, since all of the training data can be used in each update of the MDNN 300.
If, however, mini-batch training algorithms, such as the mini-batch stochastic gradient ascent (SGA) algorithm are employed, each mini-batch should be drawn from all available training data (across multiple languages)… Rather than including a plurality of softmax layers, the MDNN 400 includes a single softmax layer 402, which comprises a plurality of modeling units 404-410 that represent phonetic elements utilized across multiple target languages. Pursuant to an example, an ASR system that includes the MDNN 400 can be configured to recognize words spoken [including claimed the pair of sentences] in a first target language and words spoken in a second target language [claimed the task-specific result characterizes a semantic similarity of the pair of sentences as sentences in a target language class similar to a first, second, etc. class]...)
Additionally, Chat further teaches the claim limitation: wherein the one or more input words comprise a pair of sentences … (in 0084: For training the systems, not only are sentence pairs [claimed wherein the one or more input words comprise a pair of sentences] used, but also cue phrases are used which are expected to be entered by a human user. However, to scale the training process for the thousands of pairs useful for good results, an automated way of generating cue phrases relevant to the training sentences is beneficial…)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ting and Chat for the same reasons disclosed above.
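For illustration only, Ting states that each mini-batch should be drawn from all available training data across multiple languages. That statement can be sketched as pooling and shuffling the per-language data before batching. The function name, the data layout (a dict mapping a language tag to its examples), and the fixed seed are assumptions. Shuffling the pooled data makes batches mix languages roughly in proportion to the data, but it does not guarantee that every language appears in every batch. The language tag kept with each example identifies which language-specific softmax layer that example would train.

```python
from __future__ import annotations

import random
from typing import Iterator, Sequence


def multilingual_minibatches(
    data: dict[str, Sequence[object]], batch_size: int, seed: int = 0
) -> Iterator[list[tuple[str, object]]]:
    """Yield mini-batches of (language, example) pairs drawn from the pooled data of all languages."""
    # Pool every language's examples, tagging each with its language.
    pooled = [(lang, ex) for lang, examples in data.items() for ex in examples]
    # Shuffle across languages so batches are not grouped by language.
    random.Random(seed).shuffle(pooled)
    for start in range(0, len(pooled), batch_size):
        yield pooled[start:start + batch_size]
```

This contrasts with training on one source language at a time and then updating on the next, which the quoted passage says the MDNN 300 avoids.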
	
Claims 6-7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (US Pub. No. 2014/0257805, hereinafter ‘Ting’) in view of Chaturvedi et al. (Pub. No. US 2020/0311341, hereinafter ‘Chat’), in further view of Gao et al. (US Pub. No. 2017/0032035, hereinafter ‘Gao’).

Regarding claim 6, the rejection of claim 5 is incorporated and Ting in combination with Chat further teaches the method of claim 5, wherein the two or more task-specific layers comprise a single-sentence classification layer, a pairwise text similarity layer, a pairwise text classification layer, and a relevance ranking layer. (speech classification and pairing speech input to a language in the multiple languages, in 0008: After being trained, the ASR system can be employed to recognize speech of multiple languages, so long as acoustic data in each language in the multiple languages had been used to train at least one softmax layer [including claimed a pairwise text similarity layer, a pairwise text classification layer] of the DNN. By sharing the hidden layers in the DNN and using the joint training strategy described above, recognition accuracy across all languages decodable by the DNN can be improved over monolingual ASR systems trained using the acoustic (training) data from each of the individual languages alone.  )
Ting in combination with Chat teaches the two or more task-specific layers as noted above.
Ting and Chat do not expressly teach the limitation wherein the two or more task-specific layers comprise a single-sentence classification layer, a pairwise text similarity layer, a pairwise text classification layer, and a relevance ranking layer.
	Gao does expressly teach the claim limitation: wherein the two or more task-specific layers comprise a single-sentence classification layer, a pairwise text similarity layer, a pairwise text classification layer, and a relevance ranking layer. (as depicted in Fig. 5 and in 0063-0065: Based, at least in part, on the similarity measurements, the processor may return a list of relevant documents by estimating P(D1|Q), P(D2|Q) … for each document Dn and rank according to these probabilities [layer for returning relevant documents for ranking as the relevance ranking layer, in claimed wherein the two or more task-specific layers comprise …, and a relevance ranking layer]. There may be at least one relevant document Dn for each query Q. FIG. 5 illustrates an example process 500 of a multi-task DNN for representation learning. Process 500 has similarities to process 400 except that, among other things, some operational layers share tasks among one another…; and in 0073-0075: Transitioning from shared operational layer sL2 to sL3, the processor may map the 300-dimensional vector of shared operational layer sL2 into a 128-dimension task specific representation performing the operation sl3 = f(W^t_2 · sl2), where t denotes different tasks (e.g., for query classification and/or web search) … The processor may generate web search results for a web search task by mapping both the query Q and the document D into 128-dimension task-specific representation representations QSq and DSd. Accordingly, the relevance score may be computed by the cosine similarity [semantic pair of Q and D inputs to the task layers depicted in Fig. 5 for computing cosine similarity, as claimed wherein the two or more task-specific layers comprise … a pairwise text similarity layer …]… In addition to ranking and classification tasks, another example of disparate tasks is sequence to sequence generation, such as translating a Chinese word sequence to its English translation.
Other examples of disparate tasks include: question answering, where dialog or chitchat may be viewed as a sequence-to-sequence generation task; sequence labeling tasks [sequence labeling based on an input sentence as claimed wherein the two or more task-specific layers comprise a single-sentence classification layer …], where label-named entities are generated based on an input sentence; multilingual speech recognition, … where one task (e.g., speech recognition of English) may help another task (e.g., speech recognition for French); and a set of binary classification tasks [binary classification of speech input in a language-paired domain as claimed wherein the two or more task-specific layers comprise … a pairwise text classification layer …], each for a different domain.)
The Ting, Chat, and Gao references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing techniques using neural network structures for multi-task machine learning tasks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a multi-task deep neural network for semantic classification and semantic information retrieval tasks as disclosed by Gao with the method for natural language processing using deep neural networks for multi-task learning as collectively disclosed by Ting and Chat.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to develop techniques for using a multi-task deep neural network (DNN) for representation learning in semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches) by mapping text input data into semantic vector representations in a low-dimensional latent space (Gao, 0003). Doing so allows natural language context to be processed using neural networks with semantic representations of the input that are mathematically and computationally convenient to process in representation learning tasks (Gao, 0003 & 0016).

Regarding claim 7, the rejection of claim 6 is incorporated and Ting in combination with Chat and Gao further teaches the method of claim 6, wherein the multi-task machine learning model is a neural network. (as depicted in Fig. 3, MDNN 300 as the claimed model, in 0039: … The plurality of hidden layers 312-318 act as a structural regularization to the multilingual DNN 300, and the entire multilingual DNN 300 can be considered as an example of multitask learning…; and in 0034-0035: With reference now to FIG. 3, an exemplary MDNN 300 is illustrated. The MDNN 300 includes an input layer 302 that comprises nodes 304-310 that receive values for features extracted from an input signal. The multilingual DNN 300 further comprises a plurality of hidden layers 312-318… The MDNN 300 also comprises a plurality of softmax layers 352-354, wherein each softmax layer in the plurality of softmax layers 352-354 corresponds to a different respective language…
[Image: Ting, FIG. 3]
)

Regarding claim 14, Chat teaches the claim limitation: wherein the one or more input words comprise a query (as depicted in Fig. 4C and in 0040: … Yet the semantics for a particular natural language application (such as translation, data retrieval in response to a query [claimed wherein the one or more input words comprise a query], or predicting the next sentence in a conversation or story) can be a subset of the fully loaded semantics and is generally captured in projections of the full dimensional space into a lower dimensional space….
[Image: Chat, FIG. 4C]
)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ting and Chat for the same reasons disclosed above.

Ting and Chat do not expressly teach the limitations: the task-specific result comprises scores that reflect relevance of a plurality of documents to the query, and the natural language processing operation involves ranking individual documents relative to one another based at least on the scores.
Gao expressly teaches the limitations: the task-specific result comprises scores that reflect relevance of a plurality of documents to the query, (in 0062-0063: Subsequent to transitioning L3, the processor may measure the respective similarities between the query Q and the documents D1, D2 in the semantic space. To measure the similarities, the processor may perform the operations R(Q, D) for D1, D2, ... Dn, respectively, on the 128-dimension task-specific representations… Based, at least in part, on the similarity measurements, the processor may return a list of relevant documents by estimating P(D1|Q), P(D2|Q) ... for each document Dn and rank according to these probabilities [probability as claimed scores; and claimed the task-specific result comprises scores that reflect relevance of a plurality of documents to the query].)
and the natural language processing operation involves ranking individual documents relative to one another based at least on the scores. (in 0063: Based, at least in part, on the similarity measurements, the processor may return a list of relevant documents by estimating P(D1|Q), P(D2|Q) ... for each document Dn and rank [claimed and the natural language processing operation involves ranking individual documents relative to one another based at least on the scores] according to these probabilities.)
Additionally, Gao teaches the limitation: wherein the one or more input words comprise a query, (in 0068: … At the lower level sL0, input X (either a search query Q [claimed wherein the one or more input words comprise a query] and/or a document list L that includes D1, D2, ... Dn) may be initially represented as a bag of words in a relatively large vocabulary,…)
The Ting, Chat, and Gao references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing techniques using neural network structures for multi-task machine learning tasks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a multi-task deep neural network for semantic classification and semantic information retrieval tasks as disclosed by Gao with the method for natural language processing using deep neural networks for multi-task learning as collectively disclosed by Ting and Chat.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to develop techniques for using a multi-task deep neural network (DNN) for representation learning in semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches) by mapping text input data into semantic vector representations in a low-dimensional latent space (Gao, 0003). Doing so allows natural language context to be processed using neural networks with semantic representations of the input that are mathematically and computationally convenient to process in representation learning tasks (Gao, 0003 & 0016).
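For illustration only, the relevance scoring and ranking quoted from Gao can be sketched as follows. Gao computes a cosine similarity R(Q, D) between task-specific representations and then ranks documents by P(D|Q). The quoted passages do not give the formula for P(D|Q). The softmax over cosine scores with a smoothing factor gamma = 10.0 is an assumed formulation, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """R(Q, D): cosine of the angle between two representation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_documents(query_vec: np.ndarray, doc_vecs: list, gamma: float = 10.0):
    """Score documents by cosine similarity to the query and rank them by a softmax posterior."""
    scores = np.array([cosine_similarity(query_vec, d) for d in doc_vecs])
    logits = gamma * scores  # gamma: assumed smoothing factor
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = probs / probs.sum()  # assumed P(D_i | Q) over the candidate list
    order = np.argsort(-probs, kind="stable")  # highest posterior first
    return order.tolist(), probs
```

Because the softmax is monotonic in the cosine scores, the resulting order is the same as ranking directly by R(Q, D). The probabilities only add a normalized score per document.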

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (US Pub. No. 2014/0257805, hereinafter ‘Ting’) in view of Chaturvedi et al. (Pub. No. US 2020/0311341, hereinafter ‘Chat’), in further view of Wehrmann et al. (NPL: “A multi-task neural network for multilingual sentiment classification and language detection on twitter”, hereinafter ‘Jon’).

	
Regarding claim 11, the rejection of claim 10 is incorporated and Ting in combination with Chat further teaches the system of claim 10, wherein the one or more input words comprise a sentence (speech as input words comprising a sentence, in 0024: … the recognition system 100 can be configured to recognize words in multiple languages, wherein the multiple languages include a target language. The recognition system 100 comprises a receiver component 102 that receives an input signal (an acoustic signal), wherein the input signal comprises a spoken utterance [claimed wherein the one or more input words comprise a sentence], the spoken utterance including a word set forth in the target language.)
Ting and Chat do not expressly teach the limitation: and the task-specific result characterizes a sentiment of the sentence as positive or negative.
	Jon does expressly teach the claim limitation: and the task-specific result characterizes a sentiment of the sentence as positive or negative. (Pg. 1807, right col., 2nd para.: … The final layer is responsible for linearly mapping the hidden representation to the specific classes of both tasks. Hence, the first network output generates four classes for identifying each language we are learning from (considering a 4-language detection problem), and the second one learns binary sentiment analysis (positive or negative sentiment) [claimed and the task-specific result characterizes a sentiment of the sentence as positive or negative]…)
The Ting, Chat, and Jon references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing techniques using neural network structures for multi-task machine learning tasks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a deep neural network for natural language processing to perform sentiment analysis as an automated computational task, as disclosed by Jon, with the method for natural language processing using deep neural networks for multi-task learning as collectively disclosed by Ting and Chat.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to perform automated sentiment analysis with multi-task neural networks that compute distinct task-specific outputs designed to support sentiment analysis and language identification in multilingual language datasets (Jon, Abstract). Doing so allows language data to be processed using deep neural networks that compute distinct task-specific outputs, designed to minimize the classification error of either sentiment assignment or language identification, for classifying multilingual language datasets (Jon, Abstract).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (US Pub. No. 2014/0257805, hereinafter ‘Ting’) in view of Chaturvedi et al. (Pub. No. US 2020/0311341, hereinafter ‘Chat’), in further view of Hashimoto et al. (NPL: “A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks”, hereinafter ‘Hash’).

Regarding claim 13, the rejection of claim 10 is incorporated and Ting in combination with Chat further teaches the system of claim 10, wherein the one or more input words comprise a pair of sentences… 
Ting and Chat do not expressly teach the limitation: … and the task-specific result characterizes the pair of sentences as having an entailment relationship, a contradiction relationship, or a neutral relationship.
Hash does expressly teach the limitation: …and the task-specific result characterizes the pair of sentences as having an entailment relationship, a contradiction relationship, or a neutral relationship. (claimed entailment at the semantic level, with the related words forming sentences as the claimed pair of sentences, as depicted in Fig. 1 and in Secs. 2.5 & 2.6: The next two tasks model the semantic relationships between two input sentences. The first task measures the semantic relatedness between two sentences. The output is a real-valued relatedness score for the input sentence pair. The second task is textual entailment [claimed and the task-specific result characterizes the pair of sentences as having an entailment relationship…], which requires one to determine whether a premise sentence entails a hypothesis sentence…

)
The Ting, Chat, and Hash references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing natural language processing techniques that use neural network structures for multi-task machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for using a deep neural network for natural language processing to jointly model multiple tasks, as disclosed by Hash, with the method for natural language processing using deep neural networks for multi-task learning, as collectively disclosed by Ting and Chat.
One of ordinary skill in the art would have been motivated to combine the disclosed methods to perform multi-task learning using a joint many-task model that accounts for the linguistic levels of morphology, syntax, and semantics (Hash, Abstract). Doing so would allow language data to be processed using deep neural networks that model semantic relationships, helping improve success in solving increasingly complex tasks, and would allow all model weights to be optimized to improve a task’s ability to make inferences without interfering with other tasks (Hash, Abstract).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN O ALABI/              Examiner, Art Unit 2129