Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims

The following claims are pending in this office action: 1-20

The following claims are amended: 1, 12, and 19

The following claims are new: None

The following claims are cancelled: None

The following claims are rejected: 1-20

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/25/22 has been entered.
 
Response to Amendment
Applicant’s arguments, filed 8/25/22, with respect to claims 1-20 have been fully considered and are persuasive. The previous rejection has been withdrawn, and the arguments are addressed in the body of the rejection below.




Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 10-12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chung (PCT No. WO2018126213A1, of record) in view of Phillips (US 2014/0195466).
Regarding claims 1, 12, and 19, Chung discloses a method of training a machine learning model, comprising:
generating a single machine learning model (reads on a single machine learning model comprising both teacher and student machine learning models/classifiers, 0066-0067; Applicant is reminded that shifting the location of parts does not make an invention patentable. See In re Japikse, 86 USPQ 70 (CCPA 1950); In re Larson, 144 USPQ 347 (CCPA 1965); and Nerwin v. Erlichman, 168 USPQ 177) to perform a target task (tasks can be one large task comprising smaller or divided tasks, e.g., translating one language/sentence to another or classifying, 0062-0070, especially 0069) based on different training datasets (each teacher model/classifier can use different data, e.g., divided data or sentences, 0063, 0069), wherein the single machine learning model (Fig. 1) includes a first sub-module (one or more teacher models/classifiers, Fig. 1) configured to perform a first task and a second sub-module (second teacher model) configured to perform a second task (each teacher model may classify different data or different parts of a sentence, 0062-0070; performing multi-task learning: "In one method a system obtains a respective set of training data for each of multiple machine learning tasks. For each of the machine learning tasks, the system configures a respective teacher machine learning model to perform the machine learning task by training the teacher machine learning model on the training data. The system trains a single student machine learning model to perform the multiple machine learning tasks using (i) the configured teacher machine learning models, and (ii) the obtained training data.", abstract), wherein the first task (e.g., first part of a sentence, 0062-0070) and the second task (second part of a sentence, 0062-0070) are different (divided tasks, different parts of a sentence to be classified) than the target task (translating one full sentence or language to another, 0062-0070) of the single machine learning model (e.g., Fig. 1 model comprising teacher and student model/classifier);
obtaining a first training dataset having a first format; (Chung, Summary, 1st Para. discloses "In general, one innovative aspect of the subject matter described in this specification can be embodied in methods for obtaining a respective set of training data..." and Para. [0007] discloses "In some implementations the training data for each of the plurality of machine learning tasks comprises (i) an input text segment in an input language, and (ii) an output text segment in a target language that is different from the input language.")
obtaining a second training dataset having a second format, the second format being different than the first format; (Chung, Para. [0050] discloses "in some implementations the combined sets of training data for the machine learning tasks may include an equal distribution of text segments in different languages, e.g., the amount of training data for each machine learning task may be the same so that each language is equally represented in the combined training data. For example, in some cases a first set of available training data for performing machine translation for a first language pair, e.g., English and French, may be larger than a second set of available training data for performing machine translation for a second language pair, e.g., English and German")
training, using the first training dataset, the first sub-module to perform the first task, wherein the first sub-module is selected for training using the first training dataset based on the first format; (Chung, Para. [0032] discloses “The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." And Figure 1) and 
training, using the second training dataset, the second sub-module to perform the second task, wherein the second sub-module is selected for training using the second training dataset based on the second format. (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1).
	However, Chung fails to particularly call for the group of teacher modules/classifiers to be a single machine learning module.
	Phillips teaches a single machine learning module (ensemble/module comprising one or more machine learning modules, Fig. 2B), with the first task and second task different than the target task (“the data management product 108 may break, divide, or split a predictive task (e.g., an analysis request or other query) into multiple streams or threads, and may create different instances of the same learned functions, machine learning ensembles, or other machine learning of the machine learning module 102 for each stream or thread such that multiple instances execute at the same time with different data from the data management product 108”, 0047).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor, it is well known to break computer programs into subroutines comprising lines of code, and shifting the location of parts does not make an invention patentable. See In re Japikse, 86 USPQ 70 (CCPA 1950); In re Larson, 144 USPQ 347 (CCPA 1965); and Nerwin v. Erlichman, 168 USPQ 177. Using an ensemble of classifiers allows subtasks to be operated on in parallel and then have one or more of their outputs combined into one larger task.
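For illustration only, the format-based selection recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Chung, Phillips, or the application: the class names SubModule and SingleModel, the format keys "en-fr" and "en-de", and the toy train method are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubModule:
    """Hypothetical sub-module; 'training' only records the examples it saw."""
    task: str
    seen: list = field(default_factory=list)

    def train(self, examples):
        self.seen.extend(examples)


@dataclass
class SingleModel:
    """Hypothetical single model holding one sub-module per dataset format."""
    sub_modules: dict  # maps a dataset format name to a SubModule

    def train_on(self, dataset_format, examples):
        # Select the sub-module from the dataset's format, then train only it.
        module = self.sub_modules[dataset_format]
        module.train(examples)
        return module


model = SingleModel({
    "en-fr": SubModule(task="translate English to French"),
    "en-de": SubModule(task="translate English to German"),
})
first = model.train_on("en-fr", [("hello", "bonjour")])
second = model.train_on("en-de", [("hello", "hallo")])
```

In this sketch each dataset reaches exactly one sub-module, and the routing decision depends only on the format label attached to the dataset.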


As per claim 2, Chung teaches the method of claim 1. Chung further teaches:
selecting the first sub-module for being trained by the first training dataset based on the first format of the first training dataset; (Chung, Para. [0032] discloses "Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task") and
selecting the second sub-module for being trained by the second training dataset based on the second format of the second training dataset. (Chung, Para. [0032] discloses "A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task").


As per claim 4, Chung teaches the method of claim 1, Chung further teaches:

wherein a combination of a portion of the first training dataset and a portion of the second training dataset is processed by the single machine learning model. (Chung, Fig. 3 step 306 discloses processing selected one or more subsets of data and Para. [0063] discloses "the system trains the single student machine learning model to perform each of the multiple machine learning tasks using (i) the selected one or more subsets...")


As per claim 10, Chung teaches the method of claim 1, Chung further teaches: wherein the first format of the first training dataset includes at least one of a name of the first training dataset or a task for which the first training dataset is applicable, (Chung, Para. [0032] discloses "The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d." and Para. [0036] discloses "in cases where a machine learning model is trained to perform image classification, the model may be trained with an image of an object, e.g., a tree, with a labeled output, e.g., "tree."") and wherein the second format of the second training dataset includes at least one of a name of the second training dataset or a task for which the second training dataset is applicable. (Chung, Para. [0032] discloses "The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d." and Para. [0036] discloses "in cases where a machine learning model is trained to perform image classification, the model may be trained with an image of an object, e.g., a tree, with a labeled output, e.g., "tree."")


As per claim 11, Chung teaches the method of claim 1, Chung further teaches:

wherein the single machine learning model is a neural network model. (Chung, Para. [0033] discloses "In these cases the multiple teacher machine learning models may include neural networks" and Para. [0069] discloses "For example, in cases where the machine learning models are neural networks...")


As per claim 12 (see rejection of claim 1), 

Chung teaches a machine learning system comprising:

an input device configured to receive a first training dataset and a second training dataset, the first training dataset having a first format and the second training dataset having a second format, wherein the first format is different than the second format; (Chung, Summary, 1st Para. discloses "In general, one innovative aspect of the subject matter described in this specification can be embodied in methods for obtaining a respective set of training data..." and Para. [0007] discloses "In some implementations the training data for each of the plurality of machine learning tasks comprises (i) an input text segment in an input language, and (ii) an output text segment in a target language that is different from the input language." and Para. [0050] discloses "in some implementations the combined sets of training data for the machine learning tasks may include an equal distribution of text segments in different languages, e.g., the amount of training data for each machine learning task may be the same so that each language is equally represented in the combined training data. For example, in some cases a first set of available training data for performing machine translation for a first language pair, e.g., English and French, may be larger than a second set of available training data for performing machine translation for a second language pair, e.g., English and German")
a single machine learning model including a first sub-module with a first plurality of neural network layers for performing a first task and a second sub-module with a second plurality of neural network layers for performing a second task; (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1)
a sub-module determination engine configured to:

select, based on the first format, the first sub-module for being trained by the first training dataset, wherein the first plurality of neural network layers of the first sub-module are trained using the first training dataset to perform the first task; (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1)
and select, based on the second format, the second sub-module for being trained by the second training dataset, wherein the second plurality of neural network layers of the second sub-module are trained using the second training dataset to perform the second task; (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1)
and an output device configured to output a machine learning output of the single machine learning model based on the first sub-module or the second sub-module. (Chung, Para. [0065] discloses ''The system may process an augmented subset using the student machine learning model to generate a respective student machine learning model output.")

As per claim 19 (see rejection of claim 1 above), a non-transitory computer readable medium having stored thereon instructions that when executed by one or more processors, cause the one or more processors to:
generate a single machine learning model configured to perform one or more tasks based on different training datasets, wherein the single machine learning model includes a first sub-module configured to perform a first task and a second sub-module configured to perform a second task; (Chung, Abstract discloses "Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing multi-task learning. In one method a system obtains a respective set of training data for each of multiple machine learning tasks. For each of the machine learning tasks, the system configures a respective teacher machine learning model to perform the machine learning task by training the teacher machine learning model on the training data. The system trains a single student machine learning model to perform the multiple machine learning tasks using (i) the configured teacher machine learning models, and (ii) the obtained training data.")
obtain a first training dataset having a first format; (Chung, Summary, 1st Para. discloses "In general, one innovative aspect of the subject matter described in this specification can be embodied in methods for obtaining a respective set of training data..." and Para. [0007] discloses "In some implementations the training data for each of the plurality of machine learning tasks comprises (i) an input text segment in an input language, and (ii) an output text segment in a target language that is different from the input language.")
obtain a second training dataset having a second format, the second format being different than the first format; (Chung, Para. [0050] discloses "in some implementations the combined sets of training data for the machine learning tasks may include an equal distribution of text segments in different languages, e.g., the amount of training data for each machine learning task may be the same so that each language is equally represented in the combined training data. For example, in some cases a first set of available training data for performing machine translation for a first language pair, e.g., English and French, may be larger than a second set of available training data for performing machine translation for a second language pair, e.g., English and German")
train, using the first training dataset, the first sub-module to perform the first task, wherein the first sub-module is selected for training using the first training dataset based on the first format; (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1)
and train, using the second training dataset, the second sub-module to perform the second task, wherein the second sub-module is selected for training using the second training dataset based on the second format. (Chung, Para. [0032] discloses "The multiple teacher machine learning models 102a - 102d are machine learning models that are each configured to perform one of the multiple machine learning tasks 114 using training data 108. The training data 108 includes a respective set of training data for each of the multiple machine learning tasks 114. Each machine learning task, and in turn each set of training data, corresponds to one of the teacher machine models 102a - 102d. For example, a first set of training data, e.g., training data set 108a, may be used to configure a first teacher machine learning model, e.g., teacher machine learning model 102a, to perform a first machine learning task. A second set of training data, e.g., training data set 108b, may be used to configure a second teacher machine learning model, e.g., teacher machine learning model 108b, to perform a second machine learning task. A third set of training data, e.g., training data set 108c, may be used to configure a third teacher machine learning model, e.g., teacher machine learning model 102c, to perform a third machine learning task. A fourth set of training data, e.g., training data set 108d, may be used to configure a fourth teacher machine learning model, e.g., teacher machine learning model 108d, to perform a fourth machine learning task. For convenience, four teacher machine learning models are illustrated in FIG. 1, however in some implementations the system 100 may include fewer or more teacher machine learning models." and Figure 1)
Claim Rejections - 35 USC § 103


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chung and Phillips, as set forth above, in view of U.S. Pub. No. US 20170069327 A1 to Heigold, et al. (hereinafter, "Heigold").
As per claim 3, Chung teaches the method of claim 1, Chung further teaches:

obtaining an output from the first sub-module based on the training of the first sub-module using the first training dataset; (Chung, Para. [0035] discloses "Each of the teacher machine learning models 102a - 102d may be configured to perform a respective machine learning task from the multiple machine learning tasks 114 using standard machine learning techniques ... The output may then be compared to a known training output included in the set of training data by computing a loss function, and backpropagating loss function gradients with respect to current neural network parameters to determine an updated set of neural network parameters that minimizes the loss function.")
Chung fails to explicitly teach:

and selecting an additional dataset for training the first sub-module based on [[the obtained output]]

However, Heigold teaches:

and selecting an additional dataset for training the first sub-module based on [[the obtained output]] (Heigold, Para. [0081] discloses "The process 300 may then repeat stages 304-312, and continue selecting additional sets of training data for additional training iterations until a limit is reached ... For example, after a number of training iterations, the neural network may be tested against a held-out set of data that was not used during the training process 300. Training may continue until tests on the held-out set indicate that the neural network has achieved at least the target performance level.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to select additional training data as disclosed by Heigold. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of a machine learning model by selecting additional datasets based on whether the output of the model satisfies a predetermined threshold.
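For illustration only, the iterative selection relied upon in this combination can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Heigold, Chung, or the application: the function train_until_target, its parameters, and the toy scoring rule are hypothetical stand-ins for a real training step and a real held-out evaluation.

```python
def train_until_target(train_step, evaluate, datasets, target):
    """Keep selecting additional datasets until the held-out score reaches target.

    train_step, evaluate, datasets and target are hypothetical placeholders.
    """
    used = []
    for dataset in datasets:
        train_step(dataset)
        used.append(dataset)
        # The obtained output (held-out score) decides whether another
        # dataset is selected for a further training iteration.
        if evaluate() >= target:
            break
    return used


# Toy stand-ins: each training example raises the held-out score by 0.25.
state = {"score": 0.0}


def toy_train_step(dataset):
    state["score"] += 0.25 * len(dataset)


def toy_evaluate():
    return state["score"]


used = train_until_target(
    toy_train_step, toy_evaluate, [[1], [2, 3], [4], [5]], target=0.75
)
```

In this toy run the loop stops once the held-out score reaches the target, so the remaining candidate datasets are never selected.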


As per claim 13, Chung teaches the machine learning system of claim 12, Chung further teaches:
wherein the output device is configured to obtain an output from the first sub-module based on the training of the first sub-module using the first training dataset, (Chung, Para. [0035] discloses "Each of the teacher machine learning models 102a - 102d may be configured to perform a respective machine learning task from the multiple machine learning tasks 114 using standard machine learning techniques ... The output may then be compared to a known training output included in the set of training data by computing a loss function, and backpropagating loss function gradients with respect to current neural network parameters to determine an updated set of neural network parameters that minimizes the loss function.")

However, Chung fails to explicitly teach:

and wherein [[the input device is configured to]] select an additional dataset for training [[the first sub-module based on the obtained output]] (Heigold, Para. [0081] discloses "The process 300 may then repeat stages 304-312, and continue selecting additional sets of training data for additional training iterations until a limit is reached ... For example, after a number of training iterations, the neural network may be tested against a held-out set of data that was not used during the training process 300. Training may continue until tests on the held-out set indicate that the neural network has achieved at least the target performance level.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to select additional training data as disclosed by Heigold. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of a machine learning model by selecting additional datasets based on whether the output of the model satisfies a predetermined threshold.
Claim Rejections - 35 USC § 103


Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chung and Phillips, as set forth above, in view of U.S. Pub. No. US20190102098 to Biswas, et al. (hereinafter, "Biswas").
As per claim 5, Chung teaches the method of claim 4. While Chung teaches the first and second training datasets (see Chung, Fig. 3, and Para. [0063]), Chung fails to explicitly teach:

wherein a percentage of data from [[the first training dataset and a percentage of data from the second training dataset included in the combination]] are configurable using one or more parameters input to the single machine learning model
However, Biswas teaches:

wherein a percentage of data from [[the first training dataset and a percentage of data from the second training dataset included in the combination]] are configurable using one or more parameters input to the single machine learning model (Biswas, Abstract discloses "The machine learning server computer displays through a graphical user interface, a plurality of selectable parameter options, each of which defining a value for a machine learning parameter. The machine learning server computer receives a particular input dataset.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to select input parameters as disclosed by Biswas. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of a machine learning model by selectively fine-tuning parameters for a model such that the model can be appropriately adapted and fine-tuned to its input.
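For illustration only, the claim 5 limitation (a configurable percentage of each training dataset included in the combination) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Chung or Biswas: the function mix_datasets and the parameters pct_first and pct_second are hypothetical names.

```python
import random


def mix_datasets(first, second, pct_first, pct_second, seed=0):
    """Combine a configurable percentage of each dataset (hypothetical helper).

    pct_first and pct_second are percentages in [0, 100] supplied as input
    parameters; each selects that share of its dataset without replacement.
    """
    for pct in (pct_first, pct_second):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    rng = random.Random(seed)
    n_first = round(len(first) * pct_first / 100)
    n_second = round(len(second) * pct_second / 100)
    combined = rng.sample(first, n_first) + rng.sample(second, n_second)
    rng.shuffle(combined)  # interleave examples from the two datasets
    return combined


first_dataset = [("first", i) for i in range(100)]
second_dataset = [("second", i) for i in range(50)]

# 50% of the first dataset and 20% of the second dataset.
combined = mix_datasets(first_dataset, second_dataset, pct_first=50, pct_second=20)
n_from_first = sum(1 for source, _ in combined if source == "first")
n_from_second = sum(1 for source, _ in combined if source == "second")
print(n_from_first, n_from_second)
```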


As per claim 14, Chung as shown above teaches the machine learning system of claim 12. While Chung teaches the first and second training datasets (see Chung, Fig. 3, and Para. [0063]), Chung fails to explicitly teach:

wherein [[the input device is configured to]] receive one or more parameters, and wherein [[the single machine learning model]] is configured to use the one or more parameters to determine a percentage of data from [[the first training dataset and a percentage of data from the second training dataset]] to use for training [[the first sub-module and the second sub-module]].
However, Biswas teaches:

wherein [[the input device is configured to]] receive one or more parameters, and wherein [[the single machine learning model]] is configured to use the one or more parameters to determine a percentage of data from [[the first training dataset and a percentage of data from the second training dataset]] to use for training [[the first sub-module and the second sub-module]]. (Biswas, Abstract discloses "The machine learning server computer displays through a graphical user interface, a plurality of selectable parameter options, each of which defining a value for a machine learning parameter. The machine learning server computer receives a particular input dataset.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to select input parameters as disclosed by Biswas. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of a machine learning model by selectively fine-tuning parameters for a model such that the model can be appropriately adapted and fine-tuned to its input.

Claim Rejections - 35 USC § 103

Claims 6, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chung and Phillips, as set forth above, in view of U.S. Patent No. US10089576 to Gao, et al. (hereinafter, "Gao").

As per claim 6, Chung teaches the method of claim 1, but Chung fails to explicitly teach:

wherein the single machine learning model includes at least one shared layer included in the first sub-module and the second sub-module, and includes at least one non-shared layer included in the first sub-module and not included in the second sub-module.
However, Gao teaches:

wherein the single machine learning model includes at least one shared layer included in the first sub-module and the second sub-module, and includes at least one non-shared layer included in the first sub-module and not included in the second sub-module. (Gao, Col. 11, Line 1 discloses "FIG. 5 illustrates an example process 500 of a multi-task DNN for representation learning. Process 500 has similarities to process 400 except that, among other things, some operational layers share tasks among one another. For example, process 400 includes unshared operational layers L0, L1, L2, and L3 that individually represent task-specific outputs. On the other hand, process 500 includes lower layers 502 that are shared across different tasks, whereas the top layers represent task-specific outputs. In particular, tasks, which may be disparate, may be shared within shared operational layers sl0, sl1, and sl2. Tasks may be disparate in the sense that operations involved in the respective tasks may be fundamentally and markedly distinct in character. For example, operations to perform a task for classification may be disparate from operations to perform a task for ranking. In another example, disparate tasks may include a scenario where data generation processes differ. Also, various domains in query classification may be considered disparate tasks.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to use shared layers as disclosed by Gao. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as interpretability gets improved as data is shared between layers which thereby improves the features being utilized and learned.
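For illustration only, the arrangement described in the quoted passage (lower layers shared across tasks, top layers task-specific) can be sketched as follows. This is a minimal sketch under stated assumptions, not Gao's implementation: the Layer and SubModule classes, the layer sizes, and the ReLU activation are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Layer:
    """A small fully connected layer with ReLU activation (hypothetical)."""
    name: str
    weights: list  # one row of input weights per output unit

    def forward(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


@dataclass
class SubModule:
    """An ordered stack of layers that performs one task."""
    layers: list

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


def make_layer(name, n_in, n_out, rng):
    rows = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return Layer(name, rows)


rng = random.Random(0)
shared = make_layer("shared", n_in=4, n_out=3, rng=rng)             # lower, shared layer
head_first = make_layer("head_first", n_in=3, n_out=2, rng=rng)     # task-specific top layer
head_second = make_layer("head_second", n_in=3, n_out=5, rng=rng)   # task-specific top layer

# Both sub-modules hold the same shared layer object; each has its own head.
first_sub_module = SubModule([shared, head_first])
second_sub_module = SubModule([shared, head_second])

x = [1.0, 2.0, 3.0, 4.0]
out_first = first_sub_module.forward(x)
out_second = second_sub_module.forward(x)
print(len(out_first), len(out_second))
```

Because the shared layer is a single object referenced by both sub-modules, any update to its weights is seen by both tasks, while each head remains private to its sub-module.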


As per claim 15, Chung as shown above teaches the machine learning system of claim 12, but Chung fails to explicitly teach:
wherein [[the first sub-module and the second sub-module]] include at least one shared neural network layer included in both the first sub-module and the second sub-module, and wherein the [[first sub-module]] includes at least one non-shared neural network layer included in [[the first sub-module and not included in the second sub-module.]]
However, Gao teaches:

wherein [[the first sub-module and the second sub-module]] include at least one shared neural network layer included in both the first sub-module and the second sub-module, and wherein the [[first sub-module]] includes at least one non-shared neural network layer included in [[the first sub-module and not included in the second sub-module.]] (Gao, Col. 11, Line 1 discloses "FIG. 5 illustrates an example process 500 of a multi-task DNN for representation learning. Process 500 has similarities to process 400 except that, among other things, some operational layers share tasks among one another. For example, process 400 includes unshared operational layers L0, L1, L2, and L3 that individually represent task-specific outputs. On the other hand, process 500 includes lower layers 502 that are shared across different tasks, whereas the top layers represent task-specific outputs. In particular, tasks, which may be disparate, may be shared within shared operational layers sl0, sl1, and sl2. Tasks may be disparate in the sense that operations involved in the respective tasks may be fundamentally and markedly distinct in character. For example, operations to perform a task for classification may be disparate from operations to perform a task for ranking. In another example, disparate tasks may include a scenario where data generation processes differ. Also, various domains in query classification may be considered disparate tasks.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to use shared layers as disclosed by Gao. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as interpretability gets improved as data is shared between layers which thereby improves the features being utilized and learned.


As per claim 20, Chung teaches the non-transitory computer-readable medium of claim 19, but Chung fails to explicitly teach:

wherein [[the first sub-module and the second sub-module]] include at least one shared neural network layer included in both the first sub-module and the second sub-module, and wherein the [[first sub-module]] includes at least one non-shared neural network layer included in [[the first sub-module and not included in the second sub-module.]]
However, Gao teaches:

wherein [[the first sub-module and the second sub-module]] include at least one shared neural network layer included in both the first sub-module and the second sub-module, and wherein the [[first sub-module]] includes at least one non-shared neural network layer included in [[the first sub-module and not included in the second sub-module.]] (Gao, Col. 11, Line 1 discloses "FIG. 5 illustrates an example process 500 of a multi-task DNN for representation learning. Process 500 has similarities to process 400 except that, among other things, some operational layers share tasks among one another. For example, process 400 includes unshared operational layers L0, L1, L2, and L3 that individually represent task-specific outputs. On the other hand, process 500 includes lower layers 502 that are shared across different tasks, whereas the top layers represent task-specific outputs. In particular, tasks, which may be disparate, may be shared within shared operational layers sl0, sl1, and sl2. Tasks may be disparate in the sense that operations involved in the respective tasks may be fundamentally and markedly distinct in character. For example, operations to perform a task for classification may be disparate from operations to perform a task for ranking. In another example, disparate tasks may include a scenario where data generation processes differ. Also, various domains in query classification may be considered disparate tasks.")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to use shared layers as disclosed by Gao. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as interpretability gets improved as data is shared between layers which thereby improves the features being utilized and learned.

Claim Rejections - 35 USC § 103

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chung and Phillips, as set forth above, in view of Gao, further in view of U.S. Pub. No. US20190114528A1 to Xiong, et al. (hereinafter, "Xiong").
As per claim 7, the combination of Chung and Gao as shown above teaches the method of claim 6. While Chung and Gao disclose shared layers and training data, the combination of Chung and Gao fails to explicitly teach:
wherein [[the at least one shared layer]] is trained using [[the first training dataset and the second training dataset]], and wherein the at least one [[non-shared layer]] is trained using [[the first training dataset]]
However, Xiong teaches:

wherein [[the at least one shared layer]] is trained using [[the first training dataset and the second training dataset]], and wherein the at least one [[non-shared layer]] is trained using [[the first training dataset]] (Xiong, Para. [0046] discloses "In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers, independent layers associated with the specific task")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training of layers as disclosed by Xiong. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as training increases the output accuracy of a model.
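For illustration only, the claim 7 limitation (a shared layer trained using both training datasets, and a non-shared layer trained using only the first training dataset) can be sketched as follows. This is a minimal sketch under stated assumptions, not Xiong's implementation: each layer is reduced to a single scalar weight, and the names sgd_step, head_first, and head_second are hypothetical.

```python
def sgd_step(params, head, batch, lr=0.01):
    """One gradient step for the sub-module that uses `head` (hypothetical).

    The sub-module computes prediction = head * shared * x. Only the shared
    weight and the named head are updated; the other head is left untouched.
    """
    s, h = params["shared"], params[head]
    n = len(batch)
    grad_s = sum(2 * (h * s * x - y) * h * x for x, y in batch) / n
    grad_h = sum(2 * (h * s * x - y) * s * x for x, y in batch) / n
    params["shared"] = s - lr * grad_s
    params[head] = h - lr * grad_h


params = {"shared": 1.0, "head_first": 0.5, "head_second": 0.5}
first_dataset = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]     # first task (synthetic)
second_dataset = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]   # second task (synthetic)

initial = dict(params)

# Training on the second dataset updates the shared weight and head_second only.
sgd_step(params, "head_second", second_dataset)
after_second = dict(params)

# Training on the first dataset updates the shared weight and head_first only.
sgd_step(params, "head_first", first_dataset)
after_first = dict(params)

print(initial, after_second, after_first)
```

In this sketch the shared weight receives updates from both datasets, while head_first changes only during the step that uses the first dataset.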


As per claim 16, the combination of Chung and Gao as shown above teaches the machine learning system of claim 15. While Chung and Gao disclose shared layers and training data, the combination of Chung and Gao fails to explicitly teach:
wherein the [[at least one shared neural network layer]] is trained using [[the first training dataset and the second training dataset]], and wherein the [[at least one non-shared neural network layer]] is trained using [[the first training dataset and not the second training dataset.]]
However, Xiong teaches:

wherein the [[at least one shared neural network layer]] is trained using [[the first training dataset and the second training dataset]], and wherein the [[at least one non-shared neural network layer]] is trained using [[the first training dataset and not the second training dataset.]] (Xiong, Para. [0046] discloses "In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers, independent layers associated with the specific task")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training of layers as disclosed by Xiong. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as training increases the output accuracy of a model.

Claim Rejections - 35 USC § 103
Claims 8-9 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chung and Phillips, as set forth above, in view of Gao, further in view of U.S. Patent No. US10713593B2 to Chen, et al. (hereinafter, "Chen").
As per claim 8, Chung teaches the method of claim 1. While Chung teaches the first and second sub-modules, Chung fails to explicitly teach:
obtaining a third training dataset

and training, using the third training dataset, [[the first sub-module and the second sub-module]] to perform at least a third task
However, Chen teaches:

obtaining a third training dataset (Chen, Col. 11, Line 12 discloses, "The system trains the machine learning model to perform the machine learning task that was not represented by the training data obtained at step 302 using the obtained parallel training data")
and training, using the third training dataset, [[the first sub-module and the second sub-module]] to perform at least a third task (Chen, Col. 11, Line 32 discloses "By incrementally training the multi-task machine learning model on additional parallel data for zero-shot directions, i.e., for machine learning tasks that were not represented by the originally obtained training data, the system may further refine the multi-task machine learning model and improve the accuracy of results obtained from using the machine learning model at run time")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training as disclosed by Chen. 





As per claim 9, Chung teaches the method of claim 1. While Chung teaches the first and second sub-modules, Chung fails to explicitly teach:
obtaining a third training dataset

and training, using [[the first training dataset]] and the third training dataset, [[the first sub-module]] to perform at least the first task
However, Chen teaches:

obtaining a third training dataset (Chen, Col. 11, Line 12 discloses, "The system trains the machine learning model to perform the machine learning task that was not represented by the training data obtained at step 302 using the obtained parallel training data")
and training, using [[the first training dataset]] and the third training dataset, [[the first sub-module]] to perform at least the first task (Chen, Col. 11, Line 32 discloses "By incrementally training the multi-task machine learning model on additional parallel data for zero-shot directions, i.e., for machine learning tasks that were not represented by the originally obtained training data, the system may further refine the multi-task machine learning model and improve the accuracy of results obtained from using the machine learning model at run time")
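Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training as disclosed by Chen. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as training increases the output accuracy of a model.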





As per claim 17, Chung as shown above teaches the machine learning system of claim 12, but Chung fails to explicitly teach:
wherein the [[input device is configured to]] obtain a third training dataset,

and wherein [[the first plurality of neural network layers of the first sub-module and the second plurality of neural network layers of the second sub-module]] are trained using the third training dataset to perform at least a third task.
However, Chen teaches:

wherein the [[input device is configured to]] obtain a third training dataset, (Chen, Col. 11, Line 12 discloses, "The system trains the machine learning model to perform the machine learning task that was not represented by the training data obtained at step 302 using the obtained parallel training data")
and wherein [[the first plurality of neural network layers of the first sub-module and the second plurality of neural network layers of the second sub-module]] are trained using the third training dataset to perform at least a third task. (Chen, Col. 11, Line 32 discloses "By incrementally training the multi-task machine learning model on additional parallel data for zero-shot directions, i.e., for machine learning tasks that were not represented by the originally obtained training data, the system may further refine the multi-task machine learning model and improve the accuracy of results obtained from using the machine learning model at run time")

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training as disclosed by Chen. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as training increases the output accuracy of a model.


As per claim 18, Chung as shown above teaches the machine learning system of claim 12, but Chung fails to explicitly teach:
wherein the [[input device is configured to]] obtain a third training dataset,

and wherein [[the first plurality of neural network layers of the first sub-module]] are trained using the first training dataset and the third training dataset to perform at least a first task.
However, Chen teaches:

wherein the [[input device is configured to]] obtain a third training dataset, (Chen, Col. 11, Line 12 discloses, "The system trains the machine learning model to perform the machine learning task that was not represented by the training data obtained at step 302 using the obtained parallel training data")
and wherein [[the first plurality of neural network layers of the first sub-module]] are trained using the first training dataset and the third training dataset to perform at least a first task. (Chen, Col. 11, Line 32 discloses "By incrementally training the multi-task machine learning model on additional parallel data for zero-shot directions, i.e., for machine learning tasks that were not represented by the originally obtained training data, the system may further refine the multi-task machine learning model and improve the accuracy of results obtained from using the machine learning model at run time")
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine learning model as disclosed by Chung to utilize training as disclosed by Chen. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a machine learning model as training increases the output accuracy of a model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080. The examiner can normally be reached ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID R VINCENT/Primary Examiner, Art Unit 2123