DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Acknowledgement is made of Applicant's claim amendments on 9/24/2020. The claim amendments are entered. Presently, claims 1-20 remain pending. Claims 1, 9, and 16 have been amended.

Response to Arguments
Applicant's arguments filed on 9/24/2020 have been fully considered but they are not persuasive.

Applicant argues that the combination of the cited references fails to cure the deficiencies because the references do not teach the newly amended claim limitations (Applicant’s Reply pgs. 12-18). While the cited references do not explicitly teach the newly amended claim limitations, their combination does teach the amended claim limitations when considered in conjunction with Contreras, which has been incorporated into the rejection of the independent claims as necessitated by Applicant’s amendments.

Applicant also argues that Chung allegedly does not teach the claim limitations because it uses a student machine learning model with various teacher machine learning models to perform …. The ML models in Chung can denote ML components, wherein the ML models can perform various tasks and generate a plurality of outputs as recited in the claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
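This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.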

Claims 1-4, 7-11, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Chung et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0325308, hereinafter Chung) in view of Schuster et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0188566, hereinafter Schuster), Bradski et al. (U.S. Pat. App. Pre-Grant Pub. No. 2008/0050014, hereinafter Bradski), and Contreras et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0235130, hereinafter Contreras).

Regarding claim 1, Chung teaches:
A method ([0031]-[0032] and [0074]: “method”) of configuring a processing system with an application utilizing a plurality of machine learning components ([0075]-[0076]: describing a “data processing apparatus” that can encompass a variety of hardware that can enable a “multi-task learning system 100 [that] includes multiple teacher machine learning models 102a-102d [and] a single student machine learning model 104” ([0033]) that can perform various machine learning application tasks, including “natural language” processing tasks ([0039]). Wherein the learning “system [can be] implemented as computer programs on one or more computers in one or more locations” ([0033]).), the method comprising: 
training each machine learning component of the plurality of machine learning components individually with respective training data to perform a corresponding task ([0034], [0044], and [0059]: describing that the various machine learning (ML) models comprising machine learning components can perform “machine learning tasks 114 using [respective] training data” for the corresponding models.); 
training the application on an initial set of training data ([0034]: describing “a first set of training data” that can be used for training the machine learning models.);
executing the application with the trained machine learning components on the processing system using another set of training data to produce a plurality of different outputs for each of a plurality of elements of the other set of training data ([0044]: describing that the natural language processing task being performed by the learning system comprises a configuration of “[t]he student machine learning model 104 … to perform each of the multiple machine learning tasks 114 using the configured multiple teacher machine learning models 102a-102d and the training data 108.” See also [0037]-[0038]: describing the various outputs for the respective training data sets.),…;
identifying, for each of the elements of the other set of training data, one or more outputs among the different outputs of the application that concur with … data ([0053]: describing that the “system may then compare the training output [of the teacher ML models] to a known output included in the set of training data by computing a loss function” followed by a training of the student ML model using the training data derived from the teacher ML models. Similarly, see [0037]-[0038]: further describing the various respective outputs.); 
identifying the candidate outputs of the trained machine learning component that produce the identified outputs concurring with the … data ([0068] and [0070]-[0071]: describing that the outputs can be identified and compared with each other to determine if there is a match between the various outputs, wherein the machine learning model parameters can be adjusted accordingly to obtain the desired outputs.); AMENDMENT IN RESPONSE TO THE OFFICE ACTION MAILED AUGUST 5, 2020 APPLICATION No. 15/444,734
…; and 
adapting the plurality of trained machine learning components to select based on the weighting a candidate output from the trained machine learning component that provides an identified output concurring with the … data ([0045]: describing that “[t]he generated student machine learning model output may be compared to the generated teacher machine learning output, and used to determine an updated set of student machine learning model parameters that minimizes the difference between the generated student machine learning model output and the teacher machine learning output.” Wherein the outputs of the trained teacher ML models being utilized in the student ML model can be compared with a known training output by adjusting parameters of the models ([0070]-[0072]), in addition to being evaluated with weights to update the machine learning models ([0053]).)
and …
…for an inquiry ([0047]: describing the received data input can be a query for translation and the output is the translation in another language.). 

While Chung teaches the limitations of claim 1, Chung does not explicitly teach: “ground truth” on lines 15, 17 and 22. Schuster discloses the claim limitations, teaching: a training of the machine learning system and models on an “initial training data” with corresponding “ground truth pairs” (Schuster [0026], [0028], and [0038]). Similarly, see Schuster [0039]: “The system then generates modified training data from the initial training data (step 206). For example, the system generates modified training data by generating, for each of one or more of the training examples in the initial training data, an auxiliary output for the training example from the ground truth output for the training example.” 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the ground truth data in Schuster. Doing so would enable a “machine learning system, implemented as computer programs on one or more computers in one or more locations can train a machine learning model to perform a machine learning task, e.g., a structured prediction machine learning task.” (Schuster [0021]).

While the cited references teach the limitations of claim 1, they do not explicitly teach: 
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task” on lines 10-13 and “to propagate the selected candidate output through remaining machine learning components to produce a result of the application” on lines 22-24. Bradski discloses the claim limitations, teaching:
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task”: describing that the plurality of processing units can perform tasks by executing code, e.g. classification code, on a plurality of components, e.g. classification components, to generate a plurality of different decision outputs (Bradski [0018]-[0019]). Similarly, see also Bradski [0024]: describing the classification decision outputs as well as the validation decision outputs of the corresponding classification decision outputs.  
“to propagate the selected candidate output through remaining machine learning components to produce a result of the application”: describing that a manager of the processing units “may select the classification value selected by the majority of classification components 10 a, 10 b . . . 10 n or take the average of the classification decisions from the classification components 10 a, 10 b . . . 10 n to determine a final classification value for the received data” (Bradski [0018]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the techniques in Bradski. Doing so would enable a method and system “for training and using classification components on multiple processing units. A plurality of processing units each has a memory including one of a plurality of subsets of a set of data points. At least two of the processing units have different subsets of data points. A plurality of classification components are executed by the processing units. Classification components executing at the processing units are trained, wherein each classification component is trained with the subset of data points in the memory of the processing unit that is executing the classification component.” (Bradski Abstract).
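For illustration only, the selection quoted from Bradski [0018] (taking the value chosen by the majority of classification components, or the average of their decisions) can be sketched in Python. This is a minimal sketch under stated assumptions: the function names and sample decision values are hypothetical and do not appear in Bradski.

```python
from collections import Counter
from statistics import mean


def majority_decision(decisions):
    """Return the classification value chosen by the most components."""
    # most_common(1) yields a one-element list: [(value, count)].
    value, _count = Counter(decisions).most_common(1)[0]
    return value


def average_decision(decisions):
    """Return the arithmetic mean of numeric classification decisions."""
    return mean(decisions)


# Hypothetical decisions from three classification components.
votes = [1, 0, 1]
final_by_majority = majority_decision(votes)

# Hypothetical numeric decisions for the averaging alternative.
final_by_average = average_decision([0.9, 0.2, 0.7])
```

Either function reduces several component decisions to one final classification value, which is the role the manager of the processing units plays in the quoted passage.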
While the cited references teach the limitations of claim 1, they do not explicitly teach: “weighting each of the first plurality of candidate outputs, wherein weighting for the identified candidate outputs is greater relative to weighting for other candidate outputs” on lines 18-19. Contreras discloses the claim limitations, teaching: that the candidate answers can be weighted using scores, wherein the scores of the various answers are compared with each other in correlation with predetermined thresholds to determine the ranking with which to present the answers to a user (Contreras [0048]-[0050]). Wherein the statistical analysis with the weighting and scores is “repeated for each of the candidate answers until the Watson™ QA system identifies candidate answers that surface as being significantly stronger [i.e. with a weight/score that is greater] than others and thus, generates a final answer, or ranked set of answers, for the input question.” (Contreras [0031]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the weighting in Contreras. Doing so would enable a system in which the “question answering system ranks the plurality of candidate answers and outputs one or more of the candidate answers” (Contreras Abstract), whereby the “[c]andidate answer ranking 417 ranks the candidate answers 414 by confidence score and outputs one or candidate answers 402 having a highest confidence score” (Contreras [0073]).
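For illustration only, the ranking of candidate answers by confidence score described above with respect to Contreras can be sketched in Python. This is a minimal sketch under stated assumptions: the function name, the threshold parameter, and the sample scores are hypothetical and are not taken from Contreras.

```python
def rank_candidates(scored_answers, threshold=0.0):
    """Sort (answer, score) pairs by score, highest first.

    Pairs whose score falls below the threshold are dropped before sorting.
    """
    kept = [(answer, score) for answer, score in scored_answers if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate answers with confidence scores.
candidates = [("A", 0.41), ("B", 0.87), ("C", 0.12)]
ranked = rank_candidates(candidates, threshold=0.2)

# The highest-scoring candidate is the one that would be output first.
best_answer = ranked[0][0]
```

A candidate with a greater score ranks ahead of the others, which is the sense in which a greater weight favors the identified candidate output.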

Regarding claim 2, Chung further teaches:
The method of claim 1 further comprising: 
generating the first plurality of candidate outputs from each of the plurality of machine learning components based on the other set of training data ([0037]: describing that “cases where one or more of the teacher machine learning models are neural networks, each neural network, e.g., teacher machine learning model 102a, may be trained on a respective set of training data, e.g., training data set 108a, by processing a training input included in the set of training data to generate a corresponding output according to a given machine learning task.” See also Fig. 1: showing the student ML model and teacher ML models, wherein each teacher ML model can generate its own respective outputs ([0038]) for correlation with a student ML model. 
See also [0062]-[0064]: describing the use of training data subsets to train the teacher ML models and generating the corresponding outputs based on the selected subset of training data.); and 
training a component model for each of the plurality of machine learning components by executing the application for each of the first plurality of candidate outputs generated from that component and adjusting the component model based on results produced by the application ([0037]: describing that “the output [of the teacher ML model] may then be compared to a known training output included in the set of training data by computing a loss function, and backpropagating loss function gradients with respect to current neural network parameters to determine an updated set of neural network parameters that minimizes the loss function.” 
See also [0066]-[0067]: describing the use of the teacher ML models trained using the data subsets that is then used to train the student ML models, whereby the “system may then adjust the values of student machine learning model parameters to match the generated student machine learning model to a corresponding generated teacher machine learning model output.”).
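For illustration only, the adjustment of student model parameters to match a teacher model output, as described in the cited passages of Chung, can be sketched in Python. This is a minimal sketch under stated assumptions: the one-parameter linear student, the squared-difference loss, the learning rate, and the sample data are all hypothetical and are not taken from Chung.

```python
def fit_student(inputs, teacher_outputs, learning_rate=0.1, steps=200):
    """Fit a one-parameter student y = w * x to teacher outputs.

    Uses plain gradient descent on the mean squared difference between
    the student output and the teacher output.
    """
    w = 0.0
    n = len(inputs)
    for _ in range(steps):
        # Gradient of mean((w*x - t)^2) with respect to w.
        grad = sum(2.0 * (w * x - t) * x for x, t in zip(inputs, teacher_outputs)) / n
        w -= learning_rate * grad
    return w


# Hypothetical teacher that doubles its input.
xs = [1.0, 2.0, 3.0]
teacher = [2.0 * x for x in xs]

# The student parameter converges toward the teacher's behavior (w near 2).
w = fit_student(xs, teacher)
```

Each step moves the parameter in the direction that reduces the difference between student and teacher outputs, which is the general idea of minimizing a loss function over model parameters.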


Regarding claim 3, Chung further teaches:
The method of claim 2 further comprising: 
executing the application with the trained component model for each of the plurality of machine learning components by ([0033]: describing an example “multi-task learning system 100. The multi-task learning system 100 includes multiple teacher machine learning models 102a-102d, a single student machine learning model 104 and an augmentation module 106. The system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented”. Wherein the system can execute the natural language processing tasks.): 
generating a second plurality of candidate outputs from each machine learning component ([0034]: describing that the teacher ML model can generate a second set of outputs from a second set of training data. Wherein this data can be used to train the student ML model as part of the overall multi-task learning system so that the overall system can generate outputs ([0044]-[0045]).); 
applying the trained component model for each machine learning component to select a candidate output for that component from among the second plurality of candidate outputs generated from that component ([0044]-[0045]: describing that the respective outputs from the teacher ML models, which can comprise a second candidate output, can be applied as trained models to train the student ML model to generate its output. Wherein the student ML models can select the entire respective training data or a subset of the training data to apply based on the outputs from the respective teacher ML models ([0059]-[0060]).); and 
([0059]-[0060]: describing the generation of the final generated ML model and its output, which is derived by utilizing the outputs and training from the teacher ML models and student ML models.).

Regarding claim 4, Chung further teaches:
The method of claim 1, wherein the processing system includes a Natural Language Processing (NLP) system, and the machine learning components perform Natural Language Processing (NLP) operations ([0039]: describing that the multi-task learning system comprising student and teacher ML models can perform “multiple machine learning tasks 114 [that] may include different classification tasks, such as natural language classification tasks”).

Regarding claim 7, Chung further teaches:
The method of claim 1, further comprising: 
training a component model for a set of the machine learning components by executing the application with one from each of a plurality of candidate outputs generated from the set of machine learning components and adjusting the component model for the set of machine learning components based on results produced by the application ([0044]-[0045]: “The student machine learning model 104 may be configured to perform each of the multiple machine learning tasks 114 using the configured multiple teacher machine learning models 102 a-102 d and the training data 108….  The generated student machine learning model output may be compared to the generated teacher machine learning output, and used to determine an updated set of student machine learning model parameters that minimizes the difference between the generated student machine learning model output and the teacher machine learning output.” Wherein the student and teacher ML models are part of the multi-task learning system. Similarly, see [0053]: describing the update/adjustment of the student and teacher ML models.); and 
executing the application with the trained component model for the set of machine learning components ([0059]: “the [learning] system may continue the [update/adjustment] process for performing multi-task learning by providing the student machine learning model for further processing. For example, in some cases the above described steps 202-206 may be repeated for multiple sets of training data that correspond to respective sets of machine learning tasks to generate multiple student machine learning models. The system may then perform the steps 202-206 again, this time using the generated student machine learning models as teacher machine learning models, to generate a final student machine learning model.” That is, the multi-task learning system can execute the machine learning tasks via updating/adjusting the trained ML models and their parameters as needed.).

Regarding claim 8, Chung further teaches:
The method of claim 2, wherein the component model is trained based on output from the application in response to a new domain for the application ([0047]-[0048]: describing that the multi-task learning system comprising student and teacher ML models can be augmented by the augmentation module when there is a change in a natural language processing task, e.g. translation from English to French as opposed to the initial English to German translation. Wherein the change in target languages can denote a domain change.).

Regarding claim 9, Chung teaches:
An apparatus to configure a processing system with an application utilizing a plurality of machine learning components ([0075]-[0076]: describing a “data processing apparatus” that can encompass a variety of hardware that can enable a “multi-task learning system 100 [that] includes multiple teacher machine learning models 102a-102d [and] a single student machine learning model 104” ([0033]) that can perform various machine learning application tasks, including “natural language” processing tasks ([0039]). Wherein the learning “system [can be] implemented as computer programs on one or more computers in one or more locations” ([0033]).), the apparatus comprising: 
a processor configured to ([0076] and [0080]: “processor”): 
train each machine learning component of the plurality of machine learning components individually with respective training data to perform a corresponding task ([0034], [0044], and [0059]: describing that the various machine learning (ML) models comprising machine learning components can perform “machine learning tasks 114 using [respective] training data” for the corresponding models.); 
train the application on an initial set of training data ([0034]: describing “a first set of training data” that can be used for training the machine learning models.); 
execute the application with the trained machine learning components on the processing system using another set of training data to produce a plurality of different outputs for each of a plurality of elements of the other set of training data ([0044]: describing that the natural language processing task being performed by the learning system comprises a configuration of “[t]he student machine learning model 104 … to perform each of the multiple machine learning tasks 114 using the configured multiple teacher machine learning models 102a-102d and the training data 108.” See also [0037]-[0038]: describing the various outputs for the respective training data sets.), …; 
identify, for each of the elements of the other set of training data, one or more outputs among the different outputs of the application that concur with … data ([0053]: describing that the “system may then compare the training output [of the teacher ML models] to a known output included in the set of training data by computing a loss function” followed by a training of the student ML model using the training data derived from the teacher ML models. Similarly, see [0037]-[0038]: further describing the various respective outputs.); 
identify the candidate outputs of the trained machine learning component that produce the identified outputs concurring with the … data ([0068] and [0070]-[0071]: describing that the outputs can be identified and compared with each other to determine if there is a match between the various outputs, wherein the machine learning model parameters can be adjusted accordingly to obtain the desired outputs.); 
…; and 
adapt the plurality of trained machine learning components to select based on the weighting a candidate output from the trained machine learning component that provides an identified output concurring with the … data ([0045]: “The generated student machine learning model output may be compared to the generated teacher machine learning output, and used to determine an updated set of student machine learning model parameters that minimizes the difference between the generated student machine learning model output and the teacher machine learning output.” Wherein the outputs of the trained teacher ML models being utilized in the student ML model can be compared with a known training output by adjusting parameters of the models ([0070]-[0072]), in addition to being evaluated with weights to update the machine learning models ([0053]).)
and …
…for an inquiry ([0047]: describing the received data input can be a query for translation and the output is the translation in another language.).

While Chung teaches the limitations of claim 9, Chung does not explicitly teach: “ground truth” on lines 16, 18, and 23. Schuster discloses the claim limitations, teaching: a training of the machine learning system and models on an “initial training data” with corresponding “ground truth pairs” (Schuster [0026], [0028], and [0038]). Similarly, see Schuster [0039]: “The system then generates modified training data from the initial training data (step 206). For example, the system generates modified training data by generating, for each of one or more of the training examples in the initial training data, an auxiliary output for the training example from the ground truth output for the training example.” 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited reference to include the ground truth data in Schuster. Doing so would enable a “machine learning system, implemented as computer programs on one or more computers in one or more locations can train a machine learning model to perform a machine learning task, e.g., a structured prediction machine learning task.” (Schuster [0021]).


While the cited references teach the limitations of claim 9, they do not explicitly teach: 
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task” on lines 11-14 and “to propagate the selected candidate output through remaining machine learning components to produce a result of the application” on lines 23-25. Bradski discloses the claim limitations, teaching:
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task”: describing that the plurality of processing units can perform tasks by executing code, e.g. classification code, on a plurality of components, e.g. classification components, to generate a plurality of different decision outputs (Bradski [0018]-[0019]). Similarly, see Bradski [0024]: describing the classification decision outputs as well as the validation decision outputs of the corresponding classification decision outputs.
“to propagate the selected candidate output through remaining machine learning components to produce a result of the application”: describing that a manager of the processing units “may select the classification value selected by the majority of classification components 10 a, 10 b . . . 10 n or take the average of the classification decisions from the classification components 10 a, 10 b . . . 10 n to determine a final classification value for the received data” (Bradski [0018]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the techniques in Bradski. Doing so would enable a method and system “for training and using classification components on multiple processing units. A plurality of processing units each has a memory including one of a plurality of subsets of a set of data points. At least two of the processing units have different subsets of data points. A plurality of classification components are executed by the processing units. Classification components executing at the processing units are trained, wherein each classification component is trained with the subset of data points in the memory of the processing unit that is executing the classification component.” (Bradski Abstract).

While the cited references teach the limitations of claim 9, they do not explicitly teach: “weighting each of the first plurality of candidate outputs, wherein weighting for the identified candidate outputs is greater relative to weighting for other candidate outputs” on lines 19-20. Contreras discloses the claim limitations, teaching: that the candidate answers can be weighted using scores, wherein the scores of the various answers are compared with each other in correlation with predetermined thresholds to determine the ranking with which to present the answers to a user (Contreras [0048]-[0050]). Wherein the statistical analysis with the weighting and scores is “repeated for each of the candidate answers until the Watson™ QA system identifies candidate answers that surface as being significantly stronger [i.e. with a weight/score that is greater] than others and thus, generates a final answer, or ranked set of answers, for the input question.” (Contreras [0031]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the weighting in Contreras. Doing so would enable a system in which the “question answering system ranks the plurality of candidate answers and outputs one or more of the candidate answers” (Contreras Abstract), whereby the “[c]andidate answer ranking 417 ranks the candidate answers 414 by confidence score and outputs one or candidate answers 402 having a highest confidence score” (Contreras [0073]).

Regarding claim 10, Chung further teaches:
The apparatus of claim 9, further comprising: 
a data storage unit configured to store component models for the machine learning components; and wherein the processor is further configured to ([0075] and [0076]: describing a “computer storage medium” that can comprise storage devices that can implement the multi-task learning system.): 
generate the first plurality of candidate outputs from each of the plurality of machine learning components based on the other set of training data ([0037]: describing that “cases where one or more of the teacher machine learning models are neural networks, each neural network, e.g., teacher machine learning model 102a, may be trained on a respective set of training data, e.g., training data set 108a, by processing a training input included in the set of training data to generate a corresponding output according to a given machine learning task.” See also Fig. 1: showing the student ML model and teacher ML models, wherein each teacher ML model can generate its own respective outputs ([0038]) for correlation with a student ML model. 
See also [0062]-[0064]: describing the use of training data subsets to train the teacher ML models and generating the corresponding outputs based on the selected subset of training data.); and 
train a component model for each of the plurality of machine learning components by executing the application for each of the first plurality of candidate outputs generated from that component and adjusting the component model based on results produced by the application ([0037]: describing that “the output [of the teacher ML model] may then be compared to a known training output included in the set of training data by computing a loss function, and backpropagating loss function gradients with respect to current neural network parameters to determine an updated set of neural network parameters that minimizes the loss function.” 
See also [0066]-[0067]: describing the use of the teacher ML models trained using the data subsets that is then used to train the student ML models, whereby the “system may then adjust the values of student machine learning model parameters to match the generated student machine learning model to a corresponding generated teacher machine learning model output.”).

Regarding claim 11, claim 11 is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3. Claim 11 is a system claim that corresponds to method claim 3.

Regarding claim 14, claim 14 is substantially similar to claim 7 and therefore is rejected on the same ground as claim 7. Claim 14 is a system claim that corresponds to method claim 7.

Regarding claim 15, claim 15 is substantially similar to claim 8 and therefore is rejected on the same ground as claim 8. Claim 15 is a system claim that corresponds to method claim 8.

Regarding claim 16, Chung teaches:
A computer program product for configuring a processing system with an application utilizing a plurality of machine learning components ([0075]-[0076]: describing a “data processing apparatus” that can encompass a variety of hardware and software that can enable a “multi-task learning system 100 [that] includes multiple teacher machine learning models 102a-102d [and] a single student machine learning model 104” ([0033]) that can perform various machine learning application tasks, including “natural language” processing tasks ([0039]). Wherein the learning “system [can be] implemented as computer programs on one or more computers in one or more locations” ([0033]).), 
the computer program product comprising one or more computer readable storage media collectively having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to ([0075]: describing a “[computer] machine-readable medium” with program instructions and a “data processing apparatus” that can comprise the medium and a processor ([0076]) to run the multi-task learning system ([0033]).): 
train each machine learning component of the plurality of machine learning components individually with respective training data to perform a corresponding task ([0034], [0044], and [0059]: describing that the various machine learning (ML) models comprising machine learning components can perform “machine learning tasks 114 using [respective] training data” for the corresponding models.); 
train the application on an initial set of training data ([0034]: describing “a first set of training data” that can be used for training the machine learning models.);
execute the application with the trained machine learning components on the processing system using another set of training data to produce a plurality of different outputs for each of a plurality of elements of the other set of training data ([0044]: describing that the natural language processing task being performed by the learning system comprises a configuration of “[t]he student machine learning model 104 … to perform each of the multiple machine learning tasks 114 using the configured multiple teacher machine learning models 102a-102d and the training data 108.” See also [0037]-[0038]: describing the various outputs for the respective training data sets.), …;
identify, for each of the elements of the other set of training data, one or more outputs among the different outputs of the application that concur with … data ([0053]: describing that the “system may then compare the training output [of the teacher ML models] to a known output included in the set of training data by computing a loss function” followed by a training of the student ML model using the training data derived from the teacher ML models. Similarly, see [0037]-[0038]: further describing the various respective outputs.); 
identify the candidate outputs of the trained machine learning component that produce the identified outputs concurring with the … data ([0068] and [0070]-[0071]: describing that the outputs can be identified and compared with each other to determine if there is a match between the various outputs, wherein the machine learning model parameters can be adjusted accordingly to obtain the desired outputs.); 
…; and 
trained machine learning components to select based on the weighting a candidate output from the trained machine learning component  that provides an identified output concurring with the … data ([0045]: “The generated student machine learning model output may be compared to the generated teacher machine learning output, and used to determine an updated set of student machine learning model parameters that minimizes the difference between the generated student machine learning model output and the teacher machine learning output.” Wherein the outputs of the trained teacher ML models being utilized in the student ML model can be compared with a known training output by adjusting parameters of the models ([0070]-[0072]), in addition to being evaluated with weights to update the machine learning models ([0053]).)
and ….
…for an inquiry ([0047]: describing the received data input can be a query for translation and the output is the translation in another language.).

While the cited reference teaches the limitations of claim 16, Chung does not explicitly teach: “ground truth” on lines 17, 19, and 24. Schuster discloses the claim limitations, teaching: a training of the machine learning system and models on an “initial training data” with corresponding “ground truth pairs” (Schuster [0026], [0028], and [0038]). Similarly, see Schuster [0039]: “The system then generates modified training data from the initial training data (step 206). For example, the system generates modified training data by generating, for each of one or more of the training examples in the initial training data, an auxiliary output for the training example from the ground truth output for the training example.” 
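Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited reference to include the ground truth data in Schuster. Doing so would enable a “machine learning system, implemented as computer programs on one or more computers in one or more locations can train a machine learning model to perform a machine learning task, e.g., a structured prediction machine learning task.” (Schuster [0021]).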

While the cited references teach the limitations of claim 16, they do not explicitly teach: 
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task” on lines 12-15 and “to propagate the selected candidate output through remaining machine learning components to produce a result of the application” on lines 24-26. Bradski discloses the claim limitations, teaching:
“wherein each different output is based on a corresponding one of a first plurality of candidate outputs generated for each of the elements by a trained machine learning component of the plurality of trained machine learning components performing the corresponding task”: describing that the plurality of processing units can perform tasks by executing code, e.g. classification code, on a plurality of components, e.g. classification components, to generate a plurality of different decision outputs (Bradski [0018]-[0019]). Similarly, see Bradski [0020]-[0025]: describing the classification decision outputs as well as the validation decision outputs of the corresponding classification decision outputs.
“to propagate the selected candidate output through remaining machine learning components to produce a result of the application”: describing that a manager of the processing units “may select the classification value selected by the majority of classification components 10 a, 10 b . . . 10 n or take the average of the classification decisions from the classification components 10 a, 10 b . . . 10 n to determine a final classification value for the received data” (Bradski [0018]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the techniques in Bradski. Doing so would enable a method and system “for training and using classification components on multiple processing units. A plurality of processing units each has a memory including one of a plurality of subsets of a set of data points. At least two of the processing units have different subsets of data points. A plurality of classification components are executed by the processing units. Classification components executing at the processing units are trained, wherein each classification component is trained with the subset of data points in the memory of the processing unit that is executing the classification component.” (Bradski Abstract).

While the cited references teach the limitations of claim 16, they do not explicitly teach: “weight each of the first plurality of candidate outputs, wherein weighting for the identified candidate outputs is greater relative to weighting for other candidate outputs” on lines 20-21. Contreras discloses the claim limitations, teaching: that the candidate answers can be weighted using scores, wherein the scores of the various answers are compared with each other in correlation with predetermined thresholds to determine the ranking with which to present the answers to a user (Contreras [0048]-[0050]). Wherein the statistical analysis with the weighting and scores is “repeated for each of the candidate answers until the Watson™ QA system identifies candidate answers that surface as being significantly stronger [i.e. with a weight/score that is greater] than others and thus, generates a final answer, or ranked set of answers, for the input question.” (Contreras [0031]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the models and system in the cited references to include the weighting in Contreras. Doing so would enable a system in which the “question answering system ranks the plurality of candidate answers and outputs one or more of the candidate answers” (Contreras Abstract). Whereby the “[c]andidate answer ranking 417 ranks the candidate answers 414 by confidence score and outputs one or candidate answers 402 having a highest confidence score” (Contreras [0073]).

Regarding claim 17, claim 17 is substantially similar to claim 2 and therefore is rejected on the same ground as claim 2. Claim 17 is a computer program product claim that corresponds to method claim 2.

Regarding claim 18, claim 18 is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3. Claim 18 is a computer program product claim that corresponds to method claim 3.

Claims 5, 6, 12, 13, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chung et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0325308, hereinafter Chung), Schuster et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0188566, hereinafter Schuster), Bradski et al. (U.S. Pat. App. Pre-Grant Pub. No. 2008/0050014, hereinafter Bradski), and Contreras et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0235130, hereinafter Contreras) in view of Allen et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0339574, hereinafter Allen).

Regarding claim 5, while the cited references teach the limitations of claim 2, they do not explicitly teach: “wherein generating the first plurality of candidate outputs comprises: generating the first plurality of candidate outputs as being variations of a primary candidate output.” Allen discloses the claim limitation, teaching: “the QA system generates a set of hypotheses or candidate answers to the input question” (Allen [0017]), wherein the set of candidate answers are variations of a final answer, as determined by a statistical model analysis and ranking of the candidate answers to determine and select the candidate answer with the highest rank/score as the final answer (Allen [0018]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the machine learning models and system in the cited references to include the various candidate answers in Allen. Doing so would enable a way to configure “a QA system to answer input questions, [whereby] it is important to train the QA system using machine learning techniques to iteratively modify the operation of the QA system until a desired performance is achieved, e.g., a determined level of accuracy. A method of determining the accuracy of a QA system includes, during training of the QA system, verifying an answer provided by the QA system using a set of acceptable answers.” (Allen [0019]). 

Regarding claim 6, while the cited references teach the limitations of claim 2 and the plurality of components, they do not explicitly teach: “determining features for the first plurality of candidate outputs generated from each of the plurality of machine learning components; wherein training a component model for each of the plurality of machine learning components includes weighting the features of each of the first plurality of candidate outputs generated from that component based on results produced by the application to enable the selection of a candidate output for the application.” Allen discloses the claim limitations, teaching:
“determining features for the first plurality of candidate outputs generated from each of the plurality of machine learning components” (Allen [0067]: describing that the QA system “performs a deep analysis and comparison of the language of the input question and the language of each hypothesis or “candidate answer” as well as performs evidence scoring to evaluate the likelihood that the particular hypothesis is a correct answer for the input question. As mentioned above, this may involve using a plurality of reasoning algorithms, each performing a separate type of analysis of the language of the input question and/or content of the corpus that provides evidence in support of, or not, of the hypothesis. Each reasoning algorithm generates a score based on the analysis it performs which indicates a measure of relevance of the individual portions of the corpus of data/information extracted by application of the queries as well as a measure of the correctness of the corresponding hypothesis, i.e. a measure of confidence in the hypothesis.” Wherein the reasoning algorithms “may look at the matching of terms and synonyms within the language of the input question and the found portions of the corpus of data [used to generate the candidate answers]” (Allen [0051]). That is, the features of the candidate answers are determined in the deep analysis phase to determine if the candidate answer is correct/relevant.); 
“wherein training a component model for each of the plurality of machine learning components includes weighting the features of each of the first plurality of candidate outputs generated from that component based on results produced by the application to enable the selection of a candidate output for the application” (Allen [0068]-[0069]: describing that confidence scores may be computed for the plurality of candidate answers wherein “[t]his process may involve applying weights to the various scores, where the weights have been determined through training of the statistical model employed by the QA system and/or dynamically updated, as described hereafter. The weighted scores may be processed in accordance with a statistical model generated through training of the QA system that identifies a manner by which these scores may be combined to generate a confidence score or measure for the individual hypotheses or candidate answers…. The hypotheses/candidate answers may be ranked according to these comparisons to generate a ranked listing of hypotheses/ candidate answers (hereafter simply referred to as “candidate answers”). From the ranked listing of candidate answers, at stage 380, a final answer and confidence score, or final set of candidate answers and confidence scores, may be generated and output to the submitter of the original input question.”).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the machine learning models and system in the cited references to weigh the candidate answers to determine the final answer as in Allen. Doing so would enable the “QA system [to] identif[y] candidate answers that surface as being significantly stronger than others and thus, generates a final answer, or ranked set of answers, for the input question.” Wherein “the hypotheses/candidate answers may be ranked according to these comparisons to generate a ranked listing of hypotheses/ candidate answers (hereafter simply referred to as “candidate answers”).” (Allen [0018]). 

Regarding claim 12, claim 12 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5. Claim 12 is a system claim that corresponds to method claim 5.

Regarding claim 13, claim 13 is substantially similar to claim 6 and therefore is rejected on the same ground as claim 6. Claim 13 is a system claim that corresponds to method claim 6.

Regarding claim 19, claim 19 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5. Claim 19 is a computer program product claim that corresponds to method claim 5.

Regarding claim 20, claim 20 is substantially similar to claim 6 and therefore is rejected on the same ground as claim 6. Claim 20 is a computer program product claim that corresponds to method claim 6.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121