Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application submitted on 2/13/2018.
Claims 1-27 are presented for examination.

Drawings
The drawings are objected to under 37 CFR § 1.85 because of the discrepancies identified at the end of this section. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
In FIG. 10B, 1020 includes 5 neurons, not 6 neurons. Correction to the drawings is required.
In FIG. 10C, 1040 has 1 input, not 3 inputs. Correction to the drawings is required.

Specification
The disclosure is objected to because of the following informalities: 
In paragraph [0056], line 4, “model” should read “models”.
In paragraph [0059], line 6, pedestrian” should read "pedestrian".
In paragraph [0060], line 2, “label” should read “labels”.
In paragraph [0067], line 8, “model” should read “models”.
In paragraph [0090], line 2, the text should read 5 neurons, not 6 neurons, consistent with the drawing.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.




Claims 1-4, 6-8, 11, 13-16, 18-20, 23, and 27 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lin et al. (US 8370280 B1, hereinafter Lin).
Regarding claim 1,
Lin discloses a method to reduce memory and processor consumption required (Lin, Col. 2 Line [023-025] “Further, by selectively reducing the number of predictive models to be combined, computational resources can be conserved.”) in creating a resulting machine learning model (Lin, Col. 1 Line [059-062] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”) according to user-specified criteria, (Lin, Col. 1 Line [046-047] “receiving a feature vector, the feature vector including one or more elements;” and Lin, Col. 8 Line [023-025] “According to process 400, a feature vector can be received (402). For example, a user may submit a query including the feature vector to a server.”, i.e., a user query or specification) the method comprising:
receiving from a first user device the user-specified criteria, (Lin, Col. 1 Line [046-047] “method can include the actions of: receiving a feature vector, the feature vector including one or more elements.” and Lin, Col. 6 Line [010-012] “FIG. 2 illustrates an example predictive modeling system 200. The system 200 includes one or more clients”) the user-specified criteria comprising a label describing a feature to identify from input data using the resulting machine learning model; (Lin, Col. 1 Line [046-047] “method can include the actions of: receiving a feature vector, the feature vector including one or more elements.” and Lin, Col. 8 Line [030-032] “For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404)”)
obtaining from a plurality of sources a plurality of machine learning models trained to identify the label from the input data; (Lin, Col. 5 Line [019-022] “Predictive modeling generally refers to techniques for extracting information from data to build a model that can predict an output from a given input.” and Lin, Col. 1 Line [048-055] “from a set of predictive models, a subset of one or more predictive models based on the element types and one or more performance indicators associated with each predictive model in the set of predictive models; processing the feature vector using the subset of predictive models, each predictive model of the subset of predictive models generating an output based on the feature vector to provide a plurality of outputs”)
based on the plurality of machine learning models trained to identify the label from the input data, (Lin, Col. 8 Line [028-041] “once received, one or more attributes or properties describing the feature vector can be identified. For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404). Other suitable attributes describing the received feature vector can also be determined, such as the dimensionality of the feature vector. A subset of one or more predictive models can be selected from a set of predictive models (406). In some implementations, the subset of predictive models is selected based on the identified types of features included in the received feature vector and one or more performance indicators associated with the predictive models.”) creating the resulting machine learning model to identify the label from the input data within a predefined accuracy, (Lin, Col. 2 Line [001-008] “the final output is generated using a decision maker, the decision maker receiving the plurality of outputs; the plurality of outputs are combined to define a second feature vector, the second feature being processed by a final predictive model to generate the final output; selecting a subset of one or more predictive models includes comparing respective performance indicators (e.g., accuracy metrics) associated with each predictive model of the set of predictive models.”), said creating the resulting machine learning model comprising improving an accuracy of a machine learning model in the plurality of machine learning models to at least the predefined accuracy. (Lin, Col. 2 Line [001-008] “the accuracy of the predicted outcomes ultimately generated by a predictive model can be augmented by increasing the number of training examples and/or the number of features per training example. Observed features and/or outcomes included in the training examples can be provided in any suitable form or data type.”)
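For illustration only, the selection step quoted above (identifying feature types at 404, then selecting a subset of predictive models by performance indicator at 406) can be sketched as follows. This is a minimal sketch and not code from Lin: the `PredictiveModel` class, its `supported_types` and `accuracy` fields, and the cutoff `k` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass
class PredictiveModel:
    # Hypothetical record for one available model (not a structure defined by Lin).
    name: str
    supported_types: FrozenSet[str]  # feature types the model can process
    accuracy: float                  # performance indicator, e.g. an accuracy metric


def element_types(feature_vector: Sequence) -> FrozenSet[str]:
    """Identify a type name for each element of the feature vector."""
    return frozenset(type(x).__name__ for x in feature_vector)


def select_subset(models: List[PredictiveModel],
                  feature_vector: Sequence,
                  k: int = 2) -> List[PredictiveModel]:
    """Keep models that support every element type, then take the k most accurate."""
    needed = element_types(feature_vector)
    eligible = [m for m in models if needed <= m.supported_types]
    return sorted(eligible, key=lambda m: m.accuracy, reverse=True)[:k]
```

Limiting the subset to `k` models reflects the quoted rationale that reducing the number of combined models conserves computational resources.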
Regarding claim 2, 
Lin discloses the method of claim 1, said creating the resulting machine learning model comprising: (Lin, Col. 1 Line [059-062] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”)
creating a label classification based on a plurality of labels and the plurality of machine learning models trained to identify at least one label in the plurality of labels,  (Lin, Col. 3 Line [046-056] “In some implementations, a training dataset can be applied to a selected machine learning algorithm to train a predictive model. More specifically, the machine learning algorithm can train a predictive model by systematically analyzing the applied training dataset and defining an inferred function that “fits” the training data. The trained predictive model can be representative of the training dataset and operable to map a feature vector to a predictive outcome according to the inferred function.") wherein the label classification establishes relationships among the plurality of labels; (Lin, Col. 3 Line [023-031] "In some implementations, feature vectors and associated outcomes of a training dataset can be organized in tabular form. For example, the training dataset illustrated by TABLE 1 includes of a number of training examples related to categorization of email messages as SPAM or NOT SPAM. As shown, the feature vector (which, in this example, includes a single observed feature) of each training example includes an email message subject Line, and the related outcome is a category that indicates whether the email message is spam.")
upon receiving the label from the first user device, finding the label in the label classification and a first machine learning model associated with the label; (Lin, Col. 1 Line [046-052] “methods that include the actions of: receiving a feature vector, the feature vector including one or more elements; identifying an element type for each of the one or more elements; selecting, from a set of predictive models, a subset of one or more predictive models based on the element types and one or more performance indicators associated with each predictive model in the set of predictive models.”) Here, Lin's predictive models correspond to the claimed machine learning models.
testing the accuracy of the first machine learning model associated with the label, said testing the accuracy comprising obtaining the accuracy of the first machine learning model; and (Lin, Col. 8 Line [036-048] “A subset of one or more predictive models can be selected from a set of predictive models (406). In some implementations, the subset of predictive models is selected based on the identified types of features included in the received feature vector and one or more performance indicators associated with the predictive models.” and “In this example, a performance indicator can be considered any suitable quantitative measure (e.g., a metric), qualitative designation (e.g., labels such as “highly accurate”, “robust”, etc.) or ranking which describes the performance of a predictive model.”) Here, Lin's predictive model corresponds to the claimed machine learning model.
when the accuracy of the first machine learning model is below the predefined accuracy, (Lin, Col. 4 Line [036-042] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”) improving the accuracy of the first machine learning model by determining a problem label causing the low accuracy, and combining the first machine learning model associated with the label with a second machine learning model associated with the problem label (Lin, Col. 2 Line [023-025] “output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models. Such predictions can be further improved by selecting a specific subset of predictive models for combination from a set of available models.”).
Regarding claim 3,  
Lin discloses the method of claim 1, said creating the resulting machine learning model comprising: (Lin, Col. 1 Line [059-062] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”)
creating a label classification based on a plurality of labels and the plurality of machine learning models trained to identify at least one label in the plurality of labels, wherein the label classification establishes relationships among the plurality of labels; (Lin, Col. 3 Line [001-013] “In some implementations, a predictive model can be constructed (or “trained”) using a training dataset in conjunction with a machine learning algorithm. Training datasets can include any number of training examples (e.g., tens, hundreds, thousands, or millions of examples) embodying a patterned occurrence. Each training example can include a number of elements (for example, observed features) related to a known outcome (e.g., a category or a numeric value). In some examples, the observed feature(s) for each training example can be considered a feature vector. The dimensionality of a feature vector can be equal to, or less than, the number of observed features included therein.”)
upon receiving the label from the first user device, finding the label in the label classification, a first machine learning model associated with the label, a related label, and a second machine learning model associated with the related label; (Lin, Col. 3 Line [047-054] “In some implementations, a training dataset can be applied to a selected machine learning algorithm to train a predictive model. More specifically, the machine learning algorithm can train a predictive model by systematically analyzing the applied training dataset and defining an inferred function that “fits” the training data. The trained predictive model can be representative of the training dataset and operable to map a feature vector to a predictive outcome according to the inferred function.” and Col. 4 Line [036-043] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”)
combining the first machine learning model associated with the label and the second machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col. 2 Line [057-063] “In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models. Such predictions can be further improved by selecting a specific subset of predictive models for combination from a set of available models.”)
Regarding claim 4,  
Lin discloses The method of claim 3, said combining the first machine learning model associated with the label with the second machine learning model associated with the related label, comprises: (Lin, Col. 2 Line [056-060] “predictive model is considered a regression model if its predictive outcomes are numeric values. In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models.”)
making a serial combination of the first machine learning model associated with the label and the second machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col. 4 Line [031-041] “The boosting operations can include training a set of predictive models in series and re-weighting the training dataset between training iterations based on output from an earlier predictive model. For example, training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert”.”) Here, Lin's predictive model corresponds to the claimed machine learning model.
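For illustration only, the re-weighting between serial training iterations described in the quoted boosting passage can be sketched as below. The quoted passage does not specify an update rule; the AdaBoost-style exponential update used here is an assumption, and the function name `reweight` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def reweight(weights: Sequence[float],
             correct: Sequence[bool]) -> Tuple[List[float], float]:
    """Raise the weight of examples the first model got wrong (AdaBoost-style).

    Returns the normalized weights for training the second model, plus the
    first model's vote weight (alpha).
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, correct) if not ok) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    norm = sum(updated)
    return [w / norm for w in updated], alpha
```

With four equally weighted examples and one misclassification, the misclassified example ends up carrying half of the total weight, so the second model in the series concentrates on it.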
Regarding claim 6,
Lin discloses The method of claim 3, said combining the first machine learning model associated with the label with the second machine learning model associated with the related label, comprises: (Lin, Col. 2 Line [056-060] “predictive model is considered a regression model if its predictive outcomes are numeric values. In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models.”)
making a parallel combination of the machine learning model associated with the label and the machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col. 10 Line [025-030] “A 602, B 604, C 606, D 608, and E 610 represent respective implementations of selected predictive models that can be executed in parallel based on a received feature vector. Element 612 represents a decision maker module for applying the fixed output combining rule to achieve a final prediction.”)
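For illustration only, the parallel arrangement in the quoted passage (selected models executed in parallel on one feature vector, followed by a decision maker applying a fixed combining rule) can be sketched as below. Majority voting is assumed as the fixed rule, which the quoted passage does not specify; the names `decision_maker` and `predict_parallel` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def decision_maker(outputs: Sequence[str]) -> str:
    """Fixed combining rule (assumed): majority vote, ties go to the earliest output."""
    counts = Counter(outputs)
    best = max(counts.values())
    return next(o for o in outputs if counts[o] == best)


def predict_parallel(models: List[Callable[[Sequence], str]],
                     feature_vector: Sequence) -> str:
    """Run every selected model on the same feature vector, then combine."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as `models`.
        outputs = list(pool.map(lambda m: m(feature_vector), models))
    return decision_maker(outputs)
```

Each model here is any callable that maps a feature vector to a label, so the sketch does not depend on how the individual models were trained.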
Regarding claim 7,
Lin discloses the method of claim 3, comprising: 
identifying a portion of the resulting machine model with lowest accuracy; (Lin, Col.10 Line [025-030] “selecting a subset of one or more predictive models includes comparing respective performance indicators (e.g., accuracy metrics) associated with each predictive model of the set of predictive models, the respective performance indicators being selected for comparison based on the element types.”)
training only the portion of the resulting machine model. (Lin, Col.10 Line [025-030] “Predictive model combination techniques can be further enhanced by selecting a subset of predictive models for combination that are expected to perform well under identified conditions.”)
Regarding claim 8,
Lin discloses the method of claim 1, said obtaining the plurality of machine learning models comprising: (Lin, Col. 8 Line [066-060] “In some implementations, a database including performance indicators for a plurality of available predictive models can be provided.”)
creating a label classification of the plurality of labels, the plurality of machine learning models trained to identify at least one label in the plurality of labels, (Lin, Col.4 Line [006-011] “Each training example can include a number of elements (for example, observed features) related to a known outcome (e.g., a category or a numeric value). In some examples, the observed feature(s) for each training example can be considered a feature vector.”) and a plurality of input data associated with the plurality of labels, wherein the label classification establishes relationships among the plurality of labels, among the plurality of machine learning models trained to identify labels, and among the plurality of input data associated with the plurality of labels; (Lin, Col.8 Line [028-035]”once received, one or more attributes or properties describing the feature vector can be identified. For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404). Other suitable attributes describing the received feature vector can also be determined, such as the dimensionality of the feature vector.”)
upon receiving the label from the first user device, searching the label classification by retrieving at least one of the machine learning models trained to identify the label or the input data associated with the label. (Lin, Col. 4 Line [006-011] “database including performance indicators for a plurality of available predictive models can be provided.” and Lin, Col. 1 Line [044-050] “one aspect of the subject matter described in this specification can be embodied in methods that include the actions of: receiving a feature vector, the feature vector including one or more elements; identifying an element type for each of the one or more elements; selecting, from a set of predictive models, a subset of one or more predictive models based on the element types”)
Regarding claim 11, 
Lin discloses the method of claim 1, comprising: training a first machine learning model and a second machine learning model to identify the label, and to produce a first confidence level and a second confidence level associated with the identified label respectively, wherein the first machine learning model is less complex than the second machine learning model. (Lin, Col. 4 Line [035-042] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”)
Regarding claim 13, 
Lin discloses a non-transitory computer readable medium storing instructions (Lin, Col. 10 Line [057-061] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”) for reducing memory and processor consumption required in creating a resulting machine learning model according to user-specified criteria, (Lin, Col. 2 Line [023-025] “Further, by selectively reducing the number of predictive models to be combined, computational resources can be conserved.”) the instructions when executed by at least one processor cause the at least one processor to implement operations comprising: (Lin, Col. 11 Line [039-042] “The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output.”)
receiving from a first user device the user-specified criteria, (Lin, Col. 1 Line [046-047] “method can include the actions of: receiving a feature vector, the feature vector including one or more elements.” and Lin, Col. 6 Line [010-012] “FIG. 2 illustrates an example predictive modeling system 200. The system 200 includes one or more clients”) the user-specified criteria comprising a label describing a feature to identify from input data using the resulting machine learning model; (Lin, Col. 1 Line [046-047] “method can include the actions of: receiving a feature vector, the feature vector including one or more elements.” and Lin, Col. 8 Line [030-032] “For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404)”)
obtaining from a plurality of sources a plurality of machine learning models trained to identify the label from the input data; (Lin, Col. 5 Line [019-022] “Predictive modeling generally refers to techniques for extracting information from data to build a model that can predict an output from a given input.” and Lin, Col. 1 Line [048-055] “from a set of predictive models, a subset of one or more predictive models based on the element types and one or more performance indicators associated with each predictive model in the set of predictive models; processing the feature vector using the subset of predictive models, each predictive model of the subset of predictive models generating an output based on the feature vector to provide a plurality of outputs”)
based on the plurality of machine learning models trained to identify the label from the input data, (Lin, Col. 8 Line [028-041] “once received, one or more attributes or properties describing the feature vector can be identified. For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404). Other suitable attributes describing the received feature vector can also be determined, such as the dimensionality of the feature vector. A subset of one or more predictive models can be selected from a set of predictive models (406). In some implementations, the subset of predictive models is selected based on the identified types of features included in the received feature vector and one or more performance indicators associated with the predictive models.”) creating the resulting machine learning model to identify the label from the input data within a predefined accuracy, (Lin, Col. 2 Line [001-008] “the final output is generated using a decision maker, the decision maker receiving the plurality of outputs; the plurality of outputs are combined to define a second feature vector, the second feature being processed by a final predictive model to generate the final output; selecting a subset of one or more predictive models includes comparing respective performance indicators (e.g., accuracy metrics) associated with each predictive model of the set of predictive models.”), said creating the resulting machine learning model comprising improving an accuracy of a machine learning model in the plurality of machine learning models to at least the predefined accuracy. (Lin, Col. 2 Line [001-008] “the accuracy of the predicted outcomes ultimately generated by a predictive model can be augmented by increasing the number of training examples and/or the number of features per training example. Observed features and/or outcomes included in the training examples can be provided in any suitable form or data type.”)
Regarding claim 14, 
Lin discloses the non-transitory computer readable medium of claim 13, (Lin, Col. 10 Line [057-061] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”) said creating the resulting machine learning model comprising: (Lin, Col. 1 Line [059-062] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”)
creating a label classification based on a plurality of labels and the plurality of machine learning models trained to identify at least one label in the plurality of labels,  (Lin, Col. 3 Line [046-056] “In some implementations, a training dataset can be applied to a selected machine learning algorithm to train a predictive model. More specifically, the machine learning algorithm can train a predictive model by systematically analyzing the applied training dataset and defining an inferred function that “fits” the training data. The trained predictive model can be representative of the training dataset and operable to map a feature vector to a predictive outcome according to the inferred function.") wherein the label classification establishes relationships among the plurality of labels; (Lin, Col. 3 Line [023-031] "In some implementations, feature vectors and associated outcomes of a training dataset can be organized in tabular form. For example, the training dataset illustrated by TABLE 1 includes of a number of training examples related to categorization of email messages as SPAM or NOT SPAM. As shown, the feature vector (which, in this example, includes a single observed feature) of each training example includes an email message subject Line, and the related outcome is a category that indicates whether the email message is spam.")
upon receiving the label from the first user device, finding the label in the label classification and a first machine learning model associated with the label; (Lin, Col. 1 Line [046-052] “methods that include the actions of: receiving a feature vector, the feature vector including one or more elements; identifying an element type for each of the one or more elements; selecting, from a set of predictive models, a subset of one or more predictive models based on the element types and one or more performance indicators associated with each predictive model in the set of predictive models.”) Here, Lin's predictive models correspond to the claimed machine learning models.
testing the accuracy of the first machine learning model associated with the label, said testing the accuracy comprising obtaining the accuracy of the first machine learning model; and (Lin, Col. 8 Line [036-048] “A subset of one or more predictive models can be selected from a set of predictive models (406). In some implementations, the subset of predictive models is selected based on the identified types of features included in the received feature vector and one or more performance indicators associated with the predictive models.” and “In this example, a performance indicator can be considered any suitable quantitative measure (e.g., a metric), qualitative designation (e.g., labels such as “highly accurate”, “robust”, etc.) or ranking which describes the performance of a predictive model.”) Here, Lin's predictive model corresponds to the claimed machine learning model.
when the accuracy of the first machine learning model is below the predefined accuracy, (Lin, Col. 4 Line [036-042] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”) improving the accuracy of the first machine learning model by determining a problem label causing the low accuracy, and combining the first machine learning model associated with the label with a second machine learning model associated with the problem label (Lin, Col. 2 Line [023-025] “output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models. Such predictions can be further improved by selecting a specific subset of predictive models for combination from a set of available models.”).
Regarding claim 15,  
Lin discloses the non-transitory computer readable medium of claim 13, (Lin, Col. 10 Line [057-061] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”) said creating the resulting machine learning model comprising: (Lin, Col. 1 Line [059-062] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”)
creating a label classification based on a plurality of labels and the plurality of machine learning models trained to identify at least one label in the plurality of labels, wherein the label classification establishes relationships among the plurality of labels; (Lin, Col. 3 Line [001-013] “In some implementations, a predictive model can be constructed (or “trained”) using a training dataset in conjunction with a machine learning algorithm. Training datasets can include any number of training examples (e.g., tens, hundreds, thousands, or millions of examples) embodying a patterned occurrence. Each training example can include a number of elements (for example, observed features) related to a known outcome (e.g., a category or a numeric value). In some examples, the observed feature(s) for each training example can be considered a feature vector. The dimensionality of a feature vector can be equal to, or less than, the number of observed features included therein.”)
upon receiving the label from the first user device, finding the label in the label classification, a first machine learning model associated with the label, a related label, and a second machine learning model associated with the related label; (Lin, Col. 3 Line [047-054] “In some implementations, a training dataset can be applied to a selected machine learning algorithm to train a predictive model. More specifically, the machine learning algorithm can train a predictive model by systematically analyzing the applied training dataset and defining an inferred function that “fits” the training data. The trained predictive model can be representative of the training dataset and operable to map a feature vector to a predictive outcome according to the inferred function.” and Col. 4 Line [036-043] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”)
combining the first machine learning model associated with the label and the second machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col. 2 Line [057-063] “In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models. Such predictions can be further improved by selecting a specific subset of predictive models for combination from a set of available models.”)
Regarding claim 16,  
Lin discloses the non-transitory computer readable medium of claim 13, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), said combining the first machine learning model associated with the label with the second machine learning model associated with the related label, comprises: (Lin, Col. 2 Line [056-060] “predictive model is considered a regression model if its predictive outcomes are numeric values. In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models.”)
making a serial combination of the first machine learning model associated with the label and the second machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col. 4 Line [031-041] “The boosting operations can include training a set of predictive models in series and re-weighting the training dataset between training iterations based on output from an earlier predictive model. For example, training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert”.”) Here, the predictive model of Lin corresponds to the claimed machine learning model.
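The serial re-weighting described in the boosting passage quoted from Lin can be sketched as follows. This is a minimal illustration only, not code from Lin or from the application; the function name `reweight`, the weighting factor of 2.0, and the toy data are assumptions chosen for the example.

```python
def reweight(weights, misclassified, factor=2.0):
    """Give more weight to examples the first model handled poorly.

    weights: current per-example weights (assumed to sum to 1).
    misclassified: booleans, True where the first model was wrong.
    factor: hypothetical multiplier applied to difficult examples.
    """
    boosted = [w * factor if wrong else w
               for w, wrong in zip(weights, misclassified)]
    total = sum(boosted)
    # Renormalize so the weights again sum to 1 before the second model trains.
    return [w / total for w in boosted]


# Toy usage: the first model misclassified only the second example.
new_weights = reweight([0.25, 0.25, 0.25, 0.25], [False, True, False, False])
```

A second model trained on `new_weights` would see the difficult example emphasized, which is the sense in which the quoted passage calls it an “expert” where the first model was weak.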
Regarding claim 18,
Lin discloses the non-transitory computer readable medium of claim 15, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), said combining the first machine learning model associated with the label with the second machine learning model associated with the related label, comprises: (Lin, Col. 2 Line [056-060] “predictive model is considered a regression model if its predictive outcomes are numeric values. In some cases, output from a number of distinct predictive models can be combined to achieve predictions that can be more accurate to predictions provided by individual models.”)
making a parallel combination of the machine learning model associated with the label and the machine learning model associated with the related label to obtain the resulting machine learning model. (Lin, Col.10 Line [025-030] “A 602, B 604, C 606, D 608, and E 610 represent respective implementations of selected predictive models that can be executed in parallel based on a received feature vector. Element 612 represents a decision maker module for applying the fixed output combining rule to achieve a final prediction.”).
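The parallel arrangement quoted from Lin, in which several models process the same feature vector and a decision maker applies a fixed combining rule, can be sketched as follows. This is an illustrative sketch only, not code from Lin or from the application; majority voting is assumed as the fixed rule, and `combine_parallel` and the toy models are hypothetical names.

```python
from collections import Counter


def combine_parallel(models, feature_vector):
    """Run every model on the same input and majority-vote the outputs.

    models: callables that each map a feature vector to a label.
    Majority voting is an assumed stand-in for the fixed combining rule.
    """
    outputs = [model(feature_vector) for model in models]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(outputs).most_common(1)[0][0]


# Toy usage: three hypothetical models, two of which agree.
toy_models = [lambda x: "cat", lambda x: "dog", lambda x: "cat"]
final_label = combine_parallel(toy_models, [0.1, 0.9])
```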
Regarding claim 19,
Lin discloses The non-transitory computer readable medium of claim 15, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), the operations comprising: 
identifying a portion of the resulting machine model with lowest accuracy; (Lin, Col.10 Line [025-030] “selecting a subset of one or more predictive models includes comparing respective performance indicators (e.g., accuracy metrics) associated with each predictive model of the set of predictive models, the respective performance indicators being selected for comparison based on the element types.”)
training only the portion of the resulting machine model. (Lin, Col.10 Line [025-030] “Predictive model combination techniques can be further enhanced by selecting a subset of predictive models for combination that are expected to perform well under identified conditions.”)
Regarding claim 20,
Lin discloses The non-transitory computer readable medium of claim 15, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), said obtaining the plurality of machine learning models comprising: (Lin, Col.8 Line [066-060] “In some implementations, a database including performance indicators for a plurality of available predictive models can be provided.”).
creating a label classification of the plurality of labels, the plurality of machine learning models trained to identify at least one label in the plurality of labels, (Lin, Col.4 Line [006-011] “Each training example can include a number of elements (for example, observed features) related to a known outcome (e.g., a category or a numeric value). In some examples, the observed feature(s) for each training example can be considered a feature vector.”) and a plurality of input data associated with the plurality of labels, wherein the label classification establishes relationships among the plurality of labels, among the plurality of machine learning models trained to identify labels, and among the plurality of input data associated with the plurality of labels; (Lin, Col.8 Line [028-035]”once received, one or more attributes or properties describing the feature vector can be identified. For instance, according to process 400, the type (e.g., data type, such as binary, string, real value, etc.) of each feature included in the feature vector can be identified (404). Other suitable attributes describing the received feature vector can also be determined, such as the dimensionality of the feature vector.”)
upon receiving the label from the first user device, searching the label classification by retrieving at least one of the machine learning models trained to identify the label or the input data associated with the label. (Lin, Col.4 Line [006-011] “database including performance indicators for a plurality of available predictive models can be provided.”)  and (Lin, Col.1 Line [044-050] “one aspect of the subject matter described in this specification can be embodied in methods that include the actions of: receiving a feature vector, the feature vector including one or more elements; identifying an element type for each of the one or more elements; selecting, from a set of predictive models, a subset of one or more predictive models based on the element types“).
Regarding claim 23, 
Lin discloses The non-transitory computer readable medium of claim 15, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), the operations comprising: 
training a first machine learning model and a second machine learning model to identify the label, and to produce a first confidence level and a second confidence level associated with the identified label respectively, wherein the first machine learning model is less complex than the second machine learning model. (Lin, Col. 4 Line [035-042] “training examples that were particularly difficult to process in a first predictive model can be given a greater weight than other training examples in the training dataset, such that a second predictive model receiving the modified dataset can become an “expert” in domains where the first predictive model has proven to be relatively weak.”)
Regarding claim 27,
Lin teaches the method of claim 25, said modifying inputs of the machine learning model in the plurality of machine learning models and outputs of the machine learning model (Lin, Col. 3 Line [0024] “The input labels can also correspond to desired outputs (e.g., the input label can include the desired output result).”), comprises:
creating an input interface layer such that inputs of the input interface layer match the canonical inputs and creating an output interface layer such that outputs of the output interface layer match the canonical outputs (Lin, Col. 4 Line [005-010] “the output of the model implementation associated with node A 302 serves as input to the model implementation associated with node C 304. The input to the model implementation associated with node D 308 is the output of the model implementations associated with nodes C 304 and B 306. The output of the model implementation associated with node D 308 serves as input to the model implementation associated with node E 310.”).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9-10, 12, 21-22, and 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Kulkarni et al. (US20150058331A1, hereafter Kulkarni).
Regarding claim 9,
Lin does not explicitly disclose the method of claim 1, comprising: inspecting the plurality of machine learning models, said inspecting the plurality of machine learning models comprising identifying an overfitted machine learning model in the plurality of machine learning models; and
However, Kulkarni in analogous art discloses  inspecting the plurality of machine learning models, said inspecting the plurality of machine learning models comprising identifying an overfitted machine learning model in the plurality of machine learning models; and (Kulkarni, Pag. 3 [0026] “In an example, training involves inputting data, observing the results, and indicating incorrect results. The machine learning system can then adjust its model (e.g., internal model 135) and the data can be fed through again, and again the results can be observed and incorrect results can be indicated to the machine learning system. This process can repeat until a satisfactory threshold of error (or correctness), desired number of cycles, or other predetermined condition is achieved.”)
Lin and Kulkarni are both directed to neural networks and their efficient use. In view of the teachings of Kulkarni, it would have been obvious to one of ordinary skill in the art to apply the teachings of Kulkarni to Lin before the effective filing date of the claimed invention in order to efficiently use deep neural networks. (Kulkarni, Pag.1 [0014] “large amounts of information can be used to rank search results to meet provider goals without expending vast human resource capital.”).
tagging the machine learning model as a security breach, said tagging comprising excluding the machine learning model from the resulting machine learning model. (Kulkarni, Pag. 3 [0029] “The training module 115 can be configured to iteratively refine the internal model's 135 output until a predetermined condition is met when the training module 115 receives errors in the output”. and ” In an example, the training module 115 can refine the output using one or more facilities of the internal model 135. For example, neural network systems often include a number of output “neurons” where the goal 130 can be realized as output on a first neuron and the internal model 135 observed output can be activation of a second neuron. When the first and second neurons are different, there is an observed error. The neural network can then accept the first (e.g., goal 130) neuron as input and adjust internal neuron connection weights to increase the likelihood that the second neuron is the first neuron (e.g., that the observed output equals the goal 130 output). The process can continue over several cycles, for example, until the predetermined condition is met. In an example, the predetermined condition is an error threshold.”)  
Regarding claim 10,  
Kulkarni teaches the method of claim 1, further comprising: providing to the first user device the resulting machine learning model. (Kulkarni, Pag. 4 [0032] “Model output from the goal model 140 can be communicated by the goal module 120 to a user 145. In an example, the presentation module 155 is configured to receive the model output and construct a user interface to display the search results to the user 145.”).
Regarding claim 12, 
Kulkarni teaches the method of claim 11, comprising: providing a new input data to the first machine learning model; (Kulkarni, Pag. 5 [0057] “In an example, the training process can involve repetition of the following: input data factors into the machine learning system; observe the results, and indicate errors to the machine learning system.”)
based on the input data, obtaining from the first machine learning model the label, and the first confidence level associated with the label; (Kulkarni, Pag. 5 [0057] “the machine learning system can accept the inputs and apply its trained model (e.g., a weighted neural network, or decision tree) to produce the results.” and ” One advantage of this machine learning approach is that the approach itself identifies pertinent data from the inputs to achieve the desired outputs. Thus, many piece of information can be used, whether or not they are particularly pertinent, and the machine learning system will sift through and identify the important pieces.“) and
when the confidence level from the first machine learning model is below a confidence level threshold, providing the new input data to the second machine learning model. (Kulkarni, Pag. 3 [0026] “Collected observations can be fed into the machine learning system when the system is being trained. In an example, training involves inputting data, observing the results, and indicating incorrect results. The machine learning system can then adjust its model (e.g., internal model 135) and the data can be fed through again, and again the results can be observed and incorrect results can be indicated to the machine learning system. This process can repeat until a satisfactory threshold of error (or correctness), desired number of cycles, or other predetermined condition is achieved.”).
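The claimed fallback from the first model to the second model when the first confidence level is below a threshold can be sketched as follows. This is a minimal illustration of the claim language only, not code from Kulkarni, Lin, or the application; the function name `classify_with_fallback`, the 0.8 threshold, and the toy models are assumptions.

```python
def classify_with_fallback(first_model, second_model, new_input, threshold=0.8):
    """Return (label, confidence), consulting the second model only when needed.

    Each model is a callable returning a (label, confidence) pair.
    threshold is a hypothetical confidence level threshold.
    """
    label, confidence = first_model(new_input)
    if confidence < threshold:
        # The first model is not confident enough, so pass the same
        # input to the second model.
        return second_model(new_input)
    return label, confidence


# Toy usage with two hypothetical models.
def simple(x):
    return "pedestrian", 0.55


def detailed(x):
    return "cyclist", 0.93


result = classify_with_fallback(simple, detailed, new_input=[0.2, 0.7])
```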
Regarding claim 21,
Kulkarni discloses, The non-transitory computer readable medium of claim 13, (Kulkarni, Pag. 6 [0068] “The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions (e.g., software 624).”) the operations comprising:
inspecting the plurality of machine learning models, said inspecting the plurality of machine learning models comprising identifying an overfitted machine learning model in the plurality of machine learning models; and (Kulkarni, Pag. 3 [0026] “In an example, training involves inputting data, observing the results, and indicating incorrect results. The machine learning system can then adjust its model (e.g., internal model 135) and the data can be fed through again, and again the results can be observed and incorrect results can be indicated to the machine learning system. This process can repeat until a satisfactory threshold of error (or correctness), desired number of cycles, or other predetermined condition is achieved.”).
tagging the machine learning model as a security breach, said tagging comprising excluding the machine learning model from the resulting machine learning model. (Kulkarni, Pag. 3 [0029] “The training module 115 can be configured to iteratively refine the internal model's 135 output until a predetermined condition is met when the training module 115 receives errors in the output”. and ” In an example, the training module 115 can refine the output using one or more facilities of the internal model 135. For example, neural network systems often include a number of output “neurons” where the goal 130 can be realized as output on a first neuron and the internal model 135 observed output can be activation of a second neuron. When the first and second neurons are different, there is an observed error. The neural network can then accept the first (e.g., goal 130) neuron as input and adjust internal neuron connection weights to increase the likelihood that the second neuron is the first neuron (e.g., that the observed output equals the goal 130 output). The process can continue over several cycles, for example, until the predetermined condition is met. In an example, the predetermined condition is an error threshold.”)  
Regarding claim 22,  
Kulkarni teaches the non-transitory computer readable medium of claim 13, (Kulkarni, Pag. 6 [0068] “The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions (e.g., software 624).”) the operations comprising:
providing to the first user device the resulting machine learning model. (Kulkarni, Pag. 4 [0032] “Model output from the goal model 140 can be communicated by the goal module 120 to a user 145. In an example, the presentation module 155 is configured to receive the model output and construct a user interface to display the search results to the user 145.”).
Regarding claim 24, 
Kulkarni teaches The non-transitory computer readable medium of claim 13, (Kulkarni, Pag. 6 [0068] “The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions (e.g., software 624).”) the operations comprising: 
providing a new input data to the first machine learning model; (Kulkarni, Pag. 5 [0057] “In an example, the training process can involve repetition of the following: input data factors into the machine learning system; observe the results, and indicate errors to the machine learning system.”) 
based on the input data, obtaining from the first machine learning model the label, and the first confidence level associated with the label; (Kulkarni, Pag. 5 [0057] “the machine learning system can accept the inputs and apply its trained model (e.g., a weighted neural network, or decision tree) to produce the results.” and ” One advantage of this machine learning approach is that the approach itself identifies pertinent data from the inputs to achieve the desired outputs. Thus, many piece of information can be used, whether or not they are particularly pertinent, and the machine learning system will sift through and identify the important pieces.“) and
when the confidence level from the first machine learning model is below a confidence level threshold, providing the new input data to the second machine learning model. (Kulkarni, Pag. 3 [0026] “Collected observations can be fed into the machine learning system when the system is being trained. In an example, training involves inputting data, observing the results, and indicating incorrect results. The machine learning system can then adjust its model (e.g., internal model 135) and the data can be fed through again, and again the results can be observed and incorrect results can be indicated to the machine learning system. This process can repeat until a satisfactory threshold of error (or correctness), desired number of cycles, or other predetermined condition is achieved.”).
Regarding claim 25,
Lin teaches a method to reduce memory and processor consumption required (Lin, Col. 2 line [023-025] “Further, by selectively reducing the number of predictive models to be combined, computational resources can be conserved.”) in creating a resulting machine learning model (Lin, Col. 1 line [059-62] “In the foregoing aspect, generating a final output can include: defining a combining technique based on the subset of predictive models; and combining the plurality of outputs according to the combining technique”) according to user-specified criteria, (Lin, Col. 1 line [046-047] “receiving a feature vector, the feature vector including one or more elements;”) the method comprising:
obtaining from a plurality of sources a plurality of machine learning models, (Lin, Col. 5 Line [019-022] “Predictive modeling generally refers to techniques for extracting information from data to build a model that can predict an output from a given input.” and Col. 1 Line [048-055] “from a set of predictive models, a subset of one or more predictive models based on the element types and one or more performance indicators associated with each predictive model in the set of predictive models; processing the feature vector using the subset of predictive models, each predictive model of the subset of predictive models generating an output based on the feature vector to provide a plurality of outputs”) an input format the machine learning model receives, and an output label the machine learning model is trained to identify; (Lin, Col. 1 Line [018-020] “predictive model is trained with training data that includes input data and output data that mirror the form of input data that will be entered into the predictive model and the desired predictive output, respectively.”).
based on the plurality of machine learning models, the input format and the output label, creating a canonical machine learning model comprising canonical inputs and canonical outputs, wherein the canonical inputs receive the input format, and the canonical outputs identify the output label; (Lin, Col. 4 Line [005-010] “a machine learning algorithm can include a number of ordered steps or operations for analyzing training data and generating a predictive model. In some implementations, a machine learning algorithm can be embodied by one or more computer programs operable to receive input and emit output.”).
However, Lin does not explicitly disclose based on the canonical machine learning model, modifying inputs of a machine learning model in the plurality of machine learning models and outputs of the machine learning model to match the canonical inputs, and the canonical outputs, respectively. 
Kulkarni teaches based on the canonical machine learning model, modifying inputs of a machine learning model in the plurality of machine learning models and outputs of the machine learning model to match the canonical inputs, and the canonical outputs, respectively. (Kulkarni, Pag. 5 [0045] “At 325 the input dataset can be inputted into the machine learning system. Generally, such inputting of data will result in output of the machine learning system.”).
Lin and Kulkarni are both directed to neural networks and their efficient use. In view of the teachings of Kulkarni, it would have been obvious to one of ordinary skill in the art to apply the teachings of Kulkarni to Lin before the effective filing date of the claimed invention in order to efficiently use deep neural networks. (Kulkarni, Pag.1 [0014] “large amounts of information can be used to rank search results to meet provider goals without expending vast human resource capital.”).
Regarding claim 26,
Kulkarni teaches the method of claim 25, said creating the canonical machine learning model (Kulkarni, Pag. 5 [0014] "goals can be defined as input labels into a machine learning system and correlated with factored information” and  “After the model is produced, it can accept factorized information and produce an output to find or present user search results.”) comprising:
identifying a minimal number of inputs and a minimal number of outputs, such that the canonical machine learning model upon receiving the input format, identifies the output label. (Kulkarni, Pag. 5 [0045] "At 325 the input dataset can be inputted into the machine learning system. Generally, such inputting of data will result in output of the machine learning system.").
Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Fink et al. (US5657255A, hereafter Fink).
Regarding claim 5,
Lin teaches the method of claim 4, said making the serial combination comprising: determining an initial machine learning model in the serial combination, and a subsequent machine learning model in the serial combination; and (Lin, Col. 10 Line [035-041] “however, output from the first layer predictive models serves as input to an implementation of a second layer predictive model represented by node F 614. As described above, the second layer predictive model can be operable to provide a final prediction based on an intermediate query including output from the first layer predictive models.”) Here, the layers of the predictive models are connected serially. It would have been obvious to one of ordinary skill in the art to note the serial connection.
However, Lin does not explicitly disclose creating an interface mechanism such that an input of the interface mechanism connects to an output of the initial machine learning model, and an output of the interface mechanism connects to an input of the subsequent machine learning model.
Fink discloses creating an interface mechanism such that an input of the interface mechanism connects to an output of the initial machine learning model, and an output of the interface mechanism connects to an input of the subsequent machine learning model. (Fink, Col. 12 Line [002] “FIG. 8 shows an example of linking together two models.” and Line [013-015] “The interface 106 is essentially a model of the interaction between the two models which are being connected.”)
Lin and Fink are both directed to time efficiency. In view of the teachings of Fink, it would have been obvious to one of ordinary skill in the art to apply the teachings of Fink to Lin before the effective filing date of the claimed invention in order to efficiently use deep neural networks. (Fink, Col.1 [033-035] “Current methods of obtaining data for biological processes require extremely time consuming laboratory experiments.”).
Regarding claim 17, the claim is directed to an article of manufacture configured to perform a method substantially identical to that recited in claim 5. Therefore, the rejection of claim 5 applies equally here.
Lin teaches the non-transitory computer readable medium of claim 16, (Lin, Col. 10 [57-61] “A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate.”), said making the serial combination comprising: determining an initial machine learning model in the serial combination, and a subsequent machine learning model in the serial combination; and (Lin, Col. 10 Line [035-041] “however, output from the first layer predictive models serves as input to an implementation of a second layer predictive model represented by node F 614. As described above, the second layer predictive model can be operable to provide a final prediction based on an intermediate query including output from the first layer predictive models.”) Here, the layers of the predictive models are connected serially. It would have been obvious to one of ordinary skill in the art to note the serial connection.
However, Lin does not explicitly disclose creating an interface mechanism such that an input of the interface mechanism connects to an output of the initial machine learning model, and an output of the interface mechanism connects to an input of the subsequent machine learning model.
Fink discloses creating an interface mechanism such that an input of the interface mechanism connects to an output of the initial machine learning model, and an output of the interface mechanism connects to an input of the subsequent machine learning model. (Fink, Col. 12 Line [002] “FIG. 8 shows an example of linking together two models.” and Line [013-015] “The interface 106 is essentially a model of the interaction between the two models which are being connected.”)
Lin and Fink are both directed to time efficiency. In view of the teachings of Fink, it would have been obvious to one of ordinary skill in the art to apply the teachings of Fink to Lin before the effective filing date of the claimed invention in order to efficiently use deep neural networks. (Fink, Col.1 [033-035] “Current methods of obtaining data for biological processes require extremely time consuming laboratory experiments.”).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAGI S. NOUMAN whose telephone number is (571) 272-8922.  The examiner can normally be reached on Mon - Fri 7:30AM - 5:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/N.S.N/ Examiner, Art Unit 2124
/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124