DETAILED ACTION
This office action is in response to the RCE and amendments filed 28 December 2020 for application 15/225932 filed 2 August 2016. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 28 December 2020 has been accepted. Currently claims 24-46 are pending. Claims 1-23 have been canceled. All references in the IDS have been considered. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 24-26, 28, 30, 32-34, 36, 38, 40-42, and 44 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Brand et al. (US2016/0071027, published 10 March 2016), hereinafter referred to as Brand.


In regard to claim 24, Brand teaches A computer-implemented method performed by at least a processor for executing instructions from a memory, the method comprising: storing a set of model training data and a composite model object persistently in a relational database management system, ([0041, 0046, 0150, Figure 1, Figure 2] An event stream 102 is a sequence of events, e.g., a data tuple, with each event including information that can be identified as a key-value pair map. For instance, events can include information describing a time stamp of the event., The local modelers 110a-n can aggregate information 114 associated with routed events 106. For instance, the local modelers can store data identifying a number of occurrences of a particular piece of information included in the routed events 106…The local modelers 110a-n provide this aggregated information 114 to the central modeler 120., The term “data processing apparatus’ refers to data processing hardware and encompasses all kinds of physical apparatus, devices, and machines for processing data, including … a data base management system, an operating system, or a combination of one or more of them., wherein a framework that includes the functionality of a data base management system (relational because the database is organized according to key-values that determine how particular data is differentially accessed relative to other data) includes a (persistently) stored set of model training data consisting of an aggregation of (locally) relevant events (data tuples) that are used to determine/learn model parameters associated with each component of a machine learning model in which the composite model object is the set of local modelers (each having a distinct model) in combination with one or more central modelers (Figures 1, 2).) 
wherein a partitioning scheme for the composite model object is associated with a partitioning key, a partitioning technique, and a plurality of partitions; ([0042, 0085, 0115, 0126, Figure 1, Figure 2] For instance, the routing node 104 can route events to local modelers 110a-n according to a routing key included in each event that identifies a local modeler., The developer can utilize the configuration file to specify a mapping of information from an event stream to the local modelers, e.g., to each local modeler independently, or to all local modelers. For instance, each event of the event stream can have three key-value pairs, described above, and the developer can identify a mapping from each key-value pair to variables used in operations of the local modeler., In some implementations the routing node 604 can perform a hashing process on the routing key, and obtain an identifier of a partition of context data. In some implementations, the routing node 604 can store information identifying mappings between ranges of routing keys and respective partitions of context data that store context data related to routing keys included in a range., In some implementations the event can be routed by performing a hashing process on information included in the event, e.g., a routing key. The routing key can identify a particular type of information included in the event, e.g., a name, a telephone number, an address, with each type of information identified as a key in a key-value pair included in the event. 
After performing the hashing process, a value can be obtained, e.g., by hashing the routing key, that identifies the partition of context data, e.g., the value can be mapped to an identifier of the partition., wherein the event database is partitioned according to a partition key across a plurality of partitions corresponding to the plurality of local modelers, and wherein various partitioning techniques are evident in this framework including a hash-based technique, the routing of any single event data to one or to a plurality of local modelers, the identification of the key-value pairs that determine the mapping/routing, or a hash-based technique that identifies a context data partition to map the data to.) and for each partition of the plurality of partitions, building a respective component of the composite model object using respective data from the set of model training data that corresponds to the each partition of the plurality of partitions.  ([0038, 0064, Figure 1, Figure 2] The central modeler receives aggregated information and determines parameters of a respective machine learning model. The central modeler can provide model parameters of a machine learning model to one or more other central modelers. The parameters can also be provided to the local modelers, which then apply the machine learning model to the event stream., Each central modeler can determine parameters 232 of a respective machine learning model using received aggregated information. In some implementations the central modelers 230a-n can each determine parameters of a different machine learning model. 
In some other implementations each central modeler determines a portion of an overall machine learning model., wherein an overall/composite model consists of sub-models trained by a central modeler using training data locally partitioned to a particular local modeler such that the overall/composite model includes the plurality of sub-models formed across the plurality of central modeler-local modeler pairs.)
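For illustration only, the limitations of claim 24 as mapped above (a partitioning key, a hash-based partitioning technique, a plurality of partitions, and one model component built per partition) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Brand or from the application: the function names, the use of a CRC32 hash, the sample rows, and the per-partition mean used as a stand-in "component" are all hypothetical.

```python
import zlib
from collections import defaultdict


def partition_id(key_value: str, num_partitions: int) -> int:
    """Hash partitioning: map a partitioning-key value to a partition index.

    CRC32 is an assumed choice; it is used only because it is deterministic.
    """
    return zlib.crc32(key_value.encode("utf-8")) % num_partitions


def build_composite_model(rows, partition_key, target, num_partitions):
    """Build one component per partition, using only that partition's rows."""
    buckets = defaultdict(list)
    for row in rows:
        pid = partition_id(str(row[partition_key]), num_partitions)
        buckets[pid].append(row)
    # Hypothetical stand-in for a trained sub-model: the mean of the target
    # column over the rows that fell into the partition.
    return {
        pid: sum(r[target] for r in part) / len(part)
        for pid, part in buckets.items()
    }


# Hypothetical training data; "city" plays the role of the partitioning key.
rows = [
    {"city": "San Francisco", "spend": 10.0},
    {"city": "San Francisco", "spend": 30.0},
    {"city": "Los Angeles", "spend": 5.0},
]
composite = build_composite_model(rows, "city", "spend", num_partitions=4)
```

In this sketch the composite model is simply a mapping from partition index to component, so each component depends only on the data routed to its own partition. Two distinct key values may hash to the same partition, in which case they share a component.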

In regard to claim 25, the rejection of claim 24 is incorporated and Brand further teaches further comprising: parsing a set of scoring data within the relational database system into at least one scoring data partition based on at least one scoring data partition key associated with the set of scoring data, wherein the at least one scoring data partition is identified by the at least one scoring data partition key and is loaded into a memory of the relational database system; ORA160547-US-NPPage 2 of 16Application No.: 15/225,932 Attorney Docket No.: ORA160547 (0-403)([0040, 0042, 0045, 0112, 0128] The routing node 104, e.g., an ingestion system, receives an event stream 102 of events, and routes each event to one of the local modelers 110a-n, e.g., over a network., In some implementations the routing node 104 can route events to local modelers 110a-n according to information included in each event, e.g., information included in the data tuple. For instance, the routing node 104 can route events to local modelers 110a-n according to a routing key included in each event that identifies a local modeler., The local modelers 110a-n can process each routed event 106 to perform scoring of the event stream using the machine learning model. Scoring refers to characterizing events. In Some implementations scoring refers to applying a received machine learning model, e.g., applying a set of rules determined from the machine learning model., The local modelers 610a-n each include operational memory, e.g., high-speed memory designed for fast random access by a processor, e.g., dynamic random access memory. Each of the local modelers 610a-n maintains a partition of context data, with each partition of context data maintained in operational memory by the respective local modeler. 
In some implementations the partition of context data 612a-n is maintained by a same operating system process executing operations of the stream processing engine 614a-n, e.g., in the same process of a JAVA virtual machine. For example, the operating system process can obtain context data for a particular event and then process the event using the context data within the same operating system process., In some implementations, the local modeler executes the operations in a same operating system process, e.g., a JAVA virtual machine process,  that also maintains a partition of context data stored in operational memory., wherein each event data received in an event stream is parsed according to the information in the data tuple associated with each event such that each event is partitioned/routed to a local modeler for scoring according to the local modeler-specific partition/routing key and wherein this scoring data is loaded into a memory of the relational database system at least by virtue of the data ingestion function which makes the event data available to be routed to different local modelers (i.e., placed in a system memory for processing at the local modelers) but also by virtue of the processing the event data in an operational memory (in which the context data is used with the event data to perform the processing of the event).) loading a respective component of the composite model object that corresponds to each partition of the at least one scoring data partition into the memory of the relational database system; ([0045, 0106] The local modelers 110a-n can store parameters of a machine learning model. The parameters can be provided to the local modelers by a central modeler 120, described below. The local modelers 110a-n can process each routed event 106 to perform scoring of the event stream using the machine learning model., The system provides the parameters to one or more local modelers (step 510). 
The central modeler can provide the parameters in an asynchronous call to the local modelers. The local modelers receive the parameters store them, and perform scoring of the event stream using the machine learning model., wherein each local model (local modeler) is loaded into memory in the relational database system by virtue of the local modeler receiving and storing those parameters and subsequently using the parameters of that model to score the event stream routed (partitioned) to that local modeler.) and scoring each partition of the at least one scoring data partition by applying the corresponding component of the composite model object to the each partition of the at least one scoring data partition. ([0045] The local modelers 110a-n can store parameters of a machine learning model. The parameters can be provided to the local modelers by a central modeler 120, described below. The local modelers 110a-n can process each routed event 106 to perform scoring of the event stream using the machine learning model.,  wherein each local modeler scores the event data routed (partitioned) to that modeler according to the sub-model (component of a composite/overall model) associated with/stored by the local modeler.)
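For illustration only, the scoring steps of claim 25 as mapped above (parsing scoring data into partitions by a scoring data partition key, loading the component that corresponds to each partition, and applying that component to the partition) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Brand or from the application: the function names, the dictionary-based components, and the deviation-from-mean scoring function are hypothetical.

```python
from collections import defaultdict


def score_by_partition(scoring_rows, partition_key, components, score_fn):
    """Group scoring rows by partition key, then score each group with the
    component built for that same key."""
    partitions = defaultdict(list)
    for row in scoring_rows:
        partitions[row[partition_key]].append(row)

    scores = {}
    for key, part in partitions.items():
        component = components.get(key)
        if component is None:
            # No component was built for this partition; skip it.
            continue
        scores[key] = [score_fn(component, row) for row in part]
    return scores


def deviation(component, row):
    """Hypothetical scoring rule: distance of a row from the partition mean."""
    return row["spend"] - component["mean"]


# Hypothetical per-partition components and scoring data.
components = {"SF": {"mean": 20.0}, "LA": {"mean": 5.0}}
scoring_rows = [
    {"city": "SF", "spend": 25.0},
    {"city": "LA", "spend": 5.0},
    {"city": "SF", "spend": 18.0},
]
scores = score_by_partition(scoring_rows, "city", components, deviation)
```

Only the component matching a given partition is consulted when that partition is scored, which mirrors the reading above in which each local modeler scores only the events routed to it.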

In regard to claim 26, the rejection of claim 24 is incorporated and Brand further teaches further comprising: adding at least one component to the composite model object without having to re- build the composite model object.  ([0014, 0051, 0052, 0061] The process of building a model on the fly from incoming data is usually computationally intensive. For this reason, it is advantageous to split the algorithm creating the model architecturally into several components, as described above and referred to in this specification as central modelers and local modelers., The central modeler 120 stores a machine learning model 122, e.g., a Predictive Model Markup Language (PMML) file that identifies the machine learning model 122. Upon receipt of the aggregated information 114 from the local modelers 110a-n, the central modeler can determine parameters, e.g., updated parameters, to the machine learning model 122. The central modeler 120 can determine when to determine parameters, e.g., updated parameters, based on how many of the local modelers 110a-n have provided aggregated information to the central modeler 120. For example, the central modeler 120 can determine updated parameters when at least a threshold percentage, e.g., 50%, 60%, or a user definable percentage, of the local modelers 110a-n have provided aggregated information., The central modeler 120 can determine whether to provide the parameters 124 to the local modelers 110a-n. 
If the central modeler 120 has never provided parameters to the local modelers 110a-n, then the central modeler 120 can provide parameters for storage., If the local modelers 202a, have stored parameters of a machine learning model, they can process the event stream to perform scoring using the machine learning model, and also aggregate information in parallel., wherein a new local model is formed at the local modeler by the reception of model parameters (for the first time) by a central modeler such that this update is performed without regard to the other local models in the overall model and wherein the local model update is, in a more general sense, a substitution process in which a previous local model parameters are replaced by adding in their place new model parameters (in other words, each component in the composite model is initially formed or updated on the fly without requiring the reformation of all of the individual components of the composite model).) 

In regard to claim 28, the rejection of claim 24 is incorporated and Brand further teaches further comprising: selecting by a user one or more data fields associated with the set of model training data upon which the set of model training data is to be partitioned.  ([0084, 0114, 0122] The developer can utilize the configuration file to specify a mapping of information from an event stream to the local modelers, e.g., to each local modeler independently, or to all local modelers. For instance, each event of the event stream can have three key-value pairs, described above, and the developer can identify a mapping from each key-value pair to variables used in operations of the local modeler., The routing node 604 can identify the information to obtain in each event from a configuration file identifying a particular key-value pair, e.g., a developer can define that events are to be routed by phone number., In some other implementations the different routing node can receive the event 616a and determine a partition of context data based on information included in the event 616a, e.g., a developer can define a key-value pair in each event to route events by. The information included in the event, e.g., the key-value pair, can be different than information that the routing node 604 uses. That is, for example, a developer can identify that events should be routed according to phone number, and then routed according to last name., wherein the developer/user specifies the particular event information (fields/key-values) for routing/partitioning that event data (for either scoring or training) to particular local modelers.)

In regard to claim 30, the rejection of claim 24 is incorporated and Brand further teaches further comprising: automatically determining one or more data fields associated with the set of model training data upon which the set of model training data is to be partitioned.  ([0135, 0136] The updated routing strategy 824 provided to the routing node 804 can be one or more rules that identify a process to route events. For instance, the rules can identify that depending on the value of a particular piece of information included in events, e.g., a value mapped to a particular key, the event should be routed to particular local modelers. The rules can be represented as a series of “if then” statements, e.g., conditioned on information included in events, that ultimately identify a local modeler to receive an event. Additionally, the rules can be represented as a tree., In some implementations, the central modeler 820 can update the routing strategy of the routing node 804 by varying the routing strategy to a slight degree. The central modeler 820 then can determine whether the clustering process determines clusters that better classifies events, e.g., each cluster includes events that differ by particular information included in each event. If so, the central modeler 820 can continue updating the routing strategy until it determines that the routing strategy best separates events into clusters. 
For example, the central modeler 820 can determine that events should be routed by a location identified in an event, e.g., events each with information identifying San Francisco should be routed differently than events each with information identifying Los Angeles based on identifying clusters of events with each cluster including events with the same location., wherein the key-value based partitioning strategy is automatically modified according to a clustering analysis of model performance to optimize the event attribute rules that partition/route the event data (e.g., routing of events related to San Francisco differently from events related to Los Angeles).)

Claim 32 is also rejected because it is just a system and medium implementation of the same subject matter of claim 24 which can be found in Brand. In addition, it is noted that the claim also recites a system including a processor, memory, and computer readable memory with instructions which may also be found in Brand ([0148] Embodiments of the subject matter and the operations or actions described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer Software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of data processing apparatus.).

Claim 33/32 is also rejected because it is just a system and medium implementation of the same subject matter of claim 25/24 which can be found in Brand.

Claim 34/32 is also rejected because it is just a system and medium implementation of the same subject matter of claim 26/24 which can be found in Brand.

In regard to claim 36, the rejection of claim 32 is incorporated and Brand further teaches further comprising a visual user interface module stored in the non-transitory computer-readable medium including instructions that when executed cause the processor to provide a graphical user interface or an application program interface that facilitates: selecting by a user one or more data fields associated with the set of model training data upon which the set of model training data is to be partitioned.  ([0084, 0092, 0114, 0122, 0158] The developer can utilize the configuration file to specify a mapping of information from an event stream to the local modelers, e.g., to each local modeler independently, or to all local modelers. For instance, each event of the event stream can have three key-value pairs, described above, and the developer can identify a mapping from each key-value pair to variables used in operations of the local modeler., In some implementations the system can provide a user interface configured to receive input from a developer identifying communication links, e.g., a stream processing graph. For instance, the interface can allow for a developer to identify stream processing vertices connected by directed edges, with each directed edge passing an event stream to a vertex. The vertices can be graphically represented, e.g., as boxes or nodes in the user interface, and the developer can assign names or identifiers to each vertex. 
Each vertex can be associated with a set of operations, and upon selection of a vertex, the system can identify a local modeler that performs the set of operations., The routing node 604 can identify the information to obtain in each event from a configuration file identifying a particular key-value pair, e.g., a developer can define that events are to be routed by phone number., In some other implementations the different routing node can receive the event 616a and determine a partition of context data based on information included in the event 616a, e.g., a developer can define a key-value pair in each event to route events by. The information included in the event, e.g., the key-value pair, can be different than information that the routing node 604 uses. That is, for example, a developer can identify that events should be routed according to phone number, and then routed according to last name., Embodiments of the subject matter described in this specification can be implemented in a computing system that includes …a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter …., wherein the developer/user specifies the particular event information (fields/key-values) for routing/partitioning that event data (for either scoring or training) to particular local modelers, wherein the system may include a graphical user interface to implement various components of the modeling framework in general but, in particular, includes a graphical user interface to facilitate the association of particular nodes, corresponding event data/event streams and operations to be performed on them, to local models/modelers, and wherein it is noted that the user access and manipulation of the configuration file may be considered an application program user interface.)

In regard to claim 38, the rejection of claim 32 is incorporated and Brand further teaches  further comprising a visual user interface module stored in the non-transitory computer-readable medium including instructions that when executed cause the processor to provide a graphical user interface or an application program interface that facilitates: automatically determining one or more data fields associated with the set of model training data upon which the set of model training data is to be partitioned.  
([0048, 0132, 0134, 0135, 0136, 0158] For instance, the configuration file can identify a threshold amount of time, e.g., 50 milliseconds, 100 milliseconds, 1 second, for the central modeler to wait before providing requests to each local modeler 110a-n for aggregated information. In some implementations each local modeler can store the threshold amount of time and automatically provide the aggregated information to the central modeler 120. , The routing strategy can initially be identified by a developer, e.g., the routing node can provide events randomly to local modelers, by performing a round-robin process of local modelers, or by determining a local modeler storing context data needed to process an event, e.g., as described in reference to FIG. 7., In the process of determining parameters of the machine learning model 822, the central modeler 820 can determine that local modelers should receive events based on particular information included in each event. That is, the central modeler 820 can determine that respective local modelers should receive specific sub-populations of the event stream 802, and aggregate information associated with the Sub-population. Upon a positive determination, the central modeler 820 updates the routing strategy 824 of the routing node 804, e.g., the central modeler 820 sends information 824 specifying that events should be routed by one or more data elements in a data tuple defining each event., The updated routing strategy 824 provided to the routing node 804 can be one or more rules that identify a process to route events. For instance, the rules can identify that depending on the value of a particular piece of information included in events, e.g., a value mapped to a particular key, the event should be routed to particular local modelers. The rules can be represented as a series of “if then statements, e.g., conditioned on information included in events, that ultimately identify a local modeler to receive an event. 
Additionally, the rules can be represented as a tree., In some implementations, the central modeler 820 can update the routing strategy of the routing node 804 by varying the routing strategy to a slight degree. The central modeler 820 then can determine whether the clustering process determines clusters that better classifies events, e.g., each cluster includes events that differ by particular information included in each event. If so, the central modeler 820 can continue updating the routing strategy until it determines that the routing strategy best separates events into clusters. For example, the central modeler 820 can determine that events should be routed by a location identified in an event, e.g., events each with information identifying San Francisco should be routed differently than events each with information identifying Los Angeles based on identifying clusters of events with each cluster including events with the same location., Embodiments of the subject matter described in this specification can be implemented in a computing system that includes …a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter …., wherein the key-value based partitioning strategy is automatically modified according to a clustering analysis of model performance to optimize the event attribute rules that partition/route the event data (e.g., routing of events related to San Francisco differently from events related to Los Angeles) and wherein this automatic modification is facilitated by a GUI/API because the user specifies in a configuration file (API interface, with GUI functionality also noted for the system at [0158]) parameters that determine how much data (as well as temporal contexts) are processed by the central modelers (which perform the clustering to determine the automatic rerouting), that specify an initial routing/partitioning strategy that is 
then automatically modified, or that specify a random routing/partitioning strategy (an automatic determination of data field assignments to local modelers).)

Claim 40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 24 which can be found in Brand. In addition, it is noted that the claim also recites a computer readable memory with instructions which may also be found in Brand ([0148] Embodiments of the subject matter and the operations or actions described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer Software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of data processing apparatus.).

Claim 41/40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 25/24 which can be found in Brand.

Claim 42/40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 26/24 which can be found in Brand.

Claim 44/40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 28/24 which can be found in Brand.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 27, 31, 35, 39, 43, and 46 are rejected under 35 U.S.C. 103 as being unpatentable over Brand in view of Xiong Deng (“Dynamic Data Mining: Methodology and Algorithms”, PhD Thesis, Imperial College London Department of Computing, December 2010, pp. 1-189), hereinafter referred to as Deng.

In regard to claim 27, the rejection of claim 24 is incorporated and Brand does not teach further comprising: removing at least one component from the composite model object without having to re-build the composite model object.  Although Brand teaches the training of model components on the fly (i.e., the updating of individual models) and teaches the evaluation of concept drift within any given local model, Brand does not explicitly disclose that the model component is removed from the composite model. 
However Deng, in the analogous environment of training individual models of a composite/ensemble machine learning model, teaches further comprising: removing at least one component from the composite model object without having to re-build the composite model object.  ([p. 5, Section 1.2.1, p. 100, Section 6.4.1, p. 100, Section 6.4.2, pp. 100-101 6.4.3, p. 104, section 6.4.5, Figure 7.5 ], To resolve these issues, a combination of the data categorization strategy and the ensemble models would be a good choice. That is, during online training, we categorize streaming data into data partitions, each of which may contain a valuable and different (even conflicting) concept. Then the concepts are learned into different supervised base models. Hence, during online classification, we only need to develop appropriate model selection mechanisms to choose the most matched base models with the current target concept. This is dynamic data mining., The WMS is a number of data mining models dynamically selected and weighted from the knowledge base for online prediction. Figure 6.3 has already illustrated the relationship between the knowledge base and the WMS. There are generally three model operations. • Removing degraded working models: the degraded working models represent distinct concepts from the target concept and are, therefore, removed from the WMS. • Updating weights of the remaining: if a remaining working model continuously makes correct classifications, it tends to be consistent with the target concept. Its weight is gradually increased. Otherwise, its weight is reduced to a pre-defined small value. • Adding weighted models: models are added to the WMS, if they are measured to be consistent with the target concept. These models are weighted due to the same principle., Identifying and Removing Degraded Working Models We propose to identify a degraded working model by comparing its current short-term with average classification accuracy. 
The inherent effectiveness of a model can be expressed by its average accuracy, and if the short-term accuracy is worse than the average accuracy, this may indicate that an inconsistent concept emerges., Updating Weights of Remaining Working Models   This phase updates the weight of each remaining working model. We introduce a rigorous weighting strategy for handling abrupt concept drift and gradual concept drift:….., Figure 6.11 further shows two examples of the online classification process under empty and non-empty WMS. During gradual concept drift, there are always models available in the working model set. The process tends to reduce the impact of the models misclassifying a previous example but to increase gradually the impact of models continuously making correct classifications., wherein each model of a set of models that comprises an ensemble/composite model is trained according to a respective (concept-based) partition of a dataset such that individual models of the ensemble of models are removed if the observed classification accuracy (evaluation/scoring over a data partition categorized according to the respective model) is excessively degraded and wherein the removal of this model is performed without rebuilding/retraining the composite object because models that are not degraded are retained in the composite model (with only an update applied to a weight indicative of how closely that model is tuned to its associated current concept).) 
	It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brand to incorporate the teachings of Deng to remove a model component from the composite model without rebuilding the composite model. The modification would have been obvious because one of ordinary skill would have been motivated to achieve improved efficiency, accuracy, and performance of the construction of data mining ensemble models through dynamic model selection that is responsive to concept drift in which degraded models are selectively removed and non-degraded models are retained (Deng, [p. 125, Section 6.8, pp. 120-121, Section 6.6.9, Table 6.1, Table 6.7]).
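For illustration only, the removal operation relied upon from Deng (a working model is dropped when its short-term accuracy falls below its average accuracy, while the remaining models are retained untouched) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Brand, Deng, or the claims: the names `WorkingModel`, `record`, and `remove_degraded`, the five-outcome window, and the strict less-than comparison are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkingModel:
    """One component of the ensemble plus its running accuracy statistics (hypothetical)."""
    predict: Callable[[float], int]
    weight: float = 1.0
    seen: int = 0
    correct: int = 0
    recent: List[bool] = field(default_factory=list)

    def record(self, was_correct: bool, window: int = 5) -> None:
        """Log one classification outcome; keep only the last `window` outcomes."""
        self.seen += 1
        self.correct += int(was_correct)
        self.recent = (self.recent + [was_correct])[-window:]

    @property
    def average_accuracy(self) -> float:
        return self.correct / self.seen if self.seen else 0.0

    @property
    def short_term_accuracy(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0


def remove_degraded(wms: Dict[str, WorkingModel]) -> List[str]:
    """Delete degraded components in place and return their names.

    Assumed criterion: short-term accuracy strictly below average accuracy.
    Components that are not degraded are left as the same objects, so the
    composite is not rebuilt or retrained.
    """
    degraded = [name for name, m in wms.items()
                if m.seen and m.short_term_accuracy < m.average_accuracy]
    for name in degraded:
        del wms[name]
    return degraded
```

In this sketch the composite is simply a dictionary of components, so removing one entry leaves every other entry, including its learned state, in place.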

In regard to claim 31, the rejection of claim 25 is incorporated and Brand does not further teach further comprising: examining the at least one scoring data partition; and upon loading a component of the composite model object and scoring each partition of the at least one scoring data partition, removing a particular component of the composite model object that is no longer being used.  Although Brand teaches the training of model components on the fly (i.e., the updating of individual models) and teaches the evaluation of concept drift within any given local model, he does not explicitly disclose that the model component is removed from the composite model. 
However Deng, in the analogous environment of training individual models of a composite/ensemble machine learning model, teaches further comprising: examining the at least one scoring data partition; and upon loading a component of the composite model object and scoring each partition of the at least one scoring data partition, removing a particular component of the composite model object that is no longer being used.  ([p. 5, Section 1.2.1, p. 100, Section 6.4.1, p. 100, Section 6.4.2, pp. 100-101 6.4.3, p. 104, section 6.4.5, Figure 7.5 ], To resolve these issues, a combination of the data categorization strategy and the ensemble models would be a good choice. That is, during online training, we categorize streaming data into data partitions, each of which may contain a valuable and different (even conflicting) concept. Then the concepts are learned into different supervised base models. Hence, during online classification, we only need to develop appropriate model selection mechanisms to choose the most matched base models with the current target concept. This is dynamic data mining., The WMS is a number of data mining models dynamically selected and weighted from the knowledge base for online prediction. Figure 6.3 has already illustrated the relationship between the knowledge base and the WMS. There are generally three model operations. • Removing degraded working models: the degraded working models represent distinct concepts from the target concept and are, therefore, removed from the WMS. • Updating weights of the remaining: if a remaining working model continuously makes correct classifications, it tends to be consistent with the target concept. Its weight is gradually increased. Otherwise, its weight is reduced to a pre-defined small value. • Adding weighted models: models are added to the WMS, if they are measured to be consistent with the target concept. 
These models are weighted due to the same principle., Identifying and Removing Degraded Working Models We propose to identify a degraded working model by comparing its current short-term with average classification accuracy. The inherent effectiveness of a model can be expressed by its average accuracy, and if the short-term accuracy is worse than the average accuracy, this may indicate that an inconsistent concept emerges., Updating Weights of Remaining Working Models   This phase updates the weight of each remaining working model. We introduce a rigorous weighting strategy for handling abrupt concept drift and gradual concept drift:….., Figure 6.11 further shows two examples of the online classification process under empty and non-empty WMS. During gradual concept drift, there are always models available in the working model set. The process tends to reduce the impact of the models misclassifying a previous example but to increase gradually the impact of models continuously making correct classifications., wherein each model of a set of models that comprise an ensemble/composite model is trained according to a respective (concept-based) partition of a dataset such that individual models of the ensemble of models are removed if the observed classification accuracy (evaluation/scoring over a data partition categorized according to the respective model) is excessively degraded and wherein such a degraded model designated for removal is no longer being used (i.e., is no longer useful) because it fails to reflect a concept change (either gradual or abrupt) in the data based upon the evaluation/scoring process (an evaluation which is applied to each component model in the composite/ensemble model).)  
	It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brand to incorporate the teachings of Deng to examine the results of a scoring data partition and remove a model component from the composite model because it is no longer being used. The modification would have been obvious because one of ordinary skill would have been motivated to achieve improved efficiency, accuracy, and performance of the construction of data mining ensemble models through dynamic model selection that is responsive to concept drift (Deng, [p. 125, Section 6.8, pp. 120-121, Section 6.6.9, Table 6.1, Table 6.7]).

Claim 35/32 is also rejected because it is just a system and medium implementation of the same subject matter of claim 27/24 which can be found in Brand and Deng.

Claim 39/33 is also rejected because it is just a system and medium implementation of the same subject matter of claim 31/25 which can be found in Brand and Deng.

Claim 43/40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 27/24 which can be found in Brand and Deng.

Claim 46/41 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 31/25 which can be found in Brand and Deng.


Claims 29, 37, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Brand in view of Brueckner et al. (US2016/0078361 published 17 March 2016), hereinafter referred to as Brueckner.

In regard to claim 29, the rejection of claim 24 is incorporated and Brand does not further teach further comprising: utilizing a table associated with the set of model training data upon which the set of model training data is to be partitioned.  
In other words, although Brand teaches key-value hash partitioning in a relational database, he does not explicitly disclose that the partitioning technique makes use of a table (e.g., [0126]).
However Brueckner, in the analogous art of partitioning in a relational database for training machine learning models, teaches further comprising: utilizing a table associated with the set of model training data upon which the set of model training data is to be partitioned. ([0085, 0112, 0150, 0179, 0193, 0200, 0221, Figure 12, Figure 27, Figure 33, Figure 38, Figure 40, Figure 44], A client request to create a data source artifact 602 may include, for example, an indication of a source URI (universal resource identifier) to which HTTP GET requests can be directed to retrieve the data records, an address of a storage object at a provider network storage service, or a database table identifier may be provided. The format (e.g., the sequence and types of the fields or columns of the data records) may be indicated in some implementations via a separate comma separated variable (csv) file., FIG. 12 illustrates example sections of a recipe, according to at least some embodiments. In the depicted embodiment, the text of a recipe 1200 may comprise four separate sections—a group definitions section 1201, an assignments section 1204, a dependencies section 1207, and an output/destination section 1210. … In at least one embodiment, a destination model (i.e., a machine learning model to which the output of the recipe transformations is to be provided) may be indicated in a separate section than the output section., In at least one embodiment, the OR extraction request 2401 may include chunking preferences 2414 indicating, for example, a particular acceptable chunk size or a range of acceptable chunk sizes. The destination(s) to which the output of the filtering operation sequence is to be directed (e.g., a feature processing recipe or a model) may be indicated in field 2416., A number of machine learning methodologies, for example techniques used for classification and regression problems, may involve the use of decision trees. 
… A training set 3302 comprising a plurality of observation records (ORs) such as OR 3304A, OR 3304B and OR 3304C is to be used for training a model to predict the value of a dependent variable DV. Each OR in the training set 3302 contains values for some number of independent variables (IVs), such as IV1, IV2, IV3, IVn (for example, in OR 3304A, IV1's value is x, IV2's value is y, IV3's value is k, IV4's value is m, and IVn's value is q) as well as a value of the dependent variable DV (whose value is X in the case of OR 3304A)., FIG. 38 illustrates examples of a plurality of jobs that may be generated for training a model that uses an ensemble of decision trees at a machine learning service, according to at least some embodiments. In the depicted embodiment, respective training samples 3805A, 3805B and 3805C may be obtained from a larger training set 3802 (e.g., using any of a variety of sampling methodologies such as random sampling with replacement), and each such sample may be used to create a respective decision tree using the depth-first approach described above., The term “feature” may refer to a value (e.g., either a single numerical, categorical, or binary value, or an array of such values) of a property of an observation record indexed by a feature identifier. The term “feature vector” may refer to a set of pairs or tuples of (feature identifiers, feature values), which may, for example, be stored in a key-value structure (such as a hash map) or a compressed vector. …The term “parameter vector” may refer to a set of pairs or tuples (feature identifier, parameter), which may also be stored in a key-value structure such as a hash map or a compressed vector., A model generator or trainer may then begin implementing one or more learning iterations in the depicted embodiment. A set of one or more observation records may be identified for the next learning iteration (element 5407). 
Depending on the nature of the observation records, some preliminary data type transformations and/or normalization operations may have to be performed (element 5410)… A key value structure such as a hash map or hash table may be used to store (feature identifier, parameter) pairs of the parameter vector in some implementations, e.g., with feature identifiers as the keys., wherein a machine learning service performs partitioning of a dataset in which the feature data partitioned to train different machine learning models is represented as a table (Figures 33 and 40, hash table, data array, database table identifier), wherein the data transformation process (e.g., Figures 33, 40) is a table/array-based representation of the mapping of features (group of variables) to models (particular destinations), and wherein it is noted that the partition is directed to learning component models of a composite model such as individual nodes in a decision tree or an ensemble of decision trees (Figure 33, Figure 38) but the partitioning process may include more generally the parsing/pruning/filtering of the data to form training data features of interest (Figure 40) or partitioning the data into training and testing data segments (e.g., Figure 27).)
	It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brand to incorporate the teachings of Brueckner to use a table associated with the partitioning of model training data. The modification would have been obvious because one of ordinary skill would have been motivated to provide an environment to users that provides broad functional control over model training to permit efficient model development, particularly for non-experts, by leveraging previously learned templates and recipes and using table-based representations of training data partitioned/allocated to particular models (Brueckner, [0053, 0059, 0155, 0201]).
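For illustration only, the claimed use of a table upon which the training data is to be partitioned can be sketched with an in-memory relational table. This is a minimal sketch under stated assumptions and is not drawn from Brand, Brueckner, or the application: the table `training_data`, the columns `region` and `y`, and the function `train_per_partition` are hypothetical, and a per-partition mean stands in for a trained model component.

```python
import sqlite3
from typing import Dict


def train_per_partition(conn: sqlite3.Connection, table: str,
                        partition_col: str, target_col: str) -> Dict[str, float]:
    """Build one trivial component (the mean of the target) per partition value.

    SQL identifiers cannot be bound as query parameters, so they are checked
    with str.isidentifier() before being placed in the query text.
    """
    for ident in (table, partition_col, target_col):
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (f"SELECT {partition_col}, AVG({target_col}) "
             f"FROM {table} GROUP BY {partition_col}")
    return {key: mean for key, mean in conn.execute(query)}


# Hypothetical training table partitioned on the `region` column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_data (region TEXT, y REAL)")
conn.executemany("INSERT INTO training_data VALUES (?, ?)",
                 [("east", 1.0), ("east", 3.0), ("west", 10.0)])
components = train_per_partition(conn, "training_data", "region", "y")
```

Here each distinct value of the partition column yields one component, so the composite is keyed by the same values that partition the table.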

In regard to claim 37, the rejection of claim 32 is incorporated and Brand does not further teach further comprising a visual user interface module stored in the non-transitory computer-readable medium including instructions that when executed cause the processor to provide a graphical user interface or an application program interface that facilitates: utilizing a table associated with the set of model training data upon which the set of model training data is to be partitioned.
However, Brueckner, in the analogous art of partitioning in a relational database for training machine learning models, teaches further comprising a visual user interface module stored in the non-transitory computer-readable medium including instructions that when executed cause the processor to provide a graphical user interface or an application program interface that facilitates: utilizing a table associated with the set of model training data upon which the set of model training data is to be partitioned ([0055, 0085, 0112, 0150, 0179, 0193, 0200, 0221, Figure 12, Figure 27, Figure 33, Figure 38, Figure 40, Figure 44], According to some embodiments, a number of different types of entities related to machine learning tasks may be generated, modified, read, executed, and/or queried/searched via MLS programmatic interfaces. Supported entity types in one embodiment may include, among others, data sources (e.g., descriptors of locations or objects from which input records for machine learning can be obtained), sets of statistics generated by analyzing the input data, recipes (e.g., descriptors of feature processing transformations to be applied to input data for training models), processing plans (e.g., templates for executing various machine learning tasks), models (which may also be referred to as predictors), parameter sets to be used for recipes and/or models, …, A client request to create a data source artifact 602 may include, for example, an indication of a source URI (universal resource identifier) to which HTTP GET requests can be directed to retrieve the data records, an address of a storage object at a provider network storage service, or a database table identifier may be provided. The format (e.g., the sequence and types of the fields or columns of the data records) may be indicated in some implementations via a separate comma separated variable (csv) file., FIG. 12 illustrates example sections of a recipe, according to at least some embodiments. 
In the depicted embodiment, the text of a recipe 1200 may comprise four separate sections—a group definitions section 1201, an assignments section 1204, a dependencies section 1207, and an output/destination section 1210. … In at least one embodiment, a destination model (i.e., a machine learning model to which the output of the recipe transformations is to be provided) may be indicated in a separate section than the output section., In at least one embodiment, the OR extraction request 2401 may include chunking preferences 2414 indicating, for example, a particular acceptable chunk size or a range of acceptable chunk sizes. The destination(s) to which the output of the filtering operation sequence is to be directed (e.g., a feature processing recipe or a model) may be indicated in field 2416., A number of machine learning methodologies, for example techniques used for classification and regression problems, may involve the use of decision trees. … A training set 3302 comprising a plurality of observation records (ORs) such as OR 3304A, OR 3304B and OR3304C is to be used for training a model to predict the value of a dependent variable DV. Each OR in the training set 3302 contains values for some number of independent variables (IVs), such as IV1, IV2, IV3, IVn (for example, in OR 3304A, IV1's value is x, IV2's value is y, IV3's value is k, IV4's value is m, and IVn's value is q) as well as a value of the dependent variable DV (whose value is X in the case of OR 3304A)., FIG. 38 illustrates examples of a plurality of jobs that may be generated for training a model that uses an ensemble of decision trees at a machine learning service, according to at least some embodiments. 
In the depicted embodiment, respective training samples 3805A, 3805B and 3805C may be obtained from a larger training set 3802 (e.g., using any of a variety of sampling methodologies such as random sampling with replacement), and each such sample may be used to create a respective decision tree using the depth-first approach described above., The term “feature” may refer to a value (e.g., either a single numerical, categorical, or binary value, or an array of such values) of a property of an observation record indexed by a feature identifier. The term “feature vector” may refer to a set of pairs or tuples of (feature identifiers, feature values), which may, for example, be stored in a key-value structure (such as a hash map) or a compressed vector. …The term “parameter vector” may refer to a set of pairs or tuples (feature identifier, parameter), which may also be stored in a key-value structure such as a hash map or a compressed vector., A model generator or trainer may then begin implementing one or more learning iterations in the depicted embodiment. A set of one or more observation records may be identified for the next learning iteration (element 5407). Depending on the nature of the observation records, some preliminary data type transformations and/or normalization operations may have to be performed (element 5410)… 
A key value structure such as a hash map or hash table may be used to store (feature identifier, parameter) pairs of the parameter vector in some implementations, e.g., with feature identifiers as the keys., wherein a GUI-based machine learning service performs partitioning of a dataset according to client specifications selected or indicated in an application program interface (API) such that the feature data partitioned to train different machine learning models is represented as a table (Figures 33 and 40, hash table, data array, database table identifier), wherein the client may specify not only the data source and goals (Figure 44, which affect/control the partitioning/filtering/transformation process) but also characteristics of entities related to the partitioned data used for training such as feature transformations (in which the client may specify grouping of variables and destination models for a given training set – Figure 12), and wherein it is noted that the partition is directed to learning component models of a composite model such as individual nodes in a decision tree or an ensemble of decision trees (Figure 33, Figure 38) but the partitioning process may include more generally the parsing/pruning/filtering of the data to form training data features of interest (Figure 40) or partitioning the data into training and testing data segments (e.g., Figure 27), which the user can also modulate according to API inputs.)
	It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brand to incorporate the teachings of Brueckner to have a GUI/API interface that facilitates the use of a table associated with the partitioning of model training data. The modification would have been obvious because one of ordinary skill would have been motivated to provide a GUI/API environment to users that provides broad functional control over model training to permit efficient model development, particularly for non-experts, by leveraging previously learned templates and recipes and using table-based representations of training data partitioned/allocated to particular models (Brueckner, [0053, 0059, 0155, 0201]).

Claim 45/40 is also rejected because it is just a computer readable memory implementation of the same subject matter of claim 29/24 which can be found in Brand and Brueckner.



Response to Arguments
Applicant’s arguments with respect to claims 24-46 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983.  The examiner can normally be reached on M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ROBERT LEWIS KULP/Examiner, Art Unit 2122                                                                                                                                                                                                        

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122