DETAILED ACTION
Claims 1-20 are currently pending in application 16/685002, filed 15 November 2019. All references cited in the IDS have been considered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
A rejection based on double patenting of the “same invention” type finds its support in the language of 35 U.S.C. 101 which states that “whoever invents or discovers any new and useful process... may obtain a patent therefor...” (Emphasis added). Thus, the term “same invention,” in this context, means an invention drawn to identical subject matter. See Miller v. Eagle Mfg. Co., 151 U.S. 186 (1894); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Ockert, 245 F.2d 467, 114 USPQ 330 (CCPA 1957).
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-3, 5-13, and 15-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 1, 9, 9, 1, 4, 5, 2, 7, 10, 10, (1, 9), (1, 9), 10, 13, and 18-20, respectively, of copending Application No. 16/684973 in view of Pratama et al. (“Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams”, Information Sciences, 495, 3 May 2019, pp. 150-174). This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. The claims are compared as follows:

Instant Application, Claim 1:
A method comprising: determining, using an autonomic function executing using a processor and a memory in an artificial intelligence environment, that a fused model responsive to a new problem space has an accuracy below a threshold level of accuracy in the new problem space; 

cloning, using the autonomic function, a spliced layer in the fused model to form a cloned layer, the spliced layer having been extracted from a second model and inserted at a location in the fused model;

inserting, using the autonomic function, the cloned layer at a second location in the fused model; 

constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the second location; 

and fusing, using the autonomic function, the cloned layer in the fused model using the transformed output vector as input to the cloned layer, the fusing forming a deep fused model that has a revised accuracy, the revised accuracy being higher than the accuracy relative to an ontology of the new problem space.

Application No. 16/684973, Claim 1:
A method comprising: causing an autonomic function to execute using a processor and a memory in an artificial intelligence environment to detect a new problem space; 

selecting, using the autonomic function, a first model, wherein the first model comprises a first trained neural network corresponding to a first ontology; 

identifying, using the autonomic function, a second model, wherein the second model comprises a second trained neural network corresponding to a second ontology; 

extracting, using the autonomic function, a layer from the second model; 

inserting, using the autonomic function, the layer into a location in the first model; 

constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the location; 

and fusing, using the autonomic function, the layer in the first model using the transformed output vector as input to the layer, the fusing forming a fused model that is operable on an ontology of the new problem space.
Claim 1 of co-pending application 16/684973 does not recite the limitations of instant claim 1 enumerated below. However, Pratama et al. teach those limitations ([pp. 157-158, Section 5.3, pp. 158-160, Section 5.4, p. 171, Section 7, Algorithm 1, Algorithm 2, Figure 3]), where the fused model (Figure 3) is modified when its accuracy falls to a level indicative of concept drift; the modification replicates (clones) the architecture of an existing hidden layer and inserts (splices) it according to a second model (eSCN), each newly spliced-in layer being a clone of a previously spliced-in layer, thereby deepening the architecture. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating determining that a fused model responsive to a new problem space has an accuracy below a threshold level of accuracy in the new problem space; cloning, using the autonomic function, a spliced layer in the fused model to form a cloned layer, the spliced layer having been extracted from a second model and inserted at a location in the fused model; inserting, using the autonomic function, the cloned layer at a second location in the fused model; and fusing the cloned layer to form a deep fused model that has a revised accuracy, as taught by Pratama et al., in order to improve the robustness with which a neural network processes non-stationary inputs by replicating individual layers of the network in response to the detection of concept drift, tuning them to that concept drift, and accordingly deepening the network (Pratama, [Abstract, p. 151, Section 1, Table 4]).
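For illustration only, the accuracy-degradation test that the rationale above attributes to Pratama (a Hoeffding's bound drift detector that labels the stream normal, warning, or drift) can be sketched as follows. This is a simplified sketch under stated assumptions: it applies the textbook Hoeffding bound ε = sqrt(ln(1/α)/(2n)) to a 0/1 error signal, the function names and the confidence values α_W = 0.005 and α_D = 0.001 are hypothetical, and Pratama's own cut-point test (equation 7) is not reproduced.

```python
import math
from typing import Sequence


def hoeffding_epsilon(n: int, alpha: float, value_range: float = 1.0) -> float:
    """Textbook Hoeffding bound: with probability at least 1 - alpha, the mean of
    n independent samples bounded in an interval of width value_range lies within
    epsilon of its expected value."""
    return value_range * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def drift_status(reference: Sequence[float], recent: Sequence[float],
                 alpha_warning: float = 0.005, alpha_drift: float = 0.001) -> str:
    """Label the recent window 'normal', 'warning' or 'drift' from the rise in mean
    error (1 = misclassified, 0 = correct) relative to a reference window.
    Hypothetical simplification: the smaller window size is used as n."""
    n = min(len(reference), len(recent))
    rise = sum(recent) / len(recent) - sum(reference) / len(reference)
    if rise >= hoeffding_epsilon(n, alpha_drift):
        return "drift"    # error rose past the stricter bound: accuracy below threshold
    if rise >= hoeffding_epsilon(n, alpha_warning):
        return "warning"  # possible drift; keep monitoring
    return "normal"
```

In this sketch a "drift" result plays the role of the determination that the model's accuracy has fallen below the threshold, which in Pratama's scheme is the event that triggers adding a layer.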
Instant Application, Claim 2:
wherein the second location is immediate adjacent to the location of the spliced layer
Application No. 16/684973, Claim 1 (reproduced above)

Co-pending application 16/684973 does not teach this limitation of instant claim 2. However, Pratama et al. teach it ([Algorithm 1, Algorithm 2, Figure 3]), where the next layer of the fused model (Figure 3) is determined (cloned) and inserted immediately after a previously cloned and spliced (inserted) layer (e.g., eSCN_D relative to eSCN_{D-1}). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating that the second location is immediately adjacent to the location of the spliced layer, as taught by Pratama et al., in order to improve the robustness with which a neural network processes non-stationary inputs by replicating individual layers of the network in response to the detection of concept drift, tuning them to that concept drift, and accordingly deepening the network (Pratama, [Abstract, p. 151, Section 1, Table 4]).
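The layer-stacking behavior relied on for claims 1 and 2 can likewise be pictured with a toy sketch: each new layer is a clone of the most recently added layer, placed immediately after it, and fed through the transformation quoted from Pratama in the § 103 rejection, X_2 = λX + αY_1P_1. The sketch assumes linear layers, λ fixed at 1, a deep copy as the cloning operation, and hypothetical class and function names; it is not Pratama's implementation.

```python
import copy
import random
from typing import List

Vec = List[float]


class Layer:
    """Hypothetical linear layer mapping an n-dimensional input to m outputs (y = W x)."""

    def __init__(self, n_in: int, m_out: int, rng: random.Random):
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(m_out)]

    def forward(self, x: Vec) -> Vec:
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.w]


def stacked_input(x: Vec, y_prev: Vec, lam: Vec, alpha: float, proj: List[Vec]) -> Vec:
    """X_next = lam * X + alpha * Y_prev P, with lam applied elementwise and P an m x n matrix."""
    mixed = [sum(y_prev[i] * proj[i][j] for i in range(len(y_prev))) for j in range(len(x))]
    return [lam[j] * x[j] + alpha * mixed[j] for j in range(len(x))]


class DeepStack:
    """Toy deep stacked model: every layer after the first receives the raw input
    mixed with a random projection of the preceding layer's output."""

    def __init__(self, first: Layer, n_in: int, m_out: int, alpha: float, seed: int = 0):
        self.rng = random.Random(seed)
        self.n_in, self.m_out, self.alpha = n_in, m_out, alpha
        self.lam = [1.0] * n_in  # hypothetical: input weights fixed at 1
        self.layers = [first]
        self.projections: List[List[Vec]] = []  # projections[k] links layer k to layer k + 1

    def clone_and_append(self, source: int = -1) -> None:
        """Clone an existing layer (by default the most recently added one) and place
        the clone immediately after the current last layer, with a fresh projection
        matrix whose entries are drawn from [0, 1]."""
        clone = copy.deepcopy(self.layers[source])
        proj = [[self.rng.uniform(0.0, 1.0) for _ in range(self.n_in)]
                for _ in range(self.m_out)]
        self.layers.append(clone)
        self.projections.append(proj)

    def forward(self, x: Vec) -> Vec:
        y = self.layers[0].forward(x)
        for layer, proj in zip(self.layers[1:], self.projections):
            y = layer.forward(stacked_input(x, y, self.lam, self.alpha, proj))
        return y
```

Because every cloned layer keeps the n-input, m-output shape of the layer it was copied from, the transformation only has to map an m-vector back into the n-dimensional input space, which is what the projection matrix does.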
Instant Application, Claim 3:
constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the cloned layer into an input vector for a next layer in an immediately next location in the deep fused model relative to the second location.
Application No. 16/684973, Claim 9:
constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the layer to produce an input vector for a next layer located immediately after the location.
Co-pending application 16/684973 does not teach this limitation of instant claim 3. However, Pratama et al. teach it ([pp. 157-158, Section 5.3, pp. 158-160, Section 5.4, p. 171, Section 7, Algorithm 1, Algorithm 2, Figure 3]), where the modification of the deep fused model (Figure 3) includes replicating (cloning) the architecture of an existing hidden layer and inserting (splicing) it, with a vector transformation defined from the output of a previously cloned and spliced layer to a subsequent layer. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the cloned layer into an input vector for a next layer in an immediately next location in the deep fused model relative to the second location, as taught by Pratama et al., in order to improve the robustness with which a neural network processes non-stationary inputs by replicating individual layers of the network in response to the detection of concept drift and tuning them to that concept drift, while also employing layer-to-layer transformations (Pratama, [Abstract, p. 151, Section 1, Table 4]).
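The two-sided transformation at issue in claim 3 (one transformation into the inserted layer, and a second one out of it into the layer at the immediately next location) can be sketched generically. The adapter below simply truncates or zero-pads vectors to the required size; it is a hypothetical stand-in for whatever learned or random projection an actual model would use, and all names are illustrative.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Stage = Tuple[Callable[[Vec], Vec], int, int]  # (function, input size, output size)


def adapter(d_in: int, d_out: int) -> Stage:
    """Hypothetical vector transformation: truncate or zero-pad a d_in vector to d_out."""
    def fn(v: Vec) -> Vec:
        out = list(v[:d_out])
        return out + [0.0] * (d_out - len(out))
    return (fn, d_in, d_out)


def insert_between(stages: List[Stage], position: int, layer: Stage) -> List[Stage]:
    """Insert `layer` in front of stages[position], adding a transformation on each
    side wherever the neighbouring sizes do not already match."""
    _, d_in, d_out = layer
    new = list(stages[:position])
    if position > 0 and stages[position - 1][2] != d_in:
        new.append(adapter(stages[position - 1][2], d_in))  # previous output -> inserted input
    new.append(layer)
    if position < len(stages) and stages[position][1] != d_out:
        new.append(adapter(d_out, stages[position][1]))  # inserted output -> next input
    new.extend(stages[position:])
    return new


def run(stages: List[Stage], x: Vec) -> Vec:
    """Feed x through every stage in order."""
    for fn, _, _ in stages:
        x = fn(x)
    return x
```

The first adapter corresponds to the transformation of the previous layer's output recited in claim 1; the second corresponds to the "second vector transformation" of claim 3.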
Instant Application, Claim 5:
wherein the previous layer is the spliced layer, and wherein the vector transformation transforms an output vector of the spliced layer into an input vector of the cloned layer.
Application No. 16/684973, Claim 9:
constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the layer to produce an input vector for a next layer located immediately after the location.
Co-pending application 16/684973 does not teach this limitation of instant claim 5. However, Pratama et al. teach it ([pp. 157-158, Section 5.3, pp. 158-160, Section 5.4, p. 171, Section 7, Algorithm 1, Algorithm 2, Figure 3]), where the modification of the deep fused model (Figure 3) includes replicating (cloning) the architecture of an existing hidden layer and inserting (splicing) it, with a vector transformation defined from the output of a previously cloned and spliced layer to a subsequent layer, including when the previous layer is the spliced layer. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating the limitation “…the previous layer is the spliced layer, and wherein the vector transformation transforms an output vector of the spliced layer into an input vector of the cloned layer,” as taught by Pratama et al., in order to improve the robustness with which a neural network processes non-stationary inputs by replicating individual layers of the network in response to the detection of concept drift and tuning them to that concept drift, while also employing layer-to-layer transformations (Pratama, [Abstract, p. 151, Section 1, Table 4]).
Instant Application, Claim 6:
selecting, using the autonomic function, a first model, wherein the first model comprises a first trained neural network corresponding to a first ontology; 

identifying, using the autonomic function, a second model, wherein the second model comprises a second trained neural network corresponding to a second ontology; 

extracting, using the autonomic function, a layer from the second model; 

splicing, using the autonomic function, the layer into the location in the first model as the spliced layer; 

constructing, using the autonomic function, a third vector transformation, wherein the third vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the location; 

and fusing, using the autonomic function, the layer in the first model using the transformed output vector as input to the layer, the fusing forming the fused model that is operable on an ontology of the new problem space.

Application No. 16/684973, Claim 1:
A method comprising: causing an autonomic function to execute using a processor and a memory in an artificial intelligence environment to detect a new problem space; 

selecting, using the autonomic function, a first model, wherein the first model comprises a first trained neural network corresponding to a first ontology; 

identifying, using the autonomic function, a second model, wherein the second model comprises a second trained neural network corresponding to a second ontology; 

extracting, using the autonomic function, a layer from the second model; 

inserting, using the autonomic function, the layer into a location in the first model; 

constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the location; 

and fusing, using the autonomic function, the layer in the first model using the transformed output vector as input to the layer, the fusing forming a fused model that is operable on an ontology of the new problem space

The limitations of instant claim 6 are anticipated by claim 1 of the ’973 application.
Instant Application, Claim 7:
wherein the layer is a penultimate layer in the second model
Application No. 16/684973, Claim 4:
wherein the layer is a penultimate layer in the second model

The limitations of instant claim 7 are anticipated by claim 4 of the ’973 application.
Instant Application, Claim 8:
wherein the location is a position of the layer in the second model relative to a last layer in the second model.
Application No. 16/684973, Claim 5:
wherein the location is a position of the layer in the second model relative to a last layer in the second model

The limitations of instant claim 8 are anticipated by claim 5 of the ’973 application.
Instant Application, Claim 9:
wherein the second ontology has a threshold similarity with a gap between the first ontology and the ontology of the new problem space
Application No. 16/684973, Claim 2:
further comprising: selecting the location in the first model such that the layer at the location identifies a feature in a gap between a first ontology of the first model and an ontology of the new problem space
The limitations of instant claim 9 are anticipated by claim 2 of the ’973 application.
Instant Application, Claim 10:
wherein the first ontology has a threshold similarity with the ontology of the new problem space
Application No. 16/684973, Claim 7:
wherein the first ontology has a threshold similarity with the ontology of the new problem space
The limitations of instant claim 10 are anticipated by claim 7 of the ’973 application.


Claims 11-13 and 15-17 recite a computer program product implementation of the same subject matter as claims 1-3 and 5-7, respectively, and are therefore rejected for the same reasons as claims 1-3 and 5-7, over claims 10, 10, (1, 9), (1, 9), 10, and 13, respectively, of co-pending application ’973, it being noted that the method and computer program product implementations are obvious variants of one another.

In addition, dependent claims 18 and 19, which depend from claim 11, are rejected as follows:

Instant Application, Claim 18:
wherein the stored program instructions are stored in a computer readable storage medium in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system
Application No. 16/684973, Claim 18:
wherein the stored program instructions are stored in a computer readable storage medium in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system
The limitations of instant claim 18 are anticipated by claim 18 of the ’973 application.
Instant Application, Claim 19:
wherein the stored program instructions are stored in a computer readable storage medium in a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system, further comprising: program instructions to meter use of the computer usable code associated with the request; and program instructions to generate an invoice based on the metered use
Application No. 16/684973, Claim 19:
wherein the stored program instructions are stored in a computer readable storage medium in a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system, further comprising: program instructions to meter use of the computer usable code associated with the request; and program instructions to generate an invoice based on the metered use
The limitations of instant claim 19 are anticipated by claim 19 of the ’973 application.



Claim 20 recites a CRM implementation of the same subject matter as claim 1 and is therefore rejected for the same reasons as claim 1, over claim 20 of co-pending application ’973.

Claims 4 and 14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims (1, 9) and (1, 9), respectively, of copending Application No. 16/684973 in view of Pratama et al. (“Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams”, Information Sciences, 495, 3 May 2019, pp. 150-174), and further in view of Kauschke et al. (“Towards Neural Network Patching: Evaluating Engagement-Layers and Patch-Architectures”, https://arxiv.org/abs/1812.03468, arXiv:1812.03468v2 [cs.LG] 17 Jan 2019, pp. 1-72). This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. The claims are compared as follows:


Instant Application, Claims 1 and 4:
Claim 1:
A method comprising: determining, using an autonomic function executing using a processor and a memory in an artificial intelligence environment, that a fused model responsive to a new problem space has an accuracy below a threshold level of accuracy in the new problem space; 

cloning, using the autonomic function, a spliced layer in the fused model to form a cloned layer, the spliced layer having been extracted from a second model and inserted at a location in the fused model;

inserting, using the autonomic function, the cloned layer at a second location in the fused model; 

constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the second location; 

and fusing, using the autonomic function, the cloned layer in the fused model using the transformed output vector as input to the cloned layer, the fusing forming a deep fused model that has a revised accuracy, the revised accuracy being higher than the accuracy relative to an ontology of the new problem space.

Claim 4:
wherein the next layer is the spliced layer, and wherein the second vector transformation transforms the output vector of the cloned layer into an input vector for the spliced layer.

Application No. 16/684973, Claims 1 and 9:
Claim 1:
A method comprising: causing an autonomic function to execute using a processor and a memory in an artificial intelligence environment to detect a new problem space; 

selecting, using the autonomic function, a first model, wherein the first model comprises a first trained neural network corresponding to a first ontology; 

identifying, using the autonomic function, a second model, wherein the second model comprises a second trained neural network corresponding to a second ontology; 

extracting, using the autonomic function, a layer from the second model; 

inserting, using the autonomic function, the layer into a location in the first model; 

constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the location; 

and fusing, using the autonomic function, the layer in the first model using the transformed output vector as input to the layer, the fusing forming a fused model that is operable on an ontology of the new problem space.

Claim 9/1:
constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the layer to produce an input vector for a next layer located immediately after the location.
Co-pending application 16/684973 does not teach the limitations of instant claims 1 and 4 enumerated below. However, Pratama et al. teach the limitations of instant claim 1 ([pp. 157-158, Section 5.3, pp. 158-160, Section 5.4, p. 171, Section 7, Algorithm 1, Algorithm 2, Figure 3]), where the fused model (Figure 3) is modified when its accuracy falls to a level indicative of concept drift; the modification replicates (clones) the architecture of an existing hidden layer and inserts (splices) it relative to an existing layer according to a second model (eSCN), each newly spliced-in layer being a clone of a previously spliced-in layer, thereby deepening the architecture. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating determining that a fused model responsive to a new problem space has an accuracy below a threshold level of accuracy in the new problem space; cloning, using the autonomic function, a spliced layer in the fused model to form a cloned layer, the spliced layer having been extracted from a second model and inserted at a location in the fused model; inserting, using the autonomic function, the cloned layer at a second location in the fused model; and fusing the cloned layer to form a deep fused model that has a revised accuracy, as taught by Pratama et al., in order to improve the robustness with which a neural network processes non-stationary inputs by replicating individual layers of the network in response to the detection of concept drift, tuning them to that concept drift, and accordingly deepening the network (Pratama, [Abstract, p. 151, Section 1, Table 4]).
However, Pratama does not teach inserting the cloned layer in front of the spliced layer so that the transformed output of the cloned layer is input into the spliced layer. Kauschke et al. teach this limitation ([p. 7, Section 2, pp. 16-21, Section 4.1, Figure 1, Figure 6]), where a new layer is inserted at an optimally selected depth in the deep neural network, with transformations defined relative to a succeeding layer that is not the final layer and to which the output of the patched (new) layer is directed according to the modified architecture. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of co-pending application 16/684973 by incorporating that the next layer is the spliced layer, and that the second vector transformation transforms the output vector of the cloned layer into an input vector for the spliced layer, as taught by Kauschke et al., in order to improve the accuracy and efficiency of machine learning models, and of their training, when concept drift emerges, by modifying (augmenting) an internal layer selected to optimize the effectiveness of that modification without needing to retrain the entire deep neural network (Kauschke, [pp. 52-54, Section 6.1, p. 62, Section 6.6, Figure 14, Table 27]).
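The difference in placement between claim 4 (the clone sits immediately in front of the spliced layer, so the next layer is the spliced layer) and claim 5 (the clone sits immediately after it, so the previous layer is the spliced layer) can be made concrete with a small sketch over layer names. The layer names and the "_clone" suffix are hypothetical.

```python
from typing import List, Optional, Tuple


def place_clone(layers: List[str], spliced: str,
                before: bool) -> Tuple[List[str], Optional[str], Optional[str]]:
    """Insert a clone of `spliced` either immediately before it (next layer is the
    spliced layer, as in instant claim 4) or immediately after it (previous layer
    is the spliced layer, as in instant claim 5). Returns the new ordering and the
    clone's previous and next neighbours."""
    i = layers.index(spliced)
    pos = i if before else i + 1
    clone = spliced + "_clone"
    ordered = layers[:pos] + [clone] + layers[pos:]
    prev_layer = ordered[pos - 1] if pos > 0 else None
    next_layer = ordered[pos + 1] if pos + 1 < len(ordered) else None
    return ordered, prev_layer, next_layer
```

With before=True the clone's output feeds the spliced layer, which is the arrangement the rationale above draws from Kauschke; with before=False the spliced layer feeds the clone, which is the stacking order attributed to Pratama.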


Claim 14 recites a computer program product implementation of the same subject matter as claim 4 and is therefore rejected for the same reasons as claim 4, over claims 1 and 9 of co-pending application ’973, it being noted that the method and computer program product implementations are obvious variants of one another.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-13, 15-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pratama et al. (“Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams”, Information Sciences, 495, 3 May 2019, pp. 150-174), hereinafter referred to as Pratama, in view of Lecue et al. (“Learning from Ontology Streams with Semantic Concept Drift”, https://arxiv.org/pdf/1704.07466.pdf, arXiv:1704.07466v1 [cs.AI] 24 Apr 2017, pp. 1-8), hereinafter referred to as Lecue.

In regards to claim 1, Pratama teaches A method comprising: determining, using an autonomic function executing using a processor and a memory in an artificial intelligence environment, that a fused model responsive to a new problem space has an accuracy below a threshold level of accuracy in the new problem space; ([p. 153, Section 3, pp. 157-158, Section 5.3, p. 171, Section 7, Algorithm 1, Algorithm 2], The typical characteristic of data stream is perceived in the presence of concept drift defined as the change of posterior probability P(Y|X)K = P(Y|K)K−1. Overview of DSSCN’s learning procedures are visualized in Figs. 1 and 2 under the periodic hold-out and prequential test-then-train simulation procedures., Although the notion of drift detection mechanism is often integrated in the context of ensemble learning in which the size of ensemble expands if the drift is detected, this feature can be also implemented in realm of the deep stacked network since it is formed by a collection of local learners injected with different level of abstractions of the original data points. Note that although each layer of DSSCN is connected in series, each layer locally learns data streams with minor interaction with other layers. DSSCN makes use of Hoeffding’s bound drift detection mechanism classifying data stream into three categories: normal, warning and drift [12]…. The drift phase pinpoints an uncovered concept which calls for enrichment of existing structure by integrating a new layer….  two conflict levels: αW <equation 7> … In other words, the cutting point refers to a case when a population mean increases…. From Algorithm 1, it is seen that an extra layer is inserted if the drift status is substantiated by the drift detection mechanism. 
This mechanism aims to incorporate a new concept using a new layer while retaining previous knowledge in preceding layers., Bias-and-variance dilemma: in the presence of drift, it can be simply said that a model suffers from an underfitting issue because it deals with unexplored regions. This issue incurs high predictive error due to the high bias problem. The appropriate step in handling the high bias problem (underfitting) is by increasing network complexity., wherein a layer is added/fused to a deep stacked neural network architecture when a detection of concept drift is performed to form a fused model such that this architecture/model is evaluated over time to detect if its performance/accuracy falls below a threshold performance on the basis of the (Hoeffding’s) drift detection algorithm in which the detection of this concept drift is indicative of inaccuracy due to underfitting such as due to an associated change in a posterior probability that predicts a classification and wherein the “autonomic function” which performs these functions are the algorithms 1 and 2 that enable the automated lifelong learning process in the presence of non-stationary drift through vertical architecture augmentation/deepening.) cloning, using the autonomic function, a spliced layer in the fused model to form a cloned layer, the spliced layer having been extracted from a second model and inserted at a location in the fused model; ([p. 151, Section 1, pp. 
158-160, Section 5.4, Algorithm 1, Algorithm 2, Figure 3], DSSCN is actualized under a deep stacked neural networks structure where every layer is formed by a base-learner, eSCNs., The MIMO architecture is implemented here to infer the final predicted class label of eSCN where each rule comprises multiple output weights per each class attribute…., Once the activation degree per input attribute is elicited using (18) and ((19), it is combined using the product t-norm operator to induce the upper and lower activation degrees Gi , Gi as follows Gi = n j=1 μi,j , Gi =j=1 μi,j. (20) The use of the product t-norm operator in (20) opens possibility to apply the gradient-based learning approach compared to the min t-norm operator. … For simplicity, the final output expression of DSSCN (12) can be also expressed in one compact form as follows: <equation 12>… The MIMO architecture is implemented here to infer the final predicted class label of eSCN where each rule comprises multiple output weights per each class attribute., wherein, in response to the detection of concept drift, a layer is added to a deep stacked neural network architecture in which each successively added layer is a replication/clone of any existing layer, particularly those added in previous iterations (i.e., the replication/clone layer has the same number of nodes of an existing layer to process feature vectors of the same dimension to generate output with the same dimension namely, m outputs and either n or (2n+1) inputs), wherein the spliced layer relative to which the cloned/layer is being cloned is being interpreted as the any of the preceding layers that have been added (i.e., any of eSCN1 through eSCND-1 relative to eSCN_D in Figure 3), each and any of which has been extracted from a distinct base learner model (e.g., eSCN_3) which represents the new concept detected in the concept drift by representing the predicted output as a superposition of Gaussian basis functions of the input (as determined 
from bounds on the activation functions derived from the covariance matrix.) inserting, using the autonomic function, the cloned layer at a second location in the fused model; ([pp. 154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], It is worth noting that DSSCN demonstrates the fully elastic deep network structure where a new building unit can be inserted on top of the current hidden layer when the existing structure does not suffice to a given problem. All layers or building blocks are stacked and work in tandem where each layer except the bottom layer produces random shifts of original input pattern to be passed to the next layer., wherein the replication/clone layer, whose activations are determined by a base learner (e.g., eSCN), is added to the end/base of the deep stacked neural network architecture, such that the second location in the fused model is after the last added/spliced layer, as shown in Figure 3 and Algorithm 1 (especially line 31).), constructing, using the autonomic function, a vector transformation, wherein the vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the second location; ([pp. 154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], The stacked generalization principle is implemented in the next layer where the output of the first building unit is connected to the second layer and mixed with the random projection matrix P. The input of the second building block is formulated as follows: $X_2 = \lambda X + \alpha Y_1 P_1$, (1) where α is a random projection constant and $P_1 \in \mathbb{R}^{m \times n}$ is the random projection matrix randomly generated in the range of [0,1] while λ is the input weight vector crafted as Section 6.1.,
wherein a vector transformation of the form lambda*X_n (raw inputs) + alpha*Y_(D-1) (output from previous layer) * Z_(D-1) (random projection matrix) is determined in the algorithmic framework to transform the output (Y) generated from the previous layer (relative to the new replication/clone layer), thereby forming the inputs into the new replication/clone layer as shown in Figure 3.) and fusing, using the autonomic function, the cloned layer in the fused model using the transformed output vector as input to the cloned layer, the fusing forming a deep fused model that has a revised accuracy, the revised accuracy being higher than the accuracy relative to an … of the new problem space.  ([p. 156, Section 5, p. 158, Section 5.3, Algorithm 1, Algorithm 2, Figure 2, Table 3], Such base building blocks do not offer diverse information to rectify learning performance and is fused into one to relieve the computational complexity. The last step concerns with the drift detection scenario which evolves the structure of DSSCNs. This mechanism controls the depth of DSSCNs where an extra building unit is added and the structure of DSSCNs is deepened provided that a concept change is observed., We select to only adjust the last layer because it has the most adjacent relationship to the current training concept. Other layers are left untrained with the current data stream to generate different levels of feature representation and to avoid the catastrophic forgetting problem – the key success of DNNs in the continual environment.
This strategy is also designated to overcome the catastrophic forgetting problem., wherein, in response to the concept drift detection (Figure 2), the base learner model is fused into the overall architecture, with that layer trained on the transformed inputs from the previous layer, leading to an updated fused model that has an improved/revised accuracy in the sense that it adjusts for the concept drift (i.e., a new problem space) and resolves the bias/underfitting in the performance of the fused model.)
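For illustration only, the input formulation quoted from Pratama, Eq. (1), $X_2 = \lambda X + \alpha Y_1 P_1$, can be sketched as follows. This is a hedged Python sketch under stated assumptions, not Pratama's published MATLAB code: the function names `random_projection` and `stacked_input` are hypothetical, and λ is assumed to act elementwise on the n raw input features. The dimensions follow the quoted text: $Y_1$ is a 1 x m output row, $P_1$ is m x n, so $Y_1 P_1$ has the same length n as the raw input X.

```python
import random


def random_projection(m, n, seed=None):
    """Return an m x n matrix (list of rows) with entries drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(m)]


def stacked_input(x, y_prev, p_prev, lam, alpha):
    """Compute X_d = lam * X + alpha * (Y_{d-1} P_{d-1}) for one sample.

    x:      raw input features, length n
    y_prev: output of the preceding layer, length m
    p_prev: m x n random projection matrix
    lam:    input weights, length n (assumed here to act elementwise)
    alpha:  scalar random projection constant
    """
    m, n = len(y_prev), len(x)
    if len(p_prev) != m or any(len(row) != n for row in p_prev):
        raise ValueError("projection matrix must be m x n")
    if len(lam) != n:
        raise ValueError("lam must have one weight per input feature")
    # (1 x m) row vector times (m x n) matrix gives a 1 x n vector, matching X.
    projected = [sum(y_prev[i] * p_prev[i][j] for i in range(m)) for j in range(n)]
    return [lam[j] * x[j] + alpha * projected[j] for j in range(n)]


# Usage: n = 2 raw features, m = 3 outputs from the previous layer.
P1 = random_projection(3, 2, seed=0)
X2 = stacked_input([1.0, 2.0], [0.2, 0.3, 0.5], P1, lam=[1.0, 1.0], alpha=0.1)
print(X2)
```

The new layer therefore always receives an n-dimensional input, which is why a cloned layer with the same number of inputs can be stacked on top of any existing layer.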
However, Pratama does not explicitly disclose ontology. Although Pratama teaches the detection of a new problem space through concept drift detection, he does not indicate that the concept drift/problem space is represented by an ontology. 
However, Lecue, in the analogous environment of learning concept drift in data streams, teaches … the fusing forming a … fused model that has a revised accuracy, the revised accuracy being higher than the accuracy relative to an ontology of the new problem space.  ([p. 1, Section 1, p. 2, Section 2.2, p. 3, Section 3.1], Semantic reasoning and machine learning have been combined by revisiting features embeddings as semantic embeddings i.e., vectors capturing consistency and knowledge entailment in ontology streams. Such embeddings are then exploited in a context of supervised stream learning to learn models, which are robust to concept drifts i.e., sudden and inconsistent prediction changes., We represent knowledge evolution by a dynamic, evolutive version of ontologies [Huang and Stuckenschmidt, 2005]. Data (ABox), its inferred statements (entailments) are evolving over time while its schema (TBox) remains unchanged., Definition 4 … ABox entailment g is called an evidence entailment of the prediction change. We denote by $C_{|T \cup A}(S_0^n, i, j, \varepsilon)$, the set of all evidence entailments of the prediction change with an ε difference between time i and j of ontology stream $S_0^n$., wherein predictive models for data streams are developed/revised (fused with new knowledge/model elements) in response to the detection of concept drift, such that the emergence of the concept drift and the concomitant revised models are according to/represented by an ontology for the semantic representation of the features of the data stream, and such that the models so revised have higher accuracy in a non-stationary environment.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the fusing to form a deep fused model that has a revised accuracy, the revised accuracy being higher than the accuracy relative to an ontology of the new problem space.  The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of machine learning models that are robust to environment changes such as from concept drifts by exploiting the semantic ontological representation of the data stream (Lecue, [pp. 5-6, Section 5, p. 6, Section 6, Table 4]). 

In regards to claim 2, the rejection of claim 1 is incorporated and Pratama further teaches wherein the second location is immediate adjacent to the location of the spliced layer.  ([p. 151, Section 1, Algorithm 1, Algorithm 2, Figure 3], DSSCN is actualized under a deep stacked neural networks structure where every layer is formed by a base-learner, eSCNs., The MIMO architecture is implemented here to infer the final predicted class label of eSCN where each rule comprises multiple output weights per each class attribute…., wherein, in response to the detection of concept drift, a layer is added to a deep stacked neural network architecture in which each successively added layer is a replication/clone of any existing layer, particularly those added in previous iterations (i.e., the replication/clone layer has the same number of nodes as an existing layer, so that it processes feature vectors of the same dimension and generates output of the same dimension, namely m outputs and either n or (2n+1) inputs), including the layer immediately before/adjacent to the new layer (i.e., layer eSCN_D-1 is a spliced layer relative to the eSCN_D clone layer in Figure 3).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the same reasons as pointed out for claim 1.

In regards to claim 3, the rejection of claim 1 is incorporated and Pratama further teaches constructing, using the autonomic function, a second vector transformation, wherein the second vector transformation transforms an output vector of the cloned layer into an input vector for a next layer in an immediately next location in the deep fused model relative to the second location. ([pp. 154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], The stacked generalization principle is implemented in the next layer where the output of the first building unit is connected to the second layer and mixed with the random projection matrix P. The input of the second building block is formulated as follows: $X_2 = \lambda X + \alpha Y_1 P_1$, (1) where α is a random projection constant and $P_1 \in \mathbb{R}^{m \times n}$ is the random projection matrix randomly generated in the range of [0,1] while λ is the input weight vector crafted as Section 6.1., wherein a vector transformation of the form lambda*X_n (raw inputs) + alpha*Y_(D-1) (output from previous layer) * Z_(D-1) (random projection matrix) transforms the output (Y) generated from the previous layer in the deep stacked neural network into the input of the next layer, such that, over successive iterations in which successive concept drift detections occur, the output of the cloned layer from a previous iteration (e.g., eSCN_D) is transformed to form the input into a (new) next layer relative to that replication/clone layer (e.g., from eSCN_D to eSCN_(D+1)), i.e., the inputs into a new replication/clone layer as shown in Figure 3.)
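For illustration only, the chaining relied upon for claim 3 (each layer's output is transformed by Eq. (1) into the input of the layer at the next location) can be sketched as a forward pass over a growing stack. This is a hedged Python sketch, not Pratama's Algorithm 1: the helpers `project` and `forward_stack` and the summing `toy_layer` are hypothetical, and λ is again assumed to act elementwise.

```python
def project(y, p):
    """Row vector y (length m) times matrix p (m x n) -> list of length n."""
    return [sum(y[i] * p[i][j] for i in range(len(y))) for j in range(len(p[0]))]


def forward_stack(x, layers, projections, lam, alpha):
    """Run one sample through a stack of layers, feeding layer d + 1 the
    transformed input lam * x + alpha * (y_d P_d) built from layer d's output.

    layers:      list of callables, each mapping a length-n vector to a length-m vector
    projections: projections[d] is the m x n matrix applied to the output of layers[d]
    Returns the list of per-layer outputs, bottom layer first.
    """
    if len(projections) < len(layers) - 1:
        raise ValueError("need one projection matrix between each pair of layers")
    outputs = []
    layer_input = list(x)  # the bottom layer sees the raw input
    for d, layer in enumerate(layers):
        y = layer(layer_input)
        outputs.append(y)
        if d + 1 < len(layers):
            proj = project(y, projections[d])
            layer_input = [lam[j] * x[j] + alpha * proj[j] for j in range(len(x))]
    return outputs


def toy_layer(v):
    """Hypothetical stand-in for a trained layer: sums its inputs (m = 1)."""
    return [sum(v)]


layers = [toy_layer, toy_layer]
projections = [[[1.0, 1.0]]]
# After a (hypothetical) drift detection, a cloned layer is appended at the next
# location together with the transformation that feeds it; earlier layers are untouched.
layers.append(toy_layer)
projections.append([[1.0, 1.0]])
print(forward_stack([1.0, 2.0], layers, projections, lam=[1.0, 1.0], alpha=0.5))
```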
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the same reasons as pointed out for claim 1.

In regards to claim 5, the rejection of claim 1 is incorporated and Pratama further teaches wherein the previous layer is the spliced layer, and wherein the vector transformation transforms an output vector of the spliced layer into an input vector of the cloned layer.  ([pp. 154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], The stacked generalization principle is implemented in the next layer where the output of the first building unit is connected to the second layer and mixed with the random projection matrix P. The input of the second building block is formulated as follows: $X_2 = \lambda X + \alpha Y_1 P_1$, (1) where α is a random projection constant and $P_1 \in \mathbb{R}^{m \times n}$ is the random projection matrix randomly generated in the range of [0,1] while λ is the input weight vector crafted as Section 6.1., wherein a vector transformation of the form lambda*X_n (raw inputs) + alpha*Y_(D-1) (output from previous layer) * Z_(D-1) (random projection matrix) transforms the output (Y) generated from the previous layer (relative to the new replication/clone layer) to form the inputs into the new replication/clone layer as shown in Figure 3, and wherein the previous layer relative to which the clone/replication layer is formed is a spliced layer (i.e., it has been spliced into the fused model in a previous iteration).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the same reasons as pointed out for claim 1.

In regards to claim 6, the rejection of claim 1 is incorporated and Pratama further teaches selecting, using the autonomic function, a first model, wherein the first model comprises a first trained neural network corresponding to a first ontology; ([pp. 157-158, Section 5.3, Algorithm 1, Algorithm 2], Although the notion of drift detection mechanism is often integrated in the context of ensemble learning in which the size of ensemble expands if the drift is detected, this feature can be also implemented in realm of the deep stacked network since it is formed by a collection of local learners injected with different level of abstractions of the original data points. Note that although each layer of DSSCN is connected in series, each layer locally learns data streams with minor interaction with other layers. DSSCN makes use of Hoeffding’s bound drift detection mechanism classifying data stream into three categories: normal, warning and drift [12]…., wherein a deep stacked neural network architecture is determined/selected (according to Algorithms 1 and 2) such that the architecture at a given iteration is a first trained neural network model applicable to a first problem space, corresponding to the statistics of the input data used to train and test that neural network, when (or before) those statistics indicate the occurrence of concept drift (i.e., the emergence of a different/new problem space).) identifying, using the autonomic function, a second model, wherein the second model comprises a second trained neural network corresponding to a second …; ([p. 151, Section 1, pp.
158-160, Section 5.4, Algorithm 1, Algorithm 2, Figure 3, Figure 4], DSSCN is actualized under a deep stacked neural networks structure where every layer is formed by a base-learner, eSCNs., The MIMO architecture is implemented here to infer the final predicted class label of eSCN where each rule comprises multiple output weights per each class attribute…., Once the activation degree per input attribute is elicited using (18) and (19), it is combined using the product t-norm operator to induce the upper and lower activation degrees $\overline{G}_i$, $\underline{G}_i$ as follows: $\overline{G}_i = \prod_{j=1}^{n} \overline{\mu}_{i,j}$, $\underline{G}_i = \prod_{j=1}^{n} \underline{\mu}_{i,j}$. (20) The use of the product t-norm operator in (20) opens possibility to apply the gradient-based learning approach compared to the min t-norm operator. … For simplicity, the final output expression of DSSCN (12) can be also expressed in one compact form as follows: <equation 12>… The MIMO architecture is implemented here to infer the final predicted class label of eSCN where each rule comprises multiple output weights per each class attribute., wherein, in response to the detection of concept drift (i.e., the emergence of a new/second problem space), a new model is learned/identified (Figure 4) that uses the statistics of the feature data to configure the activations/weights of a neural network layer (also Figure 4), which represents the new concept/second problem space detected in the concept drift by representing the predicted output as a superposition of Gaussian basis functions of the input (as determined from bounds on the activation functions derived from the covariance matrix).) extracting, using the autonomic function, a layer from the second model; splicing, using the autonomic function, the layer into the location in the first model as the spliced layer; ([pp.
154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], It is worth noting that DSSCN demonstrates the fully elastic deep network structure where a new building unit can be inserted on top of the current hidden layer when the existing structure does not suffice to a given problem. All layers or building blocks are stacked and work in tandem where each layer except the bottom layer produces random shifts of original input pattern to be passed to the next layer., wherein the neural network layer (eSCN in general, but the weight layer of Figure 4 in particular) learned by the base learner replication/clone layer is selected/extracted from the base learner model and inserted into a (new) base layer in the deep stacked neural network architecture (first model), such that this layer is interpreted as the spliced layer, relative to which a subsequent layer can be appended in a later iteration upon the emergence of new concept drift.) constructing, using the autonomic function, a third vector transformation, wherein the third vector transformation transforms an output vector of a previous layer in an immediately previous location in the model relative to the location; ([pp. 154-155, Section 4, Algorithm 1, Algorithm 2, Figure 3], The stacked generalization principle is implemented in the next layer where the output of the first building unit is connected to the second layer and mixed with the random projection matrix P.
The input of the second building block is formulated as follows: $X_2 = \lambda X + \alpha Y_1 P_1$, (1) where α is a random projection constant and $P_1 \in \mathbb{R}^{m \times n}$ is the random projection matrix randomly generated in the range of [0,1] while λ is the input weight vector crafted as Section 6.1., wherein a vector transformation of the form lambda*X_n (raw inputs) + alpha*Y_(D-2) (output from previous layer) * Z_(D-2) (random projection matrix) transforms the output (Y) generated from the previous layer (relative to a new spliced layer) to form the inputs into the new spliced layer as shown in Figure 3.) and fusing, using the autonomic function, the layer in the first model using the transformed output vector as input to the layer, the fusing forming the fused model that is operable on an … of the new problem space.  ([p. 156, Section 5, p. 158, Section 5.3, Algorithm 1, Algorithm 2, Figure 2, Table 3], Such base building blocks do not offer diverse information to rectify learning performance and is fused into one to relieve the computational complexity. The last step concerns with the drift detection scenario which evolves the structure of DSSCNs. This mechanism controls the depth of DSSCNs where an extra building unit is added and the structure of DSSCNs is deepened provided that a concept change is observed., We select to only adjust the last layer because it has the most adjacent relationship to the current training concept. Other layers are left untrained with the current data stream to generate different levels of feature representation and to avoid the catastrophic forgetting problem – the key success of DNNs in the continual environment.
This strategy is also designated to overcome the catastrophic forgetting problem., wherein, in response to the concept drift detection (Figure 2), a base learner model is fused into the overall architecture, with that layer trained on the transformed inputs from the previous layer, leading to an updated fused model that has an improved/revised accuracy in the sense that it adjusts for the concept drift (i.e., a new problem space).)
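For illustration only, Pratama's Eq. (20), as quoted above, combines the per-attribute upper and lower membership degrees of rule i with the product t-norm: $\overline{G}_i = \prod_{j=1}^{n} \overline{\mu}_{i,j}$ and $\underline{G}_i = \prod_{j=1}^{n} \underline{\mu}_{i,j}$. The hedged Python sketch below computes both quantities; the function names are hypothetical and the membership values are made up. A min t-norm is included only for contrast, since its value depends on a single attribute, whereas the product depends smoothly on every attribute, which is consistent with the quoted remark that the product t-norm opens the possibility of gradient-based learning.

```python
from math import prod


def interval_firing_strength(mu_upper, mu_lower):
    """Product t-norm over the n input attributes of one rule i.

    mu_upper[j], mu_lower[j]: upper and lower membership degrees of attribute j
    Returns (G_upper_i, G_lower_i) = (prod_j mu_upper[j], prod_j mu_lower[j]).
    """
    if len(mu_upper) != len(mu_lower):
        raise ValueError("upper and lower memberships must cover the same attributes")
    if any(lo > up for up, lo in zip(mu_upper, mu_lower)):
        raise ValueError("each lower membership must not exceed its upper membership")
    return prod(mu_upper), prod(mu_lower)


def min_tnorm(mu):
    """Min t-norm, shown only for contrast with the product t-norm."""
    return min(mu)


print(interval_firing_strength([0.5, 0.8], [0.25, 0.5]))
```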
However, Pratama does not explicitly disclose …ontology…ontology…ontology. Although Pratama teaches the detection of a new problem space through concept drift detection, he does not indicate that the concept drift/problem space is represented by an ontology. 
However, Lecue, in the analogous environment of learning concept drift in data streams, teaches selecting … a first model, wherein the first model … corresponding to a first ontology; identifying a second model, wherein the second model … corresponding to a second ontology… the fusing forming the fused model that is operable on an ontology of the new problem space.  ([p. 1, Section 1, p. 2, Section 2.2, p. 3, Section 3.1], Semantic reasoning and machine learning have been combined by revisiting features embeddings as semantic embeddings i.e., vectors capturing consistency and knowledge entailment in ontology streams. Such embeddings are then exploited in a context of supervised stream learning to learn models, which are robust to concept drifts i.e., sudden and inconsistent prediction changes., We represent knowledge evolution by a dynamic, evolutive version of ontologies [Huang and Stuckenschmidt, 2005]. Data (ABox), its inferred statements (entailments) are evolving over time while its schema (TBox) remains unchanged., Definition 4 … ABox entailment g is called an evidence entailment of the prediction change. We denote by $C_{|T \cup A}(S_0^n, i, j, \varepsilon)$, the set of all evidence entailments of the prediction change with an ε difference between time i and j of ontology stream $S_0^n$., wherein predictive models for data streams are developed/revised (fused with new knowledge/model elements) such that the successively learned/augmented models formed in response to successive concept drift detections constitute the first and second models, and such that the emergence of the concept drift and the respective concomitant revised model are according to/represented by an ontology for the semantic representation of the features of the data stream.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue to select a first model that comprises a first trained neural network corresponding to a first ontology, identify a second model that comprises a second trained neural network corresponding to a second ontology, and fuse the layer in the first model using the transformed output vector as input to the layer with the fusing forming the fused model that is operable on an ontology of the new problem space.  The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of machine learning models that are robust to and are successively trained/augmented in response to the emergence of environment changes such as from concept drifts by exploiting the semantic ontological representation of the data stream (Lecue, [pp. 5-6, Section 5, p. 6, Section 6, Table 4]). 

In regards to claim 7, the rejection of claim 6 is incorporated and Pratama further teaches wherein the layer is a penultimate layer in the second model.  ([pp. 158-160, Section 5.4, Algorithm 1, Figure 3, Equation 12], $B_t$ is the network bias set as zero for simplicity, while R is the number of hidden nodes, n is the number of input dimension and m is the number of output dimension. $\beta_i = x_e W_i$ where $x_e \in \mathbb{R}^{1 \times (2n+1)}$ is the extended input vector produced by the functional expansion block of second order Chebyshev function while $W_i \in \mathbb{R}^{(2n+1) \times 1}$ is the output connective weight vector., wherein the eSCN base learner model for the new problem space is represented by a (hidden) layer that computes the product of the activation function of the transformed input vector (G(lambda_t X_t + B_t) on the right-hand side of equation 12) and the output connectivity weight vector, such that the activation function G (together with q, which fuses the upper and lower activation values), with or without the associated weights (through Beta_i, as also evident in Figure 3), is interpreted as forming the (hidden) penultimate layer of the eSCN model, with the output corresponding to the last layer (which pools the output from all of the hidden nodes), and such that this penultimate layer is inserted into the deep stacked architecture, which continues to have an output layer.)
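For illustration only, the quoted relation $\beta_i = x_e W_i$, with $x_e \in \mathbb{R}^{1 \times (2n+1)}$ produced by a second order Chebyshev functional expansion and $W_i \in \mathbb{R}^{(2n+1) \times 1}$, yields a scalar for each hidden node i. The hedged Python sketch below assumes the expansion layout [1, T1(x_1), T2(x_1), ..., T1(x_n), T2(x_n)] with T1(v) = v and T2(v) = 2v² - 1; that layout is an assumption chosen because it has the stated length 2n + 1, and the function names are hypothetical.

```python
def chebyshev_expand(x):
    """Second-order Chebyshev functional expansion of an n-dimensional input.

    Assumed layout: [1, T1(x_1), T2(x_1), ..., T1(x_n), T2(x_n)], where
    T1(v) = v and T2(v) = 2v^2 - 1, giving a vector of length 2n + 1.
    """
    xe = [1.0]
    for v in x:
        xe.extend([v, 2.0 * v * v - 1.0])
    return xe


def rule_output(x, w_i):
    """beta_i = x_e W_i: inner product of the expanded input with the rule's weights."""
    xe = chebyshev_expand(x)
    if len(w_i) != len(xe):
        raise ValueError("W_i must have length 2n + 1")
    return sum(a * b for a, b in zip(xe, w_i))


print(chebyshev_expand([0.5, 1.0]))
print(rule_output([0.5, 1.0], [1.0] * 5))
```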
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the same reasons as pointed out for claim 6.

In regards to claim 8, the rejection of claim 7 is incorporated and Pratama further teaches wherein the location is a position of the layer in the second model relative to a last layer in the second model.  ([pp. 158-160, Section 5.4, Algorithm 1, Figure 3, Equation 12], $B_t$ is the network bias set as zero for simplicity, while R is the number of hidden nodes, n is the number of input dimension and m is the number of output dimension. $\beta_i = x_e W_i$ where $x_e \in \mathbb{R}^{1 \times (2n+1)}$ is the extended input vector produced by the functional expansion block of second order Chebyshev function while $W_i \in \mathbb{R}^{(2n+1) \times 1}$ is the output connective weight vector., wherein the eSCN base learner model for the new problem space is represented by a (hidden) layer that is inserted into the deep stacked architecture at a location having the same relationship to the last layer of the deep stacked architecture as the hidden layer of the base learner model has to its own last layer; in other words, the spliced layer is located just before the output layer of the deep stacked architecture, which also corresponds to the position that the layer had in the second model relative to its output layer.)
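For illustration only, the positional mapping relied upon for claims 7 and 8 (the layer taken from the second model is spliced into the first model at the same offset from the last layer that it occupied in the second model, e.g., the penultimate position) can be sketched with models represented as plain Python lists. The representation and the function name `splice_at_same_offset` are hypothetical and are not drawn from Pratama.

```python
def splice_at_same_offset(first_model, second_model, offset_from_last=1):
    """Copy the layer of second_model that sits `offset_from_last` positions before
    its last layer, and insert it into a copy of first_model at the same offset
    before first_model's last layer. offset_from_last=1 selects the penultimate layer.

    Models are plain lists of layers (hypothetical representation); inputs are not modified.
    """
    if not 1 <= offset_from_last < len(second_model):
        raise ValueError("offset must point at a layer before the second model's last layer")
    if offset_from_last > len(first_model):
        raise ValueError("first model is too shallow for this offset")
    layer = second_model[-1 - offset_from_last]
    fused = list(first_model)
    fused.insert(len(fused) - offset_from_last, layer)
    return fused


first = ["input_A", "hidden_A", "output_A"]
second = ["input_B", "hidden_B", "output_B"]
print(splice_at_same_offset(first, second))
```

With the default offset, the spliced layer lands immediately before the first model's output layer, mirroring its penultimate position in the second model.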
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the same reasons as pointed out for claim 6.

In regards to claim 9, the rejection of claim 6 is incorporated and Pratama further teaches wherein the second … has a threshold similarity with a gap between the first … and the … of the new problem space.  ([pp. 157-158, Section 5.3, Algorithm 1, Algorithm 2], DSSCN makes use of Hoeffding’s bound drift detection mechanism classifying data stream into three categories: normal, warning and drift [12]…. The drift phase pinpoints an uncovered concept which calls for enrichment of existing structure by integrating a new layer…. The three conditions of data streams, namely normal, warning and drift are determined from the Hoeffding test applying two conflict levels: $\alpha_W$ (warning), $\alpha_D$ (drift) which correspond to the confidence level of Hoeffding’s statistics as follows: <equation 7>… The drift status is returned if the null hyphotesis is rejected with the size of $\alpha_D$, while the warning status makes use of $\alpha_W$ to reject the null hyphotesis. The null hyphotesis is defined as $H_0: E(\hat{Z}_1) \le E(\hat{Z}_2)$ while its alternative is formulated just as the opposite. The null hyphotesis is rejected if $|\hat{Z}_1 - \hat{Z}_2| \ge \varepsilon$ where ε is found from (7) by applying the specific significance level $\alpha_W$, $\alpha_D$.
The hypothesis test is meant to investigate the increase of population mean which hints the presence of concept drift … Suppose that $dist = |\hat{Z}_1 - \hat{Z}_2|$, the three levels, namely drift, warning and drift, are signalled in respect to the following conditions…., wherein a layer is added/fused to a deep stacked neural network architecture when concept drift is detected, and the architecture/model is evaluated over time, on the basis of the (Hoeffding’s) drift detection algorithm, to detect whether its performance/accuracy falls below a threshold performance; in that algorithm, the statistical representation of the feature space before the concept drift differs sufficiently from (i.e., has a threshold similarity with a gap relative to) the statistical representation of the feature space after the emergence of the concept drift (i.e., the new problem space), such that the similarity function corresponds to a comparison of population means against a threshold epsilon_drift, the satisfaction of which is indicative of a gap between the first problem space and the second problem space.)
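For illustration only, the three-level classification quoted from Pratama (normal, warning, drift, decided by comparing $dist = |\hat{Z}_1 - \hat{Z}_2|$ against thresholds obtained at significance levels $\alpha_W$ and $\alpha_D$) can be sketched as follows. Pratama's Eq. (7) is elided in the quotation, so its exact form is not reproduced here; this hedged Python sketch instead assumes the textbook one-sided Hoeffding bound for the difference of two independent sample means with values in an interval of width `value_range`, namely $\varepsilon = \text{range} \cdot \sqrt{\tfrac{1}{2}(1/n_1 + 1/n_2)\ln(1/\alpha)}$. The function names and the default significance levels are hypothetical.

```python
import math


def hoeffding_epsilon(n1, n2, alpha, value_range=1.0):
    """One-sided Hoeffding bound for the difference of two sample means (assumed form).

    With n1 and n2 independent observations in an interval of width value_range,
    P(difference - expected difference >= eps) <= alpha when
    eps = value_range * sqrt((1/n1 + 1/n2) / 2 * ln(1/alpha)).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    m = 1.0 / n1 + 1.0 / n2
    return value_range * math.sqrt(m / 2.0 * math.log(1.0 / alpha))


def drift_status(z1_mean, n1, z2_mean, n2, alpha_w=0.005, alpha_d=0.001, value_range=1.0):
    """Classify the stream as 'normal', 'warning' or 'drift' from dist = |Z1 - Z2|.

    alpha_d must be smaller than alpha_w, so the drift threshold is the larger one.
    """
    if not alpha_d < alpha_w:
        raise ValueError("alpha_d must be smaller than alpha_w")
    dist = abs(z1_mean - z2_mean)
    if dist >= hoeffding_epsilon(n1, n2, alpha_d, value_range):
        return "drift"
    if dist >= hoeffding_epsilon(n1, n2, alpha_w, value_range):
        return "warning"
    return "normal"


# Error rates (values in [0, 1]) over two windows of 100 samples each.
print(drift_status(0.40, 100, 0.10, 100))
```

Under these assumptions, a layer would be added only on the "drift" outcome, while "warning" would merely flag the stream for closer monitoring.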
However, Pratama does not explicitly disclose …ontology…ontology…ontology. Although Pratama teaches the detection of a new problem space through concept drift detection, he does not indicate that the concept drift/problem space is represented by an ontology. 
However, Lecue, in the analogous environment of learning concept drift in data streams, teaches wherein the second ontology has a threshold similarity with a gap between the first ontology and the ontology of the new problem space ([pp. 2-3, Section 2.3, p. 3, Section 3.1], Definition 3… An Ontology Stream Learning Problem, noted $OSLP\langle S_0^n, k, T, A, g\rangle$, is the problem of estimating whether g can be entailed from T and A at time k ∈ (0, n] of stream $S_0^n$, given knowledge at time t < k of $S_0^n$. This estimation is denoted as $p_{|T \cup A}(S_0^n(k) \models g)$ with values in [0, 1] and k ≥ 1….<equation 29> …, Definition 4 … A prediction change in $S_0^n$ is ocuring between time i and j in [0, n] with respect to T, A and its entailments iff: <equation 30> … Abruptness captures disruptive changes from a semantic perspective i.e., conflicting knowledge among snapshots $S_0^n(i)$, $S_0^n(j)$ with respect to background knowledge T ∪ A., wherein the ontologically-driven predictive models are modified if the semantics of the environment have changed, such that the lack of a semantic representation best suited to the current environment is reflected as a knowledge gap or a prediction change/gap (dissimilarity), as quantified by a threshold change in the entailment predictions of the ontology model (particularly as seen in equation 30).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the second ontology to have a threshold similarity with a gap between the first ontology and the ontology of the new problem space. The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of machine learning models that are robust to and are successively trained/augmented in response to the emergence of environment changes such as from concept drifts by exploiting the semantic ontological representation of the data stream (Lecue, [pp. 5-6, Section 5, p. 6, Section 6, Table 4]). 

In regards to claim 10, the rejection of claim 6 is incorporated and Pratama further teaches wherein the first … has a threshold similarity with the … of the new problem space.  ([pp. 157-158, Section 5.3, Algorithm 1, Algorithm 2], DSSCN makes use of Hoeffding’s bound drift detection mechanism classifying data stream into three categories: normal, warning and drift [12]…. The drift phase pinpoints an uncovered concept which calls for enrichment of existing structure by integrating a new layer…. The three conditions of data streams, namely normal, warning and drift are determined from the Hoeffding test applying two conflict levels: $\alpha_W$ (warning), $\alpha_D$ (drift) which correspond to the confidence level of Hoeffding’s statistics as follows: <equation 7>… The drift status is returned if the null hyphotesis is rejected with the size of $\alpha_D$, while the warning status makes use of $\alpha_W$ to reject the null hyphotesis. The null hyphotesis is defined as $H_0: E(\hat{Z}_1) \le E(\hat{Z}_2)$ while its alternative is formulated just as the opposite. The null hyphotesis is rejected if $|\hat{Z}_1 - \hat{Z}_2| \ge \varepsilon$ where ε is found from (7) by applying the specific significance level $\alpha_W$, $\alpha_D$.
The hypothesis test is meant to investigate the increase of population mean which hints the presence of concept drift … Suppose that $dist = |\hat{Z}_1 - \hat{Z}_2|$, the three levels, namely drift, warning and drift, are signalled in respect to the following conditions…., wherein a layer is added/fused to a deep stacked neural network architecture when concept drift is detected, and the architecture/model is evaluated over time, on the basis of the (Hoeffding’s) drift detection algorithm, to detect whether its performance/accuracy falls below a threshold performance; in that algorithm, the statistical representation of the feature space before the concept drift differs sufficiently from (i.e., has a threshold similarity with) the statistical representation of the feature space after the emergence of the concept drift (i.e., the new problem space), such that the similarity function corresponds to a comparison of population means against a threshold epsilon_drift, the satisfaction of which is indicative of a dissimilarity between the first problem space and the second problem space.)
However, Pratama does not explicitly disclose …ontology…ontology. Although Pratama teaches the detection of a new problem space through concept drift detection, he does not indicate that the concept drift/problem space is represented by an ontology. 
However, Lecue, in the analogous environment of learning concept drift in data streams, teaches wherein the first ontology has a threshold similarity with the ontology of the new problem space ([pp. 2-3, Section 2.3, p. 3, Section 3.1], Definition 3… An Ontology Stream Learning Problem, noted $OSLP\langle S_0^n, k, T, A, g\rangle$, is the problem of estimating whether g can be entailed from T and A at time k ∈ (0, n] of stream $S_0^n$, given knowledge at time t < k of $S_0^n$. This estimation is denoted as $p_{|T \cup A}(S_0^n(k) \models g)$ with values in [0, 1] and k ≥ 1….<equation 29> …, Definition 4 … A prediction change in $S_0^n$ is ocuring between time i and j in [0, n] with respect to T, A and its entailments iff: <equation 30> … Abruptness captures disruptive changes from a semantic perspective i.e., conflicting knowledge among snapshots $S_0^n(i)$, $S_0^n(j)$ with respect to background knowledge T ∪ A., wherein the ontologically-driven predictive models are modified if the semantics of the environment have changed, such that the lack of a semantic representation best suited to the current environment is reflected as a knowledge dissimilarity or a prediction change (dissimilarity or relative similarity), as quantified by a threshold change in the entailment predictions of the ontology model (particularly as seen in equation 30).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama to incorporate the teachings of Lecue for the first ontology to have a threshold similarity with the ontology of the new problem space. The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of machine learning models that are robust to, and successively trained/augmented in response to, environmental changes such as concept drift, by exploiting the semantic ontological representation of the data stream (Lecue, [pp. 5-6, Section 5, p. 6, Section 6, Table 4]).

Claim 11 is also rejected because it is merely a product implementation of the same subject matter as claim 1, which can be found in Pratama and Lecue. It is noted that claim 11 also recites a computer product with a computer-readable storage medium for instructions, which can be found in Pratama ([pp. 162-163, Section 6, Algorithm 1, Algorithm 2, Figure 9], Simulations are undertaken under MATLAB environment of an Intel (R) Core i5-6600 CPU @ 3.3 GHz with 8 GB of RAM and the MATLAB code of DSSCN is made publicly available.)

Claim 12/11 is also rejected because it is merely a product implementation of the same subject matter as claim 2/1, which can be found in Pratama and Lecue.

Claim 13/11 is also rejected because it is merely a product implementation of the same subject matter as claim 3/1, which can be found in Pratama and Lecue.

Claim 15/11 is also rejected because it is merely a product implementation of the same subject matter as claim 5/1, which can be found in Pratama and Lecue.

Claim 16/11 is also rejected because it is merely a product implementation of the same subject matter as claim 6/1, which can be found in Pratama and Lecue.

Claim 17/16 is also rejected because it is merely a product implementation of the same subject matter as claim 7/6, which can be found in Pratama and Lecue.

Claim 20 is also rejected because it is merely a computer-readable storage medium implementation of the same subject matter as claim 1, which can be found in Pratama and Lecue. It is noted that claim 20 also recites a computer-readable storage medium for instructions, which can be found in Pratama ([pp. 162-163, Section 6, Algorithm 1, Algorithm 2, Figure 9], Simulations are undertaken under MATLAB environment of an Intel (R) Core i5-6600 CPU @ 3.3 GHz with 8 GB of RAM and the MATLAB code of DSSCN is made publicly available.)

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Pratama, in view of Lecue, and in further view of Kauschke et al. (“Towards Neural Network Patching: Evaluating Engagement-Layers and Patch-Architectures”, https://arxiv.org/abs/1812.03468, arXiv:1812.03468v2 [cs.LG] 17 Jan 2019, pp. 1-72), hereinafter referred to as Kauschke.

In regards to claim 4, the rejection of claim 3 is incorporated; however, Pratama and Lecue do not further teach wherein the next layer is the spliced layer, and wherein the second vector transformation transforms the output vector of the cloned layer into an input vector for the spliced layer. Although Pratama teaches the vector transformation of output data from any given layer into the input of its respective succeeding layer, Pratama only clearly discloses the insertion of the cloned layer after the spliced layer.
However, Kauschke, in the analogous environment of the adaptation of neural networks to handle concept drift, teaches wherein the next layer is the spliced layer, and wherein the second vector transformation transforms the output vector of the cloned layer into an input vector for the spliced layer ([p. 7, Section 2, pp. 16-21, Section 4.1, Figure 1, Figure 6], Divert classification from M to P, if E is confident. When an instance is to be classified, the error detector E is executed. If the result is positive, classification is diverted to P, otherwise to M., For some architectures, well performing engagement layers are found in the higher layers of the network, close to the output layer, whereas for other network architectures it is preferable to choose engagement layers close to the network input…. Figure 6(a) shows the layerwise patching performance for a fully-connected network architecture. The optimal engagement layer in the presented configuration is the second fully-connected layer of the network. We observe that the average and final accuracy increase up to the second fully-connected layer… The accuracy maximum is reached at the fifth convolutional layer as engagement layer for the patch. The following pooling layer shows a marginal loss in average and final accuracy., wherein the particular inner layer (or region of inner layers) of a (stacked) deep neural network that is subject to augmentation in response to concept drift detection is identified according to the type of deep neural network, such that any patched internal layer in a sequence of similarly structured (spliced or cloned) layers is interpreted as an inserted new layer with transformations relative to a succeeding layer, which is not the final layer and to which the new output of the patched/new layer is directed according to the modified architecture.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama and Lecue to incorporate the teachings of Kauschke for the next layer to be the spliced layer and for the second vector transformation to transform the output vector of the cloned layer into an input vector for the spliced layer. The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy and efficiency of the performance and training of machine learning models in response to the emergence of concept drift by modifying/augmenting an internal layer suitably selected to optimize the effectiveness of that modification, without needing to retrain the entire deep neural network (Kauschke, [pp. 52-54, Section 6.1, p. 62, Section 6.6, Figure 14, Table 27]).

Claim 14/13 is also rejected because it is merely a product implementation of the same subject matter as claim 4/3, which can be found in Pratama, Lecue, and Kauschke.

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Pratama, in view of Lecue, and in further view of Piekniewski et al. (US 2014/0379623, published 25 December 2014).

In regards to claim 18, the rejection of claim 11 is incorporated and Pratama further teaches wherein the stored program instructions are stored in a computer readable storage medium in a data processing system, …([pp. 162-163, Section 6, Figure 9], Simulations are undertaken under MATLAB environment of an Intel (R) Core i5-6600 CPU @ 3.3 GHz with 8 GB of RAM and the MATLAB code of DSSCN is made publicly available…. Another data stream problem is picked up from our indoor RFID localization problem in the manufacturing shopfloor where the underlying goal is to locate the position of raw materials in the production line. DSSCN is also compared with data stream algorithms: pENsemble [32], Learn++.NSE [11], Learn++.CDS [10], pClass [28], eT2Class [31]., wherein a code-based framework is implemented that analyzes streamed data to detect the emergence of concept drift and to modify the model in response, and that implementation takes place in a server/sensor-based data collection and processing environment (and also, alternatively and implicitly, in the simulation analysis). It is further noted that this dependent claim lacks patentable weight because it merely recites an intended use.)
However, Pratama and Lecue do not explicitly teach wherein the stored program instructions are transferred over a network from a remote data processing system. In other words, Pratama does not teach remote access to his code/method, such as via a subscription; rather, Pratama appears to suggest free dissemination of the code. Lecue likewise does not teach subscription-based access.
However, Piekniewski, in the analogous environment of the adaptation of neural networks to handle concept drift, teaches wherein the stored program instructions are stored in a computer readable storage medium in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system ([0011, 0115, Figure 6], In one aspect, a method of operating a node of network is disclosed. In one embodiment, the method includes: Scaling individual inputs of a plurality of inputs received by the node via a plurality of connections, the scaling using at least a transformation to produce a plurality of scaled inputs;…, In one exemplary implementation, a web based repository of network plug-ins “images” (e.g., processor-executable instructions configured to implement input transformation and/or scaling in a neuron network) is introduced. Developers may utilize e.g., a “cloud” web repository to distribute the input transformation plug-ins. Users may access the repository (Such as under a subscription, per-access, or other business model), and browse plug-ins created by developers and/or other users much as one currently browses online music download venues. Plug-in modules may be also offered (e.g., for purchase, as an incentive, free download, or other consideration model) via the repository in an online “app” store model. Other related content such as user-created media (e.g., a code and/or a description outlining the input transformation methodology) may available through the repository, and Social forums and links., wherein a framework for processing inputs (a data stream) by a neural network (including transformation of inputs at a given node according to features of the input data) is offered according to various business models that provide remote access to the associated programs, including via subscription.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama and Lecue to incorporate the teachings of Piekniewski for the stored program instructions to be transferred over a network from a remote data processing system. The modification would have been obvious because one of ordinary skill would have been motivated to enhance user experience by providing convenient access to software that can be used to improve the effectiveness/accuracy in the processing of data by a neural network (Piekniewski, [0083, 0118, 0121]).

In regards to claim 19, the rejection of claim 11 is incorporated and Pratama further teaches wherein the stored program instructions are stored in a computer readable storage medium in a server data processing system, …([pp. 162-163, Section 6, Figure 9], Simulations are undertaken under MATLAB environment of an Intel (R) Core i5-6600 CPU @ 3.3 GHz with 8 GB of RAM and the MATLAB code of DSSCN is made publicly available…. Another data stream problem is picked up from our indoor RFID localization problem in the manufacturing shopfloor where the underlying goal is to locate the position of raw materials in the production line. DSSCN is also compared with data stream algorithms: pENsemble [32], Learn++.NSE [11], Learn++.CDS [10], pClass [28], eT2Class [31]., wherein a code-based framework is implemented that analyzes streamed data to detect the emergence of concept drift and to modify the model in response, and that implementation takes place in a server/sensor-based data collection environment (and also, alternatively and implicitly, in the simulation analysis). It is further noted that this dependent claim lacks patentable weight because it merely recites an intended use.)
However, Pratama and Lecue do not explicitly teach wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system, further comprising: program instructions to meter use of the computer usable code associated with the request; and program instructions to generate an invoice based on the metered use. In other words, Pratama does not teach access to his code/method via a subscription; rather, Pratama appears to suggest free dissemination of the code. Lecue likewise does not teach subscription-based access.
However, Piekniewski, in the analogous environment of the adaptation of neural networks to handle concept drift, teaches wherein the stored program instructions are stored in a computer readable storage medium in a server data processing system and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system, further comprising: program instructions to meter use of the computer usable code associated with the request; and program instructions to generate an invoice based on the metered use ([0011, 0115, Figure 6], In one aspect, a method of operating a node of network is disclosed. In one embodiment, the method includes: Scaling individual inputs of a plurality of inputs received by the node via a plurality of connections, the scaling using at least a transformation to produce a plurality of scaled inputs;…, In one exemplary implementation, a web based repository of network plug-ins “images” (e.g., processor-executable instructions configured to implement input transformation and/or scaling in a neuron network) is introduced. Developers may utilize e.g., a “cloud” web repository to distribute the input transformation plug-ins. Users may access the repository (Such as under a subscription, per-access, or other business model), and browse plug-ins created by developers and/or other users much as one currently browses online music download venues. Plug-in modules may be also offered (e.g., for purchase, as an incentive, free download, or other consideration model) via the repository in an online “app” store model. Other related content such as user-created media (e.g., a code and/or a description outlining the input transformation methodology) may available through the repository, and Social forums and links., wherein a framework for processing inputs (a data stream) by a neural network (including transformation of inputs at a given node according to features of the input data) is offered according to various business models, including subscription, pay per access (metered use), or periodic payment.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pratama and Lecue to incorporate the teachings of Piekniewski for the stored program instructions to be downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system, including program instructions to meter use of the computer usable code associated with the request and program instructions to generate an invoice based on the metered use. The modification would have been obvious because one of ordinary skill would have been motivated to enhance user experience by providing convenient access to software that can be used to improve the effectiveness/accuracy in the processing of data by a neural network (Piekniewski, [0083, 0118, 0121]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Draelos et al. (US 2017/0177993, 22 June 2017) teach the addition of new nodes as well as a new layer to a deep neural network in response to the emergence of concept drift.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983. The examiner can normally be reached M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT LEWIS KULP/Examiner, Art Unit 2124                                                                                                                                                                                                        
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124