Detailed Action
This action is in response to Applicant's communications filed 25 April 2022.
Claims 1, 4, 5, 19, and 25 were amended. No claims were cancelled, withdrawn, or added. Therefore, claims 1-30 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
Applicant's amendments, filed 25 April 2022, regarding the rejections of claims 4 and 5 under 35 USC 112(b) have been fully considered and are sufficient to overcome the rejections. Accordingly, the rejections of the claims under 35 USC 112(b) have been withdrawn.
Applicant's arguments, filed 25 April 2022, regarding the rejections of claims 1-30 under 35 USC 103 have been fully considered but are not persuasive.
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).
Applicant argues that the motivation to combine Townsend and Doshi is flawed because Doshi does something Townsend already does, which is to provide explanations of how data is used and to gain insight into machine learning methods. However, that overlap reinforces that Townsend and Doshi are analogous art and increases the likelihood that a person of ordinary skill in the art would combine the two references.
Applicant's arguments, filed 25 April 2022, regarding the rejections of claims 1-30 under 35 USC 103 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
Applicant’s arguments, filed 25 April 2022, with respect to the rejections of claims 1-30 under 35 USC 103 are regarding newly amended claims and are addressed in the current rejection. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-11, 14, 16-17, 19-21, and 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi") and Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama").

Regarding Claim 1,
Townsend teaches a system for providing an interpretable neural network embodied on a non-transitory computer readable medium, comprising:
an input ("input" sec. II.B, p. 3458);
a feature generation network configured to identify a plurality of features from the input (Fig. 5, Quantisation, Feature Space Exploration, Extraction, p. 3459; "1) Quantization of Features: Quantization is the process of mapping features deﬁned in a continuous space to a discrete space, using one of a number of methods... The continuous form is often taken directly from the raw feature values, i.e., raw activation values output by corresponding neurons, but may also be taken as the feature importance as derived by the corresponding method. Some extraction methods can only quantize features from a speciﬁc network layer (input, hidden, or output). Where Table I does not specify a layer, the extraction method is layer-agnostic. Discrete forms include, but are not limited to, states in ﬁnite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain. Methods through which feature values/vectors are quantized mainly consist of means by which state vectors are assigned into distinct groups either by partitioning feature space and assigning a representative discrete vector to each partition" sec. III.B, p. 3459; quantizing of features teaches the feature generation network);
one or more relevance estimators ("Methods that identify such regions are referred to in a number of ways in the literature (e.g., importance, salience, or relevance), but we refer to them collectively as importance methods." sec. V, p. 3462), each relevance estimator configured to calculate a coefficient associated with one or more features of the input ("the importance of a feature is derived from the observed effect on classiﬁcation score for a given class when removing or modifying that feature. The principle is portable beyond the scope of neural networks, but complexity is increased as a backward pass must be computed for each and every feature examined." sec. V.C, p. 3464);
a conditional network configured to evaluate a plurality of rules, each rule comprising at least an IF-condition, wherein each IF-condition activates one or more partitions, each partition comprising one or more features ("Methods through which feature values/vectors are quantized mainly consist of means by which state vectors are assigned into distinct groups either by partitioning feature space and assigning a representative discrete vector to each partition, or by specifying interval-based conditions in the rules (e.g., IF 0.1 < x ≤ 0.5 THEN DO a ELSE DO b)." sec. III.B, p. 3459);
a feature attribution layer provided downstream of the one or more relevance estimators (Figs. 7, 8; the figures provide examples of using the relevance estimators in a knowledge space and extracting and embedding that knowledge downstream as representations of a program, p. 3462) and configured to, based on the coefficient associated with the one or more features of the input (Table I, p. 3460, the first column displays many methods of knowledge extraction, which teaches the relevance estimators, while the fourth through sixth columns provide information regarding the quantization of features, which provides the feature vectors/value/importance that teaches the coefficients), calculate an attribution value of the features associated with the partitions activated by the conditional network ("These early methods used hierarchical clustering analysis to partition states into distinct groups [31]–[34]. These and similar ideas were later adapted to rule extraction methods." sec. IV.A, p. 3461; "The general idea of importance-based explanation models is to detect activity throughout the network that contributes to classiﬁcation decisions made by that network and then project that activity back onto the layer being inspected. Alternatively or additionally, a subset of features can be selected for inspection and have their activity projected onto the original input. Furthermore, some methods consider individual ﬁlters as corresponding to speciﬁc semantic concepts [6], [25]." sec. V, p. 3462; calculating importance teaches the attribution value, and partition analysis and the subset of features teach that the attribution value is for the partition; "Where Table I does not specify a layer, the extraction method is layer-agnostic." sec. III.B, p. 3459);
aggregating a plurality of predictive results for each of the activated partitions (Fig. 12 shows an example of a method for aggregating predictive results for interpreting the digits 0 through 9, p. 3465), into a partition hierarchy comprising a plurality of hierarchical partitions (Fig. 12; "Graphs and Trees: A graph is a set of nodes representing concepts and edges representing the relationships between those concepts. A tree is a form of a hierarchical graph. In decision trees each node represents a choice, and these trees are easily represented as logic programs by treating each path from the root to a leaf node as a rule (Fig. 1)." sec. II.A, p. 3458), the hierarchical partitions comprising at least one pair of overlapping hierarchical partitions (Fig. 12, p. 3465; the first pair of hierarchical partitions divides the numbers from 0-9 into "1,2,5,6,7,8" and "0,2,3,4,8,9". As shown, these two partitions overlap for numbers 2 and 8; the next set of partitions divides "1,2,5,6,7,8" into "1,6,7,8" and "2,5,6,7", which overlap for numbers 6 and 7) and at least one pair of non-overlapping hierarchical partitions (Fig. 12, p. 3465; the lower set of partitions divides each range into single digits, such as dividing "1,7" into 1 and 7. These teach non-overlapping hierarchical partitions), wherein the aggregation layer is configured to aggregate the plurality of predictive results by splitting at least one hierarchical partition (Figs. 12 and 14, pp. 3465-3466; at each node or condition the tree splits each partition into two leaves; "A heuristic search is then performed to find a set of conditions on features of each sample at the current node to split the set of samples into two categories such that information gain is maximized. Two child nodes are then created, and the process repeats for each child node until a stopping condition is met." sec. VI.B, p. 3466);
an output layer configured to provide an output comprising an answer and explanation based on the predictive results and activated partitions ("LOcal Rule-based Explanations (LORE) [59], [60] is a model-agnostic method that also yields tree-based explanations. The LORE samples local neighborhoods using a genetic algorithm that takes a sample to be explained and outputs a population of samples that are similar in feature space but may or may not belong to the target class (see Fig. 15). The output population is then used to derive a tree-based local explanation using the C4.5 algorithm [81]. The LORE was later extended to global explanations by combining similar local ones [60]. Anchors [61] is an extension of LIME to rule-based explanations. Starting with the instance to be explained, Anchors searches for features that have no effect on the decision outcome and excludes these from the rule construction. The remaining conditions are the anchors that anchor the instance in feature space; i.e., any sample that satisﬁes all of these conditions will be assigned to the same class. The task of ﬁnding anchors is equivalent to the CILP (Section IV-C) identifying the bare minimal conditions (i.e., the anchors) that are required to satisfy a rule. However, the anchor method treats the search as a multi-arm bandit problem [82]." sec. VI.B, p. 3466).

While Townsend teaches aggregating a plurality of predictive results for each of the activated partitions as discussed above, it does not explicitly teach an aggregation layer configured to aggregate a plurality of predictive results for each of the activated partitions.
Doshi teaches an aggregation layer configured to aggregate a plurality of predictive results for each of the activated partitions ("The linear activations can be processed by a detector stage 918. In the detector stage 918, each linear activation is processed by a non-linear activation function...Various types of pooling functions can be used during the pooling stage 920, including max pooling, average pooling, and l2-norm pooling." [0098]-[0099]; this teaches a layer that pools (aggregates) output from non-linear activation functions into the next layer's partitions).
Townsend and Doshi are analogous art because both are directed to explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural networks of Townsend and Doshi for an improved explainable neural network. The modification would have been obvious because one of ordinary skill in the art would be motivated to better understand how decisions are made in neural networks, as suggested by Townsend ("There are multiple motivations for so-called explainable AI... people want to know how their data are being used... Another motivation is to gain scientific insight... A third may be to discover and explain flaws in machine learning methods and architectures in order to improve accuracy, efficiency, or both" sec. I, p. 3456) and Doshi ("developers wish to gain visibility into how decisions are reached in processing systems, including deep neural networks" [0003]).

The Townsend/Doshi combination does not explicitly teach wherein the aggregation layer is configured to aggregate the plurality of predictive results by merging at least one pair of hierarchical partitions.
Gama teaches wherein the aggregation layer is configured to aggregate the plurality of predictive results by merging at least one pair of hierarchical partitions (Fig. 4, when 2.5 < δ < 3, a pair of hierarchical partitions is shown; when δ > 3, they are merged into one partition; "A nested collection of quasi-coverings reﬂects the different levels of similarity present in the network. More speciﬁcally, as δ grows, covers would typically start exercising inﬂuence one over the other, then some nodes start to overlap while the remaining ones still exercise inﬂuence, until eventually the covers merge into one; see Fig. 4(d) for an example." sec. V, p. 398).
Townsend and Gama are analogous art because both are directed to hierarchical partitioning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the hierarchical partitioning of the Townsend/Doshi combination with the hierarchical overlapping clustering of Gama. The modification would have been obvious because one of ordinary skill in the art would be motivated to gain insight into complex data structures, represented by overlapping partitions, that cannot be captured by traditional methods in which each data point can belong to only one partition, as suggested by Gama ("the so-called overlapping function is presented as a tool for gaining insights about the data" Abstract, p. 392; "Traditional clustering methods provide only one partitioning of the node set in such a way that each data point belongs to one and only one block of the partition. An important limitation of these traditional clustering methods is that the dataset may present a complex data structure at several resolutions or levels of similarity, and outputting only one partition may not be adequate in portraying the different grouping degrees that may be present." sec. I, p. 392).
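For illustration only, the overlapping and non-overlapping readings of Townsend's Fig. 12 relied upon in the claim 1 mapping can be checked with a short sketch. The digit groupings are taken from the description of Fig. 12 above; the variable and function names are hypothetical and do not appear in any reference.

```python
# Digit groupings as described for Townsend Fig. 12 in the claim 1 mapping.
# All names below are illustrative assumptions, not taken from the references.
top_left = {1, 2, 5, 6, 7, 8}
top_right = {0, 2, 3, 4, 8, 9}
mid_left = {1, 6, 7, 8}
mid_right = {2, 5, 6, 7}
leaf_a = {1}
leaf_b = {7}


def overlap(a, b):
    """Return the digits shared by two partitions (empty set if disjoint)."""
    return a & b


# A non-empty intersection means the pair overlaps; an empty one means it does not.
print(sorted(overlap(top_left, top_right)))  # first-level pair
print(sorted(overlap(mid_left, mid_right)))  # second-level pair
print(sorted(overlap(leaf_a, leaf_b)))       # leaf-level pair
```

Under these groupings the first two pairs share digits and the leaf-level pair does not, which matches the overlapping and non-overlapping characterization given in the mapping.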

Regarding Claim 2,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the relevance estimator ("Methods that identify such regions are referred to in a number of ways in the literature (e.g., importance, salience, or relevance), but we refer to them collectively as importance methods." sec. V, p. 3462; "the importance of a feature is derived from the observed effect on classiﬁcation score for a given class when removing or modifying that feature. The principle is portable beyond the scope of neural networks, but complexity is increased as a backward pass must be computed for each and every feature examined." sec. V.C, p. 3464) and/or the feature generation network (Fig. 5, Quantisation, Feature Space Exploration, Extraction, p. 3459; "1) Quantization of Features: Quantization is the process of mapping features deﬁned in a continuous space to a discrete space, using one of a number of methods... The continuous form is often taken directly from the raw feature values, i.e., raw activation values output by corresponding neurons, but may also be taken as the feature importance as derived by the corresponding method. Some extraction methods can only quantize features from a speciﬁc network layer (input, hidden, or output). Where Table I does not specify a layer, the extraction method is layer-agnostic. Discrete forms include, but are not limited to, states in ﬁnite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain. Methods through which feature values/vectors are quantized mainly consist of means by which state vectors are assigned into distinct groups either by partitioning feature space and assigning a representative discrete vector to each partition" sec. III.B, p. 3459; quantizing of features teaches the feature generation network) are formed from a black-box model ("pedagogical methods which treat the model as a black box and describe its behavior irrespective of its inner workings" sec. III.C, p. 3460; "CILP [21], [22] is an example of a recurrent neural-symbolic architecture that completes the neural-symbolic cycle (Fig. 5) by having both a knowledge extraction process and an embedding process. The CILP translates networks into logic programs, and therefore, here, the term relation corresponds to a rule that maps a set of antecedents to a consequent.... CILP’s knowledge extraction algorithm has elements of decompositional and pedagogical methods. A network is decomposed into multiple basic neural structures (BNSs), where a BNS is a hidden or output neuron plus all of its inputs and weights. Rules from each BNS are then extracted through a pedagogical approach, and the set of rules across all BNSs is then simpliﬁed into a single logic program. Input vectors are ordered such that rule extraction is only performed on vectors that are representative of others, thus reducing the search space and algorithmic complexity." sec. IV.C, p. 3462).

Regarding Claim 3,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the partitions are static (Fig. 5, Symbolic System; "Discrete forms include, but are not limited to, states in ﬁnite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain." sec. III.B; discrete forms teach static partitions) or dynamic (Fig. 5, Connectionist (Neural) System, Training; "A complete implementation of the cycle enables an expert to inject their knowledge into the cycle in the form of logic programs, graphs, or other symbolic systems, which are then embedded as trainable, connectionist neural networks according to principles similar to those shown in Fig. 6. The neural network can then learn from examples, learning concepts, and relationships that were previously unknown to the expert but can be communicated back to them in a symbolic, human-interpretable form via the extraction process." sec. III.A; training the connectionist system teaches a dynamic system) and discovered through an external partitioning process (Fig. 5, Symbolic System, p. 3459) or through a connected neural network (Fig. 5, Connectionist (Neural) System, p. 3459).

Regarding Claim 4,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Doshi further teaches wherein the feature generation network further comprises a transformation network ("Convolutional Neural Network (CNN)" [0088]) configured to apply one or more transforms to the input vector to identify the features ("In convolutional network terminology, the first function to the convolution can be referred to as the input, while the second function can be referred to as the convolution kernel. The output may be referred to as the feature map. For example, the input to a convolution layer can be a multidimensional array of data that defines the various color components of an input image. The convolution kernel can be a multidimensional array of parameters, where the parameters are adapted by the training process for the neural network." [0088]; "In the convolution stage 916 performs several convolutions in parallel to produce a set of linear activations. The convolution stage 916 can include an affine transformation, which is any transformation that can be specified as a linear transformation plus a translation. Affine transformations include rotations, translations, scaling, and combinations of these transformations" [0097]).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.

Regarding Claim 5,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Doshi further teaches wherein the system comprises one partition ("The nodes in the CNN input layer are organized into a set of "filters" (feature detectors inspired by the receptive fields found in the retina), and the output of each set of filters is propagated to nodes in successive layers of the network. The computations for a CNN include applying the convolution mathematical operation to each filter to produce the output of that filter. Convolution is a specialized kind of mathematical operation performed by two functions to produce a third function that is a modified version of one of the two original functions. In convolutional network terminology, the first function to the convolution can be referred to as the input, while the second function can be referred to as the convolution kernel. The output may be referred to as the feature map" [0088]; the filters teach the partitions), wherein the transformation function ("In the convolution stage 916 performs several convolutions in parallel to produce a set of linear activations. The convolution stage 916 can include an affine transformation, which is any transformation that can be specified as a linear transformation plus a translation. Affine transformations include rotations, translations, scaling, and combinations of these transformations" [0097]) and relevance estimator ("“attention” or “factor attention” refers to contribution by a factor in decisions, which may be utilized to reveal the anatomy of a decision by a network with regard to which factors in various layers of the network contributed more, and which factors contributed less, to various decisions. Thus, attention relates to the observation of the reference load received by relevant factors during the operation of a network model." [0017]) each comprise a deep neural network ("The exemplary neural networks described above can be used to perform deep learning. Deep learning is machine learning using deep neural networks. The deep neural networks used in deep learning are artificial neural networks composed of multiple hidden layers, as opposed to shallow neural networks that include only a single hidden layer." [0091]), and wherein the one partition models non-linear data ("In the detector stage 918, each linear activation is processed by a non-linear activation function. The non-linear activation function increases the nonlinear properties of the overall network without affecting the receptive fields of the convolution layer." [0098]).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.

Regarding Claim 6,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Doshi further teaches wherein the aggregation layer is further configured to weight the plurality of predictive results from each of the activated partitions ("The neurons compute a dot product between the weights of the neurons and the region in the local input to which the neurons are connected. The output from the convolution stage 916 defines a set of linear activations that are processed by successive stages of the convolutional layer 914" [0097]; "Several types of non-linear activation functions may be used. One particular type is the rectified linear unit (ReLU), which uses an activation function defined as f(x)=max(0, x), such that the activation is thresholded at zero" [0098]).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.
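For illustration only, the operations quoted from Doshi at [0097]-[0099] (a dot product between weights and a local input region, the ReLU function f(x)=max(0, x), and max, average, and L2-norm pooling) can be sketched as follows. This is a minimal sketch under stated assumptions: the weights, input regions, and function names are hypothetical and are not drawn from Doshi or from the application.

```python
import math

# Hypothetical weights and local input regions; not taken from any reference.
weights = [0.5, -1.0]
regions = [[2.0, 1.0], [1.0, 3.0], [4.0, 0.0]]


def dot(w, x):
    """Dot product between neuron weights and one local input region."""
    return sum(wi * xi for wi, xi in zip(w, x))


def relu(x):
    """Rectified linear unit, f(x) = max(0, x): thresholds the activation at zero."""
    return max(0.0, x)


def pool(values, mode):
    """Aggregate activations by max, average, or L2-norm pooling."""
    if mode == "max":
        return max(values)
    if mode == "average":
        return sum(values) / len(values)
    if mode == "l2":
        return math.sqrt(sum(v * v for v in values))
    raise ValueError(f"unknown pooling mode: {mode}")


# Convolution-style weighting, then the detector stage, then the pooling stage.
linear_activations = [dot(weights, r) for r in regions]
activations = [relu(a) for a in linear_activations]
print(activations)
print(pool(activations, "max"), pool(activations, "average"), pool(activations, "l2"))
```

The sketch only shows how weighted results can be thresholded and then reduced to a single aggregated value; it is not offered as the structure of either the claimed aggregation layer or Doshi's network.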

Regarding Claim 7,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the partitions comprise one or more of linear partitions, Bayesian partitions, curvilinear partitions, continuous partitions, non-continuous segmented partitions, Bezier curve segments, graph-based partitions, hypergraph-based partitions, and simplicial complex partitions (Fig. 7, Continuous space quantized through partitioning, with FSMs representing the transitions between partitions. (a) Equal partitioning (b) Unequal partitioning, p. 3462; Figs. 7(a) and 7(b) show partitions that are linear, continuous, and graph-based; Examiner notes that the claim language only requires one of the nine types of partitions listed).

Regarding Claim 8,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the partitions comprise one or more static partitions (Fig. 5, Symbolic System; "Discrete forms include, but are not limited to, states in ﬁnite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain." sec. III.B; discrete forms teach static partitions) and one or more dynamic partitions (Fig. 5, Connectionist (Neural) System, Training; "A complete implementation of the cycle enables an expert to inject their knowledge into the cycle in the form of logic programs, graphs, or other symbolic systems, which are then embedded as trainable, connectionist neural networks according to principles similar to those shown in Fig. 6. The neural network can then learn from examples, learning concepts, and relationships that were previously unknown to the expert but can be communicated back to them in a symbolic, human-interpretable form via the extraction process." sec. III.A; training the connectionist system teaches a dynamic system).

Regarding Claim 9,
The Townsend/Doshi/Gama combination teaches the system of claim 1.  Townsend further teaches wherein the rules comprise causal logic ("specifying interval-based conditions in the rules (e.g., IF 0.1 < x ≤ 0.5 THEN DO a ELSE DO b)." sec. III.B, p. 3459) and one or more of abductive logic, inductive logic, and deductive logic ("C. Extracting and Embedding With Connectionist Inductive Learning and Logic Programming (CILP)... The CILP translates networks into logic programs, and therefore, here, the term relation corresponds to a rule that maps a set of antecedents to a consequent." sec. IV.C, p. 3462; It is noted only one of these types of logic is required by the claims).

Regarding Claim 10,
The Townsend/Doshi/Gama combination teaches the system of claim 9.  Doshi further teaches wherein the output is in at least one of a computer-readable programming language and a machine-readable hardware circuit specification (Fig. 4, Step 450 teaches presenting an output that is interpretable and explainable by a machine program (explanation created by machine) and interpretable and explainable by human (explanation is viewed by human users)).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.

Regarding Claim 11,
The Townsend/Doshi/Gama combination teaches the system of claim 1.  Doshi further teaches wherein one or more of the feature generation network, conditional network, and/or feature attribution layer are implemented on one or more of: a flexible architecture or field-programmable gate array, a static architecture or an application-specific integrated circuit, analogue/digital electronics, discrete components, photo-electronic components, spintronics and neuromorphic architectures, spiking neuromorphic architectures or quantum computing hardware ("Embodiments may be implemented using one or more memory chips, controllers, CPUs (Central Processing Unit), microchips or integrated circuits interconnected using a motherboard, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The term “logic” may include, by way of example, software or hardware and/or combinations of software and hardware." [0080]; this teaches field-programmable gate arrays, static architecture, discrete components; It is noted the claims only require one hardware implementation).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.

Regarding Claim 14,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Doshi further teaches wherein the output further comprises an explanation structure model (FIG. 4; "an Explainable Artificial Intelligence (XAI) Console in FIG. 4, to allow a user to receive the artificial intelligence explanation output that has been produced" [0048]), and wherein at least one of the output and the explanation comprises at least one of a human-readable explanation and a machine-readable explanation (Fig. 4, Step 450 teaches presenting an output that is interpretable and explainable by a machine program (explanation created by machine) and interpretable and explainable by human (explanation is viewed by human users)).
The motivation to combine Townsend and Doshi is the same as the motivation for claim 1.

Regarding Claim 16,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the rules are in one or more of: disjunctive normal form, conjunctive normal form, first-order logic assertions, non-Boolean logical systems, Type 1 or Type 2 fuzzy logic systems, modal logic, quantum logic, and probabilistic logic ("For example, given two rules, A ∧ B ∧ C → D and B ∧ C → D" p. 3460; the example formulas are given in disjunctive normal form; it is noted the claim only requires one type).

Regarding Claim 17,
The Townsend/Doshi/Gama combination teaches the system of claim 1. Townsend further teaches wherein the rules further comprise neuro-symbolic constraints (Fig. 5, Symbolic System, Logical Clause (Symbolic), "Fig. 5. Neural-symbolic cycle and a simpliﬁed example of how a logical clause can be represented by a neural network" p. 3459) comprising one or more of symbolic expressions, polynomial expressions, conditional and non-conditional probability distributions, joint probability distributions, state-space and phase-space transforms, integer/real/complex/quaternion/octonion transforms, Fourier transforms, Walsh functions, Haar and non-Haar wavelets, generalized L2 functions, fractal-based transforms, Hadamard transforms, Type 1 and Type 2 fuzzy logic and difference analysis ("Such a model forms the basis of many neural-symbolic architectures that implement some or all of the neural-symbolic cycle and collectively cover a wide range of logics [21], [22], [65]–[71]. A complete implementation of the cycle enables an expert to inject their knowledge into the cycle in the form of logic programs, graphs, or other symbolic systems, which are then embedded as trainable, connectionist neural networks according to principles similar to those shown in Fig. 6." sec. III.A, p. 3458; this teaches symbolic expressions; it is noted the claims only require one form).

Regarding Claim 19,
Townsend teaches a method for providing an interpretable neural network, comprising:
inputting a set of training data ("training data" sec. IV.B, p. 3462) to a black-box predictor model ("treat the model being explained as a black box" sec. III.C, p. 3461);
recording an output of the black-box predictor model corresponding to the set of training data ("1) State-Space Searching: Many RNN rule extraction algorithms use some form of self-driven search to navigate the state space, propagating output back to the input in order to explore transitions." sec. IV.B, p. 3461);
with a feature attribution layer, calculating one or more attribution values of coefficients associated with one or more features of the set of training data input to the black-box predictor model (Table 1, p. 3460, the first column displays many methods of knowledge extraction, which teaches the relevance estimators, while the fourth through sixth columns provide information regarding the quantization of features, which provides the feature vectors/value/importance that teaches the coefficients; "These early methods used hierarchical clustering analysis to partition states into distinct groups [31]–[34]. These and similar ideas were later adapted to rule extraction methods." sec. IV.A, p. 3461; "The general idea of importance-based explanation models is to detect activity throughout the network that contributes to classification decisions made by that network and then project that activity back onto the layer being inspected. Alternatively or additionally, a subset of features can be selected for inspection and have their activity projected onto the original input. Furthermore, some methods consider individual filters as corresponding to specific semantic concepts [6], [25]." sec. V, p. 3462; calculating importance teaches the attribution value, partition analysis and subset of features teaches that the attribution value is for the partition; "Where Table I does not specify a layer, the extraction method is layer-agnostic." sec. III.B, p. 3459);
aggregating the output and forming one or more hierarchical partitions (Fig. 12 shows an example of a method for aggregating predictive results for interpreting the digits 0 through 9, p. 3465; "Graphs and Trees: A graph is a set of nodes representing concepts and edges representing the relationships between those concepts. A tree is a form of a hierarchical graph. In decision trees each node represents a choice, and these trees are easily represented as logic programs by treating each path from the root to a leaf node as a rule (Fig. 1)." sec. II.A, p. 3458) based on the aggregated output (Figs. 13-14; "a means to disentangle these concepts and represent their relationships to each other in a hierarchical graph in which different layers of the extracted hierarchy correspond to the different layers of the original network and each node corresponds to a concept that has been disentangled from the filters (Fig. 13)."), the hierarchical partitions comprising at least one pair of overlapping hierarchical partitions (Fig. 12, p. 3465; the first pair of hierarchical partitions divides the numbers from 0-9 into "1,2,5,6,7,8" and "0,2,3,4,8,9". As shown, these two partitions overlap for numbers 2 and 8; the next set of partitions divides "1,2,5,6,7,8" into "1,6,7,8" and "2,5,6,7", which overlap for numbers 6 and 7) and at least one pair of non-overlapping hierarchical partitions (Fig. 12, p. 3465; the lower set of partitions divides each range into single digits, such as dividing "1,7" into 1 and 7. These teach non-overlapping hierarchical partitions), wherein the aggregation layer is configured to aggregate the output by splitting at least one hierarchical partition (Figs. 12 and 14, pp. 3465-3466; at each node or condition the tree splits each partition into two leaves; "A heuristic search is then performed to find a set of conditions on features of each sample at the current node to split the set of samples into two categories such that information gain is maximized. Two child nodes are then created, and the process repeats for each child node until a stopping condition is met." sec. VI.B, p. 3466);
constructing rules ("Methods through which feature values/vectors are quantized mainly consist of means by which state vectors are assigned into distinct groups either by partitioning feature space and assigning a representative discrete vector to each partition, or by specifying interval-based conditions in the rules (e.g., IF 0.1 < x ≤ 0.5 THEN DO a ELSE DO b)." sec. III.B, p. 3459) based on the local models ("These early methods used hierarchical clustering analysis to partition states into distinct groups [31]–[34]. These and similar ideas were later adapted to rule extraction methods." sec. IV.A, p. 3461); and
aggregating the rules to form a global interpretable model (Fig. 14 and Fig. 16 demonstrate an aggregation of the rules to form a global interpretable model. Fig. 14 shows a tree with the rules as conditions. Fig. 16 shows an RBM compared with its representation as an RNN).
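For illustration only, the overlap and non-overlap relationships relied upon in the Fig. 12 mapping above can be checked with simple set intersections. This is a minimal sketch that uses only the digit groupings recited in the mapping; the variable and function names are hypothetical and are not taken from Townsend.

```python
# Digit groupings as recited in the Fig. 12 mapping above.
first_level = ({1, 2, 5, 6, 7, 8}, {0, 2, 3, 4, 8, 9})
second_level = ({1, 6, 7, 8}, {2, 5, 6, 7})
leaf_level = ({1}, {7})

def overlap(pair):
    """Return the elements shared by both partitions of a pair."""
    left, right = pair
    return left & right

first_overlap = overlap(first_level)    # digits shared at the first split
second_overlap = overlap(second_level)  # digits shared at the next split
leaf_overlap = overlap(leaf_level)      # empty set means non-overlapping
```

An empty intersection corresponds to a non-overlapping pair, and a non-empty intersection corresponds to an overlapping pair.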

Townsend does not explicitly teach an aggregation layer and applying at least one linear or non-linear transformation to the partitions to form one or more local models.
Doshi teaches an aggregation layer ("The linear activations can be processed by a detector stage 918. In the detector stage 918, each linear activation is processed by a non-linear activation function...Various types of pooling functions can be used during the pooling stage 920, including max pooling, average pooling, and l2-norm pooling." [0098]-[0099]; this teaches a layer that pools (aggregates) output from non-linear activation functions into the next layer's partitions) and
applying at least one linear or non-linear transformation to the partitions to form one or more local models ("The linear activations can be processed by a detector stage 918. In the detector stage 918, each linear activation is processed by a non-linear activation function...Various types of pooling functions can be used during the pooling stage 920, including max pooling, average pooling, and l2-norm pooling." [0098]-[0099]; this teaches a layer that pools (aggregates) output from non-linear activation functions into the next layer's partitions).
Townsend and Doshi are analogous art because both are directed to explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural networks of Townsend and Doshi for an improved explainable neural network.  The modification would have been obvious because one of ordinary skill in the art would be motivated to better understand how decisions are made in neural networks, as suggested by Townsend ("There are multiple motivations for so-called explainable AI... people want to know how their data are being used... Another motivation is to gain scientific insight... A third may be to discover and explain flaws in machine learning methods and architectures in order to improve accuracy, efficiency, or both" sec. 1, p. 3456) and Doshi ("developers wish to gain visibility into how decisions are reached in processing systems, including deep neural networks" [0003]).

	The Townsend/Doshi combination does not explicitly teach wherein the aggregation layer is configured to aggregate the plurality of predictive results by merging at least one pair of hierarchical partitions.
	Gama teaches wherein the aggregation layer is configured to aggregate the plurality of predictive results by merging at least one pair of hierarchical partitions (Fig. 4, when 2.5 < δ < 3, a pair of hierarchical partitions is shown, when δ ≥ 3 they are merged into one partition; "A nested collection of quasi-coverings reflects the different levels of similarity present in the network. More specifically, as δ grows, covers would typically start exercising influence one over the other, then some nodes start to overlap while the remaining ones still exercise influence, until eventually the covers merge into one; see Fig. 4(d) for an example." sec. V, p. 398).
Townsend and Gama are analogous art because both are directed to hierarchical partitioning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the hierarchical partitioning of the Townsend/Doshi combination with the hierarchical overlapping clustering of Gama.  The modification would have been obvious because one of ordinary skill in the art would be motivated to gain insights into complex data structures, depicted by overlapping partitions, that cannot be explained by traditional methods in which each data point can belong to only one partition, as suggested by Gama ("the so-called overlapping function is presented as a tool for gaining insights about the data" Abstract, p. 392; "Traditional clustering methods provide only one partitioning of the node set in such a way that each data point belongs to one and only one block of the partition. An important limitation of these traditional clustering methods is that the dataset may present a complex data structure at several resolutions or levels of similarity, and outputting only one partition may not be adequate in portraying the different grouping degrees that may be present." sec. I, p. 392).

Regarding Claim 20,
The Townsend/Doshi/Gama combination teaches the method of claim 19. Townsend further teaches monitoring for one or more constraints and expressions, wherein the constraints and expressions comprise one or more conditions, events, triggers and actions in the form of one or more of symbolic rules or system of symbolic expressions, polynomial expressions, conditional and non-conditional probability distributions, joint probability distributions, state-space and phase-space transforms, integer/real/complex/quaternion/octonion transforms, Fourier transforms, Walsh functions, Haar and non-Haar wavelets, generalized L2 functions, fractal-based transforms, Hadamard transforms, Type 1 and Type 2 fuzzy logic and difference analysis (Fig. 14, Conditions 1...5; "TREPAN quantizes the input space using interval-based conditions. Each node contains a generator for sampling the input space based on constraints that reflect the path taken to reach that node (Fig. 14). Using the trained network, labels can then be assigned to each sample. A heuristic search is then performed to find a set of conditions on features of each sample at the current node to split the set of samples into two categories such that information gain is maximized." sec. VI.B, p. 3466; this teaches conditions and events; Fig. 5, Symbolic System; "Discrete forms include, but are not limited to, states in finite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain." sec. III.B; this teaches symbolic expressions; it is noted that the claims require only one of these terms).
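For illustration only, the quoted passage describes searching for a condition that splits the samples at a node into two categories so that information gain is maximized. The following is a minimal sketch of a generic entropy-based threshold search and is not asserted to be the TREPAN procedure itself; the function names, the single-feature threshold form of the condition, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(samples, labels, feature, threshold):
    """Entropy reduction from the condition: sample[feature] <= threshold."""
    left = [y for x, y in zip(samples, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(samples, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_split(samples, labels, candidates):
    """Pick the (feature, threshold) candidate with the highest gain."""
    return max(candidates,
               key=lambda c: information_gain(samples, labels, *c))

# Hypothetical toy data: one feature "x" and two classes.
samples = [{"x": 0.1}, {"x": 0.3}, {"x": 0.6}, {"x": 0.9}]
labels = ["a", "a", "b", "b"]
candidates = [("x", 0.2), ("x", 0.5), ("x", 0.8)]
chosen = best_split(samples, labels, candidates)
```

In this toy case the threshold that separates the two classes cleanly is selected, after which the same search would be repeated at each of the two child nodes.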

Regarding Claim 21,
The Townsend/Doshi/Gama combination teaches the method of claim 20. Townsend further teaches wherein the monitoring is implemented by a data structure that references one or more features ("the selected feature map (and no others) act as the initial importance signal" sec. V.A, p. 3463) and one or more associated taxonomies, ontologies, causal models, one or more knowledge graph networks, control charts, Nelson rules, Bode plots, or Nyquist plots ("recent developments in extracting relational information from neural networks fit nicely into earlier frameworks and taxonomies" sec. IX, p. 3468; this teaches taxonomies; "Tree-based representations dominate slightly here, though it should be noted that trees are a form of a graph, and in general graphs can be represented in formal logic, and vice versa." sec. VIII.C, p. 3468; this teaches knowledge graph networks; it is noted the claims only require one of these terms).

Regarding Claim 24,
The Townsend/Doshi/Gama combination teaches the method of claim 19. Townsend further teaches converting global interpretable model to an explainable neural network ("Other work extracts natural-language sentences that provide explanations of conclusions made in visual question–answering (VQA) [62], [63]. However, it is difficult to group these into either the sampling or search categories because relations are generated after all the features have been selected, whereas, for the searching and sampling methods, the selection or elimination of candidate features is performed as part of the related construction process itself. In one of the VQA cases, the features are strung together using an unspecified rule-based method [62]. In the other, the explanation is generated by an LSTM conditioned on combined embeddings of attended regions in the image and the question [63]. While such so-called generation approaches are observed here in VQA examples only, there is no reason that they could not be applied to other problem types." sec. VI.C, p. 3466).

Regarding Claim 25,
The Townsend/Doshi/Gama combination teaches the method of claim 19. Townsend further teaches detecting bias in the input and/or the global interpretable model or individually detecting bias in one or more partitions ("Another issue is that the bias of a neuron can yield discontinuities in the gradients. For example, a negative bias in a rectiﬁed linear unit will yield an output of 0 for all inputs with magnitude less than that bias. DeepLIFT’s solution is to backpropagate the difference in activation with respect to some reference state. Each neuron is assigned a reference value equal to the activation it yields when forward propagating some default or class-neutral input particular to the problem domain. This reference input could be a tensor of zeros, a common background pattern (e.g., grass if classifying images of farm animals), or some other class-neutral pattern." sec. V.B, p. 3464).

Regarding Claim 26,
The Townsend/Doshi/Gama combination teaches the method of claim 19. Townsend further teaches extracting high-level concepts from the input and linking the concepts to a causal model (Fig. 5, Symbolic System; "Discrete forms include, but are not limited to, states in ﬁnite state machines (FSMs), antecedents of rules in logic programs, words in natural language models, or masks produced by thresholding (usually applied to images). In some cases, the choice of the discrete form may depend on the problem domain." sec. III.B; Fig. 6, Neural implementation of logic gates),
wherein the causal model is incorporated into the global interpretable model (Fig. 8, CILP network for representing a simple logic program, p. 3462), and wherein the output further comprises causal explanations in a what-if, what-if-not, and but-for forms ("Expressive Power: Expressive power refers to the breadth of information that a model is able to represent and is highly dependent on the choice of model used in the rule construction stage of extraction (Section III-B2). For example, a model that can express IF THEN ELSE clauses is more expressive than one which can only express IF THEN clauses. Fuzzy logic is more expressive than Boolean logic because it is able to express degrees of truth." sec. III.C, p. 3461; this teaches the limitations of the causal explanations).

Claim 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Wu et al. (Edge Detection Based on Spiking Neural Network Model, hereinafter "Wu").

Regarding Claim 12,
The Townsend/Doshi/Gama combination teaches the system of claim 1. The Townsend/Doshi/Gama combination does not explicitly teach wherein one or more of the feature generation network, conditional network, and/or feature attribution layer are implemented as a spiking network comprising a plurality of spiking neurons.
	Wu teaches wherein one or more of the feature generation network, conditional network, and/or feature attribution layer are implemented as a spiking network comprising a plurality of spiking neurons (Fig. 1 Spiking Neural Network Model for Edge Detecting, p. 28; "The intermediate layer is composed of four types of neurons corresponding to four different receptive fields respectively." sec. 2, p. 27; "The spiking neuron models provide powerful functionality for integration of inputs and generation of spikes. Synapses are able to perform different computations, filters, adaptation and dynamic properties [17]" pp. 32-33; this teaches the feature generation network is implemented as a spiking network comprising spiking neurons).
Townsend and Wu are analogous art because both are directed towards model explanation using neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the spiking neural network of Wu.  The modification would have been obvious because one of ordinary skill in the art would be motivated to explain and simulate behaviors of the visual system of the brain, as suggested by Wu ("The visual cortex has a highly ordered structure [1-2], and it has attracted considerable attention from theoretical neurobiologists and computer scientists." sec. 1, p. 26; "These neural network models can be applied to explain some of the behaviours of the visual system in the human brain. The spike synchronization network in [5-6] can be applied to explain why the visual system can perform high-level visual processing tasks in a limited time of 100-150 ms." sec. 1, p. 27).

Claims 13, 22, 27, and 29 is/are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Gunning et al. (DARPA's Explainable Artificial Intelligence Program, hereinafter "Gunning").

Regarding Claim 13,
The Townsend/Doshi/Gama combination teaches the system of claim 1. The Townsend/Doshi/Gama combination does not explicitly teach an identify-assess-recommend-resolve framework configured to identify bias, wherein the identify-assess-recommend-resolve framework comprises a goal-plan-action system.
Gunning teaches an identify-assess-recommend-resolve framework (Figure 8, Initial Model of the Explanation Process and Explanation Effectiveness Measurement Categories, p. 53) configured to identify bias ("For fast and increased learning accuracy, it uses discriminative techniques, deriving algorithms that compose NNs and support vector machines with TPLMs, using interpretability as a bias to learn more interpretable models. These approaches are then extended to handle real-world situations." p. 56), wherein the identify-assess-recommend-resolve framework comprises a goal-plan-action system (Figure 5. Evaluating Explanation Effectiveness, p. 50; the task teaches a goal, the recommendation of the XAI System teaches the plan, and the Decision or Action teaches the action).
Townsend and Gunning are analogous art because both are directed towards explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the explainable artificial intelligence of Gunning.  The modification would have been obvious because one of ordinary skill in the art would be motivated to develop more intelligent, autonomous and symbiotic systems so that users can understand, appropriately trust, and effectively manage artificial intelligent partners, as suggested by Gunning ("Explainable AI will be essential if users are to understand, appropriately trust, and effectively manage these artiﬁcially intelligent partners." p. 44).

Regarding Claim 22,
The Townsend/Doshi/Gama combination teaches the method of claim 19. The Townsend/Doshi/Gama combination does not explicitly teach inputting the output into an explainable auto encoder or explainable auto decoder, and generating explanations using a generative adversarial network.
	Gunning teaches inputting the output into an explainable auto encoder or explainable auto decoder ("DARE/X-GANS uses generative adversarial networks (GANs), which learn to understand data by creating it, while learning representations with explanatory power. GANs are made explainable by using interpretable decoders that map unsupervised clusters onto parts-based representations." p. 54), and generating explanations using a generative adversarial network ("The deep attention-based representations for explanation/explainable generative adversarial networks (DARE/X-GANS) system employs DNN architectures inspired by attentional models in visual neuroscience. It identifies, retrieves, and presents evidence to a user as part of an explanation. The attentional mechanisms provide a user with a means for system probing and collaboration." p. 53).
Townsend and Gunning are analogous art because both are directed towards explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the explainable artificial intelligence of Gunning.  The modification would have been obvious because one of ordinary skill in the art would be motivated to develop more intelligent, autonomous and symbiotic systems so that users can understand, appropriately trust, and effectively manage artificial intelligent partners, as suggested by Gunning ("Explainable AI will be essential if users are to understand, appropriately trust, and effectively manage these artiﬁcially intelligent partners." p. 44).

Regarding Claim 27,
The Townsend/Doshi/Gama combination teaches the method of claim 19. The Townsend/Doshi/Gama combination does not explicitly teach abstracting the explanation based on an abstraction transformation function.
Gunning teaches abstracting the explanation based on an abstraction transformation function ("COGLE’s multilayer architecture partitions its information processing into sensemaking, cognitive modeling, and learning. The learning layer employs capacity constrained recurrent and hierarchical DNNs to produce abstractions and compositions over the states and actions of unmanned aerial systems to support an understanding of generalized patterns. It combines learned abstractions to create hierarchical, transparent policies that match those learned by the system. The cognitive layer bridges human-usable symbolic representations to the abstractions, compositions, and generalized patterns." p. 52).
Townsend and Gunning are analogous art because both are directed towards explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the explainable artificial intelligence of Gunning.  The modification would have been obvious because one of ordinary skill in the art would be motivated to develop more intelligent, autonomous and symbiotic systems so that users can understand, appropriately trust, and effectively manage artificial intelligent partners, as suggested by Gunning ("Explainable AI will be essential if users are to understand, appropriately trust, and effectively manage these artiﬁcially intelligent partners." p. 44).

Regarding Claim 29,
The Townsend/Doshi/Gama combination teaches the method of claim 19. The Townsend/Doshi/Gama combination does not explicitly teach forming a heatmap, feature attribution graph, or textual explanation based on the attribution values identified by the feature attribution layer.
	Gunning teaches forming a heatmap ("Interactive visualization over multiple news, using heat maps and topic modeling clusters to show predictive features" Table 2, p. 56), feature attribution graph ("The performer outputs interpretable representations in a spatial, temporal, and causal parse graph (STC-PG) for three-dimensional scene perception (for analytics) and task planning (for autonomy). STC-PGs are compositional, probabilistic, attributed, interpretable, and grounded on DNN features from images and videos." p. 50), or textual explanation based on the attribution values identified by the feature attribution layer (Figure 5, Explanation, The system provides an explanation to the user that justifies its recommendation, decision, or action, p. 50; Figure 3, "This is a cat. It has fur, whiskers, claws. It has this feature", p. 48).
Townsend and Gunning are analogous art because both are directed towards explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the explainable artificial intelligence of Gunning.  The modification would have been obvious because one of ordinary skill in the art would be motivated to develop more intelligent, autonomous and symbiotic systems so that users can understand, appropriately trust, and effectively manage artificial intelligent partners, as suggested by Gunning ("Explainable AI will be essential if users are to understand, appropriately trust, and effectively manage these artiﬁcially intelligent partners." p. 44).

Claim 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Chabanne et al. (Privacy-Preserving Classification on Deep Neural Network, hereinafter "Chabanne").

Regarding Claim 15,
The Townsend/Doshi/Gama combination teaches the system of claim 1. The Townsend/Doshi/Gama combination does not explicitly teach at least one data privacy subsystem, the data privacy configured to perform at least one of:
a differential privacy solution comprising introducing, to the input, prior to the input being supplied to the feature generation network for identification of the plurality of features from the input, data noise based on a noise level;
a secure multi-party computation solution comprising performing secure multi-party computation of at least one function of the system;
a federated learning solution comprising retaining, in a plurality of distributed locations, only a portion of data samples provided in the input, said plurality of distributed locations comprising at least a first distributed location having a first data sample and not a second data sample and a second distributed location having the second data sample and not the first data sample; and
 an encryption solution comprising introducing, to the input, prior to the input being supplied to the feature generation network for identification of the plurality of features from the input, homomorphic encryption, and wherein the feature generation network is configured to identify the plurality of features from a homomorphically-encrypted input.

Chabanne teaches at least one data privacy subsystem, the data privacy configured to perform at least one of:
a differential privacy solution comprising introducing, to the input, prior to the input being supplied to the feature generation network for identification of the plurality of features from the input, data noise based on a noise level ("Like many FHE schemes, the BGV encryption scheme consists in hiding the plain-text message with noise in order to create the ciphertext message. The decryption consists in removing the noise from the ciphertext message. The noise level increases with each homomorphic operation." p. 7);
a secure multi-party computation solution comprising performing secure multi-party computation of at least one function of the system ("All existing methods are based on secure multi-party computation (SMC) or homomorphic encryption (HE) or a combination of those methods." p. 2);
a federated learning solution comprising retaining, in a plurality of distributed locations, only a portion of data samples provided in the input, said plurality of distributed locations comprising at least a first distributed location having a first data sample and not a second data sample and a second distributed location having the second data sample and not the first data sample ("More precisely, to evaluate the scalar product between x and y both in RN and owned respectively by the client and the server, the client encrypts each component of the vector x with Paillier’s homomorphic encryption [37] and sends them to the server, then the server evaluates homomorphically the encryption of the scalar product and sends back the encrypted result to the client who finally decrypts it in the plaintext domain." p. 2); and
 an encryption solution comprising introducing, to the input, prior to the input being supplied to the feature generation network for identification of the plurality of features from the input, homomorphic encryption, and wherein the feature generation network is configured to identify the plurality of features from a homomorphically-encrypted input ("a privacy preserving neural network classification algorithm based on secure multi-party computation and homomorphic encryption. Their neural networks are composed of a succession of scalar products which are secured on the basis of homomorphic encryption and activation functions (threshold or sigmoid) secured with protocols based on secure multi-party computation." p. 2).
Townsend and Chabanne are analogous art because both are directed towards explainable artificial intelligence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the encryption of Chabanne.  The modification would have been obvious because one of ordinary skill in the art would be motivated to have good accuracy while protecting against privacy risks, as suggested by Chabanne (Abstract, p. 1).

Claim 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Cannici et al. (Asynchronous Convolutional Networks for Object Detection in Neuromorphic Cameras, hereinafter "Cannici").

Regarding Claim 18,
The Townsend/Doshi/Gama combination teaches the system of claim 17. The Townsend/Doshi/Gama combination does not explicitly teach wherein the neuro-symbolic constraints are linked with past data comprising at least one of: a previous historic rate of activations and a set of dynamically-changing Fast Weights.
	Cannici teaches wherein the neuro-symbolic constraints are linked with past data comprising at least one of: a previous historic rate of activations and a set of dynamically-changing Fast Weights ("The basic component of the proposed architectures is a procedure able to accumulate events. Sparse events generated by the neuromorphic camera are integrated into a leaky surface, a structure that takes inspiration from the functioning of Spiking Neural Networks (SNNs) to maintain memory of past events... Notice that the effects of λ and ∆incr are related: ∆incr determines how much information is contained in each single event whereas λ defines the decay rate of activations." sec. 2, p. 2).
Townsend and Cannici are analogous art because both are directed towards improving neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the spiking neural network of Cannici.  The modification would have been obvious because one of ordinary skill in the art would be motivated to perceive changes at high frequency with low power consumption, as suggested by Cannici (Abstract, p. 1).

Claims 23 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Kraus et al. (Decision Support From Financial Disclosures With Deep Neural Networks And Transfer Learning, hereinafter "Kraus").

Regarding Claim 23,
The Townsend/Doshi/Gama combination teaches the method of claim 19. The Townsend/Doshi/Gama combination does not explicitly teach training global interpretable model using one or more of transfer learning, Genetic Algorithms, Monte Carlo Simulation Techniques, and Bayesian networks.
Kraus teaches training global interpretable model using one or more of transfer learning, Genetic Algorithms, Monte Carlo Simulation Techniques, and Bayesian networks (Fig. 2.  Research framework evaluating the performance gains from deep learning architectures and transfer learning; this teaches transfer learning; it is noted that the claims only require one of the list).
Townsend and Kraus are analogous art because both are directed towards explaining deep learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the recurrent neural network of Kraus.  The modification would have been obvious because one of ordinary skill in the art would be motivated to use a model that has high accuracy, as suggested by Kraus (Abstract, p. 38).

Regarding Claim 30,
The Townsend/Doshi/Gama combination teaches the system of claim 1. The Townsend/Doshi/Gama combination does not explicitly teach at least one of: an explainable transformer-transducer, a long short-term memory unit, and a gated recurrent unit, configured to provide a recursive output and a recursive explanation.
Kraus teaches at least one of: an explainable transformer-transducer, a long short-term memory unit, and a gated recurrent unit ("long short-term memory (LSTM) model" p. 39; this teaches the long short-term memory unit; it is noted that the claims only require one type), configured to provide a recursive output and a recursive explanation ("The recursive autoencoder in [12] splits the announcements into three classes (up, down or steady) according to the abnormal return and then discards the steady samples a priori." sec. 5.1, p. 46; "we contribute to explanatory insights as follows: we draw upon the finance-specific dictionary from Loughran-McDonald that comprises terms labeled as either positive or negative, where the underlying categorization stems from subjective human ratings. We then treat each word as a single document and insert them as input into our deep neural network. The resulting predictions allow us to infer whether a word links to a positive or negative market reaction. In other words, the prediction scores the polarity of the words and specifies how markets perceive them." sec. 4.5, p. 45).
Townsend and Kraus are analogous art because both are directed towards explaining deep learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the recurrent neural network of Kraus.  The modification would have been obvious because one of ordinary skill in the art would be motivated to use a model that has high accuracy, as suggested by Kraus (Abstract, p. 38).

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Townsend et al. (Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective, hereinafter "Townsend") in view of Doshi et al. (US 2019/0370647, hereinafter "Doshi"), Gama et al. (Hierarchical Overlapping Clustering of Network Data Using Cut Metrics, hereinafter "Gama"), and Voogd et al. (Using Relational Concept Networks for Explainable Decision Support, hereinafter "Voogd").

Regarding Claim 28,
The Townsend/Doshi/Gama combination teaches the method of claim 19. The Townsend/Doshi/Gama combination does not explicitly teach injecting human knowledge into the rules in a universal representation format wherein the human knowledge is fixed and cannot be updated.
Voogd teaches injecting human knowledge into the rules in a universal representation format wherein the human knowledge is fixed and cannot be updated ("To obtain situational understanding, information from different sources needs to be integrated. For example, information from sensors (such as cameras, thermal sensors, but also social media and human observations) and expert knowledge e.g. obtained through experience, needs to be combined to build an understanding of the situation." p. 79; "The envisioned support tool integrates multiple sources of information, some of which consist of data streams, some are symbolic in nature such as the knowledge of human experts. We combine sub-symbolic information and approaches (sensor data, machine learning) with symbolic knowledge and technologies (expert knowledge, semantic networks) to work towards a decision support system that produces better results with fewer data required for learning compared to traditional neural network approaches, and supports explainability." p. 81).
Townsend and Voogd are analogous art because both are directed towards explainable machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the explainable neural network of the Townsend/Doshi/Gama combination with the explainable decision support of Voogd.  The modification would have been obvious because one of ordinary skill in the art would be motivated to assist users in making the right decisions using multiple sources of information, as suggested by Voogd (Abstract, p. 78).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477. The examiner can normally be reached M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHARLES C KUO/Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126