DETAILED ACTION
1.	This office action is in response to Application No. 16/052,936 filed on 08/02/2018. Claims 5 and 14 have been cancelled, claims 19-20 are withdrawn, and claims 1-4, 6-13, 15-18, and 21-24 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.

Allowable Subject Matter
2.	Claims 2, 3, 11, 12 and 24 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if amended to overcome the rejection under 35 U.S.C. 112(b).

Response to Arguments
3.	Applicant’s arguments are moot in view of the new grounds of rejection. The examiner withdraws the rejections set forth in the previous office action dated 10/18/2021 because applicant’s amendments necessitated the new grounds of rejection presented in this office action. Accordingly, this action is made final.

Election/Restrictions
4.	Applicant’s election without traverse of claims 1-18 in the reply filed on 2/28/2022 is acknowledged.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1, 4-7, 10 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. ("Deep neural networks on graph signals for brain imaging analysis." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017) in view of Price (“Fusion of evolution constructed features for computer vision.” A Dissertation Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering in the Department of Electrical and Computer Engineering, May 2018) and further in view of Du et al. ("Topology adaptive graph convolutional networks." arXiv preprint arXiv:1710.10370 (2017)).

	Regarding claim 1, Guo teaches an apparatus for generating and training (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation [TensorFlow is configured to run on either CPUs or GPUs]; the entire network is trained end-to-end in an unsupervised way to learn the low-dimensional representations for the input brain imaging data, pg. 3296, left col, first para.)
	a digital signal processor (DSP) to evaluate graph data, (Our proposed networks use ConvNets on graph to compute rich features for the input graph signals, pg. 3296, right col, 2.2. Model Structure). 
	the apparatus comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus (we use TensorFlow to implement our networks, pg. 3297, left col, first para. and 3.2. Implementation [TensorFlow code is configured to run on either CPUs or GPUs]) to:
	receive, by a processor, (the k-th network layer takes as input a graph signal, pg. 3297, second to the last para.)
	 known graph data that includes irregular grid graph data; (MEG signal datasets, pg. 3297, 3. Experiment)
	split, by the processor, (into training data and cross-validation data, pg. 3298, left col, 3.2. Implementation)
	 the known graph data into a set of training graph data and a set of cross-validation graph data; (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	construct, by the processor, a constructed set of filters using the training graph data (The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network. (pg. 3296, 2.2.1. ConvNets on graph) Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	 wherein (i) each constructed filter in the set of constructed filters is generated based at least in part on a corresponding initial Laplacian operator of one or more initial Laplacian operators and a corresponding filter type of a set of filter types, (Graph Laplacian, or combinatorial Laplacian, is defined as $L = D - W$, where $D$ is the diagonal degree matrix with diagonal element $D_{ii} = \sum_{j=1}^{N} W_{ij}$. Since $L$ is a symmetric matrix, it can be eigen-decomposed as $L = U \Lambda U^T$ and has a complete set of orthonormal eigenvectors, denoted as $u_l$, for $l = 0, 1, \ldots, N-1$, and sorted real associated eigenvalues $\lambda_l$, known as the frequencies. In other words, we have $L u_l = \lambda_l u_l$ for $l = 0, 1, \ldots, N-1$ and $0 \le \lambda_0 < \lambda_1 < \cdots < \lambda_{N-1}$. Normalized graph Laplacian, defined as $\tilde{L} = I - D^{-1/2} W D^{-1/2}$, is also widely used due to the property that all the eigenvalues of it lie in the interval $[0, 2]$. $\{u_l\}$ acts like the Fourier basis in analogy to the eigen-functions of Laplace operator in classical signal processing, pg. 3296, right col, first para.) and
	(ii) the set of filter types comprise a K-order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	 a first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ ($A$ is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval $[0, 2]$, which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.) and
	formulate, by the processor, (a formulation of ConvNets on graph, pg. 3295, right col second to the last para.)
	 an objective function for training; (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function)
	generate, by the processor, an optimized DSP using the objective function, (Adam (as optimization solver for a neural network algorithm) is adopted to minimize the MSE (mean square error) with learning rate 0.001, pg. 3297, 3.2. Implementation)
	 the constructed set of filters, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	the training graph data and the cross-validation graph data, (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	 wherein the optimized DSP includes a set of hidden layers, (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	each hidden layer (kth and (k+1)th hidden layers respectively, Fig. 3) comprises 
	(i) is selected from the constructed set of filters, (The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network. (pg. 3296, 2.2.1. ConvNets on graph) Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.) and 
	(ii) comprises at least one filter associated with the K- order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	at least one filter associated with the first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ ($A$ is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval $[0, 2]$, which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.) and
	save, in a memory, a set of parameters defining the optimized DSP (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation. Examiner notes: TensorFlow is an open-source software library for machine learning operations and is configured to run on either CPUs or GPUs for processing and storing machine learning parameters).
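For illustration only, the filter constructions quoted from Guo above (the combinatorial and normalized Laplacians, the K-order Chebyshev filter, and the first-order renormalized operator) can be sketched in plain Python. This is not Guo's TensorFlow implementation: the function names, the dense list-of-lists matrices, the fixed rescaling bound lmax = 2 (the upper end of the normalized Laplacian's eigenvalue interval noted in the quoted passage), and the coefficients theta are assumptions chosen for the example. The random walk Laplacian $I - D^{-1} W$ recited in claims 4 and 13 is included for comparison.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def degrees(w: Matrix) -> Vector:
    """Diagonal of the degree matrix D, with D_ii = sum_j W_ij."""
    return [sum(row) for row in w]


def combinatorial_laplacian(w: Matrix) -> Matrix:
    """L = D - W."""
    d, n = degrees(w), len(w)
    return [[(d[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def normalized_laplacian(w: Matrix) -> Matrix:
    """I - D^{-1/2} W D^{-1/2}; assumes every vertex has nonzero degree."""
    d, n = degrees(w), len(w)
    return [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


def random_walk_laplacian(w: Matrix) -> Matrix:
    """I - D^{-1} W; assumes every vertex has nonzero degree."""
    d, n = degrees(w), len(w)
    return [[(1.0 if i == j else 0.0) - w[i][j] / d[i] for j in range(n)] for i in range(n)]


def chebyshev_filter(lap: Matrix, x: Vector, theta: Vector, lmax: float = 2.0) -> Vector:
    """K-order Chebyshev filter y = sum_k theta[k] T_k(L_s) x, with K = len(theta) - 1.

    L_s = 2 L / lmax - I rescales the spectrum into [-1, 1]; T_k follows the
    recursion T_k = 2 L_s T_{k-1} - T_{k-2}, so only matrix-vector products are needed.
    """
    n = len(lap)
    ls = [[2.0 * lap[i][j] / lmax - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    t_prev, t_cur = x[:], matvec(ls, x)  # T_0(L_s) x and T_1(L_s) x
    y = [theta[0] * v for v in t_prev]
    if len(theta) > 1:
        y = [yi + theta[1] * v for yi, v in zip(y, t_cur)]
    for k in range(2, len(theta)):
        t_next = [2.0 * a - b for a, b in zip(matvec(ls, t_cur), t_prev)]
        y = [yi + theta[k] * v for yi, v in zip(y, t_next)]
        t_prev, t_cur = t_cur, t_next
    return y


def renormalized_propagation(a: Matrix) -> Matrix:
    """First-order renormalized operator D_hat^{-1/2} (A + I_N) D_hat^{-1/2}."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_hat = degrees(a_hat)
    return [[a_hat[i][j] / math.sqrt(d_hat[i] * d_hat[j]) for j in range(n)] for i in range(n)]


# Example: a hypothetical 5-vertex path graph and an impulse at vertex 0.
path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
y = chebyshev_filter(normalized_laplacian(path), impulse, theta=[0.5, 0.3, 0.2])  # K = 2
```

With K = 2 (three coefficients), y is zero at vertices 3 and 4, which are more than two hops from the impulse, consistent with the quoted statement that a K-order Chebyshev filter is K-hop localized.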
	Guo does not explicitly teach a set of heterogeneous kernels (HKs), wherein each HK is associated with a corresponding set of filters; a K-order topology adaptive filter type; at least one filter associated with the K-order topology adaptive filter type; and wherein each HK is generated based at least in part on a weighted combination of the corresponding set of filters associated with the HK.
           Price teaches a set of heterogeneous kernels (HKs) (The final experiment reported herein is with respect to heterogeneous kernels. For this experiment, each iECO descriptor’s top 5 individuals and the pre-screener score are concatenated to form a single feature vector, of which kernels were applied to and weights derived. Specifically, we used a single RBF, two polynomials (of degree two and three), and the dot product kernels, of which no normalization was implemented. Results are presented in Table 5.4. All methods performance dropped when using heterogeneous kernels, however our kernel matrix-based method degraded the most gracefully, with performance dropping only approximately 2-4% for all folds whereas MKLGL, ... As with the optimization function case, in theory, we should be able to find weights that would address the heterogeneous kernels through the weights “feature shrinkage” and “importance weight” for each kernel. We should be able to get the same results that we obtained in our previous experiment, as our heterogeneous case involved a RBF and the system could have defaulted back to simply learning that solution. pg. 78-79)
	each HK is associated with a corresponding set of filters (This pre-screener is an ensemble of trainable size-contrast filters (local dual sliding window detectors). Each size-contrast filter has seven parameters; the inner window height and width, the pad height and width (which determine the size of the outer window), a Bhattacharyya distance threshold, a squared difference between the mean values threshold, and three state parameters, referred to as DType (which determines whether the detector will trigger only on bright on dark regions, dark on bright regions, or both), pg. 38; Specifically, iECO is comprised of three steps. Step one is the learning of a composition of filters in the context of different feature descriptors. Step two is the fusion of these learned descriptors. Step three is classification. In comparison, a CNN is typically two parts, one part filter and feature learning and one part classification (e.g., MLP), pg. 82-83, last para.) that:
	each HK is generated based at least in part on weighted combination of the corresponding set of filters associated with the HK; (we assume that the kernel K is composed of a weighted combination of pre-computed base kernel matrices by

$$K = \sum_{k=1}^{m} \sigma_k K_k,$$

where there are m kernels and $\sigma_k$ is the weight applied to the kth kernel. The above operation is valid because we can add and multiply (by scalars greater than 0) Gram matrices and are guaranteed to produce a Gram matrix. Specifically, the above falls into the category of linear convex sum (LCS) as $0 \le \sigma_k \le 1$ and $\sum_{k=1}^{m} \sigma_k = 1$, pg. 104)
	          It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guo to incorporate the teachings of Price for the benefit of learning a composition of filters in the context of different feature descriptors (Price, pg. 82).
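The weighted kernel combination quoted from Price (pg. 104), applied to the heterogeneous base kernels described at pg. 78-79 (a single RBF, polynomials of degree two and three, and a dot-product kernel), can be sketched as follows. This is an illustrative sketch only: the kernel parameters gamma and c, the sample vectors, and the fixed weights sigma are hypothetical, and in Price the weights are learned rather than fixed, a step this sketch does not reproduce.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product (linear) kernel."""
    return sum(a * b for a, b in zip(u, v))


def rbf(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def poly(u: Vector, v: Vector, degree: int, c: float = 1.0) -> float:
    """Polynomial kernel (u.v + c)^degree; the offset c is an assumed value."""
    return (dot(u, v) + c) ** degree


def gram(kernel, xs: list[Vector]) -> Matrix:
    """Pre-computed base kernel (Gram) matrix K_k[i][j] = kernel(x_i, x_j)."""
    return [[kernel(u, v) for v in xs] for u in xs]


def combine_kernels(grams: list[Matrix], sigma: Vector) -> Matrix:
    """Linear convex sum K = sum_k sigma_k K_k with 0 <= sigma_k <= 1 and sum sigma_k = 1."""
    if len(grams) != len(sigma):
        raise ValueError("need one weight per base kernel")
    if any(s < 0.0 or s > 1.0 for s in sigma) or abs(sum(sigma) - 1.0) > 1e-9:
        raise ValueError("sigma must be convex weights")
    n = len(grams[0])
    return [[sum(s * g[i][j] for s, g in zip(sigma, grams)) for j in range(n)] for i in range(n)]


xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # hypothetical feature vectors
base = [
    gram(rbf, xs),                          # single RBF kernel
    gram(lambda u, v: poly(u, v, 2), xs),   # polynomial, degree two
    gram(lambda u, v: poly(u, v, 3), xs),   # polynomial, degree three
    gram(dot, xs),                          # dot-product kernel
]
K = combine_kernels(base, [0.4, 0.2, 0.2, 0.2])  # fixed, hypothetical weights
```

Because each base matrix is a Gram matrix and the weights are nonnegative, the combined K is again a Gram matrix, which is the validity condition stated in the quoted passage.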
	          Du teaches a K-order topology adaptive filter type (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p^k_{j,i})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand side) to vertex 2 (figure on the right-hand side). The filter is first placed at vertex 1. Since paths (1, 2, 1) (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y^{(\ell)}_f(i)$ is the weighted sum of the input features of vertices in $x^{(\ell)}_c$ that are within k-paths away to vertex i, for $k = 0, 1, \ldots, K$, with weights given by the products of components of $A^k$ and $g^{(\ell)}_{c,f,k}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g^{(\ell)}_{c,f,k}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	   at least one filter associated with the K-order topology adaptive filter type, (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p^k_{j,i})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand side) to vertex 2 (figure on the right-hand side). The filter is first placed at vertex 1. Since paths (1, 2, 1) (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y^{(\ell)}_f(i)$ is the weighted sum of the input features of vertices in $x^{(\ell)}_c$ that are within k-paths away to vertex i, for $k = 0, 1, \ldots, K$, with weights given by the products of components of $A^k$ and $g^{(\ell)}_{c,f,k}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g^{(\ell)}_{c,f,k}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	          It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Guo to incorporate the teachings of Du for the benefit of a topology adaptive graph convolutional network (TAGCN), which uses learnable filters whose topologies adapt to the topology of the graph as they scan the graph to perform convolution, and which exhibits better performance and is computationally simpler than other recent methods (Du, abstract).
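A single-channel sketch of the K-order topology adaptive filter described in Du may help show why its output at a vertex depends only on vertices within K paths: the filter sums $g_k A^k x$ for $k = 0, \ldots, K$. As assumptions for illustration, the adjacency matrix is used as given with no normalization, there is one input channel and one output feature (Du's layer also sums over input channels c and produces multiple output features f), and the coefficients g are hypothetical.

```python
Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def tagcn_filter(adj: Matrix, x: Vector, g: Vector) -> Vector:
    """Single-channel sketch of y = sum_{k=0}^{K} g[k] * A^k x, with K = len(g) - 1."""
    y = [g[0] * v for v in x]  # k = 0 term: the vertex's own feature
    power_x = x[:]
    for gk in g[1:]:
        power_x = matvec(adj, power_x)  # A^k x by repeated multiplication
        y = [yi + gk * v for yi, v in zip(y, power_x)]
    return y


# Hypothetical 5-vertex path graph with an impulse at vertex 0 and K = 2.
path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
y = tagcn_filter(path, impulse, g=[1.0, 0.5, 0.25])
# y == [1.25, 0.5, 0.25, 0.0, 0.0]
```

Vertices 3 and 4 are more than two paths from vertex 0, so a filter with K = 2 leaves them at zero, mirroring the "K-paths localized" statement in the quoted passage.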

	Regarding claim 4, Modified Guo teaches the apparatus of claim 1, and Guo teaches wherein each of the one or more initial Laplacian operators is a normalized Laplacian operator or a random walk Laplacian operator (since we use normalized Laplacian, pg. 3296, right col, last para.).

	Regarding claim 6, Modified Guo teaches the apparatus of claim 1, and Guo teaches wherein the optimized DSP further comprises a discriminant layer (our approach can extract more discriminative representations (from discriminant layer), abstract).

	Regarding claim 7, Modified Guo teaches the apparatus of claim 1, and Guo teaches wherein the objective function is a loss function or a reward function (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function).
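The training step quoted from Guo (minimizing a mean square error with Adam at learning rate 0.001 for 300 epochs, with a 10-fold split) can be illustrated on a toy problem. This is a sketch under stated assumptions, not Guo's graph network: it fits a single scalar weight, and the data, the contiguous unshuffled folds, holding out only the first fold, and the Adam constants beta1 = 0.9, beta2 = 0.999, eps = 1e-8 are assumptions rather than values taken from Guo.

```python
import math


def mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Mean square error of the toy model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Derivative of the MSE with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adam_fit(xs, ys, lr=0.001, epochs=300, b1=0.9, b2=0.999, eps=1e-8) -> float:
    """Full-batch Adam on the scalar weight w, starting from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, epochs + 1):
        g = mse_grad(w, xs, ys)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias corrections
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def k_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Split range(n) into k contiguous, disjoint folds (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


xs = [i / 10 for i in range(20)]
ys = [3.0 * x for x in xs]  # hypothetical data with true slope 3
folds = k_fold_indices(len(xs), k=10)
val = folds[0]  # one fold held out for validation
train = [i for i in range(len(xs)) if i not in val]
w = adam_fit([xs[i] for i in train], [ys[i] for i in train])
```

Adam's per-step update is roughly bounded by the learning rate, so 300 steps at 0.001 move w only part of the way toward the true slope of 3; the sketch is meant to show the update rule and the split, not convergence.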

	Regarding claim 10, Guo teaches a method for generating and training (in the proposed method, brain imaging data is modelled as signals residing on connectivity graphs estimated with causality analysis, pg. 3295, right col, last para.; the entire network is trained end-to-end in an unsupervised way to learn the low-dimensional representations for the input brain imaging data, pg. 3296, left col, first para.)
	a digital signal processor (DSP) to evaluate graph data (Our proposed networks use ConvNets on graph to compute rich features for the input graph signals, pg. 3296, right col, 2.2. Model Structure). 
	the method comprising: receiving, by a processor, (the k-th network layer takes as input a graph signal, pg. 3297, second to the last para.)
	known graph data that includes irregular grid graph data; (MEG signal datasets, pg. 3297, 3. Experiment)
	splitting, by the processor, (into training data and cross-validation data, pg. 3298, left col, 3.2. Implementation)
	the known graph data into a set of training graph data and a set of cross-validation graph data (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	constructing, by the processor, a constructed set of filters using the training graph data; (The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network. (pg. 3296, 2.2.1. ConvNets on graph) Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	wherein (i) each constructed filter in the set of constructed filters is generated based at least in part on a corresponding initial Laplacian operator of one or more initial Laplacian operators and a corresponding filter type of a set of filter types, (Graph Laplacian, or combinatorial Laplacian, is defined as $L = D - W$, where $D$ is the diagonal degree matrix with diagonal element $D_{ii} = \sum_{j=1}^{N} W_{ij}$. Since $L$ is a symmetric matrix, it can be eigen-decomposed as $L = U \Lambda U^T$ and has a complete set of orthonormal eigenvectors, denoted as $u_l$, for $l = 0, 1, \ldots, N-1$, and sorted real associated eigenvalues $\lambda_l$, known as the frequencies. In other words, we have $L u_l = \lambda_l u_l$ for $l = 0, 1, \ldots, N-1$ and $0 \le \lambda_0 < \lambda_1 < \cdots < \lambda_{N-1}$. Normalized graph Laplacian, defined as $\tilde{L} = I - D^{-1/2} W D^{-1/2}$, is also widely used due to the property that all the eigenvalues of it lie in the interval $[0, 2]$. $\{u_l\}$ acts like the Fourier basis in analogy to the eigen-functions of Laplace operator in classical signal processing, pg. 3296, right col, first para.) and
	(ii) the set of filter types comprise a K-order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	a first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ ($A$ is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval $[0, 2]$, which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.) and
	formulating, by the processor, (a formulation of ConvNets on graph, pg. 3295, right col second to the last para.)
	an objective function for training; (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function)
	generating, by the processor, an optimized DSP using the objective function, (Adam (as optimization solver for a neural network algorithm) is adopted to minimize the MSE (mean square error) with learning rate 0.001, pg. 3297, 3.2. Implementation)
	the constructed set of filters, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	the training graph data and the cross-validation graph data, (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	wherein the optimized DSP includes a set of hidden layers, (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	each hidden layer of the set of hidden layers (kth and (k+1)th hidden layers respectively, Fig. 3; fully connected layers, pg. 3297, right col, first para.) comprises
	(ii) comprises at least one filter associated with the K-order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	at least one filter associated with the first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ ($A$ is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval $[0, 2]$, which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.)
	saving, in a memory, a set of parameters defining the optimized DSP (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation. Examiner notes: TensorFlow is an open-source software library for machine learning operations and is configured to run on either CPUs or GPUs for processing and storing machine learning parameters).
	Guo does not explicitly teach a set of heterogeneous kernels (HKs), wherein each HK is associated with a corresponding set of filters that (i) is selected from the constructed set of filters, and each HK is generated based at least in part on a weighted combination of the corresponding set of filters associated with the HK; at least one filter associated with the K-order topology adaptive filter type; and saving, in a memory, a set of parameters defining the optimized DSP.
        Price teaches a set of heterogeneous kernels (HKs) (The final experiment reported herein is with respect to heterogeneous kernels. For this experiment, each iECO descriptor’s top 5 individuals and the pre-screener score are concatenated to form a single feature vector, of which kernels were applied to and weights derived. Specifically, we used a single RBF, two polynomials (of degree two and three), and the dot product kernels, of which no normalization was implemented. Results are presented in Table 5.4. All methods performance dropped when using heterogeneous kernels, however our kernel matrix-based method degraded the most gracefully, with performance dropping only approximately 2-4% for all folds whereas MKLGL, ... As with the optimization function case, in theory, we should be able to find weights that would address the heterogeneous kernels through the weights “feature shrinkage” and “importance weight” for each kernel. We should be able to get the same results that we obtained in our previous experiment, as our heterogeneous case involved a RBF and the system could have defaulted back to simply learning that solution. pg. 78-79)
	each HK is associated with a corresponding set of filters that (i) is selected from the constructed set of filters, (This pre-screener is an ensemble of trainable size-contrast filters (local dual sliding window detectors). Each size-contrast filter has seven parameters; the inner window height and width, the pad height and width (which determine the size of the outer window), a Bhattacharyya distance threshold, a squared difference between the mean values threshold, and three state parameters, referred to as DType (which determines whether the detector will trigger only on bright on dark regions, dark on bright regions, or both), pg. 38; Specifically, iECO is comprised of three steps. Step one is the learning of a composition of filters in the context of different feature descriptors. Step two is the fusion of these learned descriptors. Step three is classification. In comparison, a CNN is typically two parts, one part filter and feature learning and one part classification (e.g., MLP), pg. 82-83, last para.) and
	each HK is generated based at least in part on weighted combination of the corresponding set of filters associated with the HK; (we assume that the kernel K is composed of a weighted combination of pre-computed base kernel matrices by

$$K = \sum_{k=1}^{m} \sigma_k K_k,$$

where there are m kernels and $\sigma_k$ is the weight applied to the kth kernel. The above operation is valid because we can add and multiply (by scalars greater than 0) Gram matrices and are guaranteed to produce a Gram matrix. Specifically, the above falls into the category of linear convex sum (LCS) as $0 \le \sigma_k \le 1$ and $\sum_{k=1}^{m} \sigma_k = 1$, pg. 104)
	          It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guo to incorporate the teachings of Price for the benefit of learning a composition of filters in the context of different feature descriptors (Price, pg. 82).
	          Du teaches a K-order topology adaptive filter type (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p^k_{j,i})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand side) to vertex 2 (figure on the right-hand side). The filter is first placed at vertex 1. Since paths (1, 2, 1) (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y^{(\ell)}_f(i)$ is the weighted sum of the input features of vertices in $x^{(\ell)}_c$ that are within k-paths away to vertex i, for $k = 0, 1, \ldots, K$, with weights given by the products of components of $A^k$ and $g^{(\ell)}_{c,f,k}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g^{(\ell)}_{c,f,k}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	             at least one filter associated with the K-order topology adaptive filter type, (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p^k_{j,i})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand side) to vertex 2 (figure on the right-hand side). The filter is first placed at vertex 1. Since paths (1, 2, 1) (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y^{(\ell)}_f(i)$ is the weighted sum of the input features of vertices in $x^{(\ell)}_c$ that are within k-paths away to vertex i, for $k = 0, 1, \ldots, K$, with weights given by the products of components of $A^k$ and $g^{(\ell)}_{c,f,k}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g^{(\ell)}_{c,f,k}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	           It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Guo to incorporate the teachings of Du for the benefit of a topology adaptive graph convolutional network (TAGCN), which uses learnable filters whose topologies adapt to the topology of the graph as they scan the graph to perform convolution, and which exhibits better performance and is computationally simpler than other recent methods (Du, abstract).

	Regarding claim 13, Modified Guo teaches the method of claim 10, and Guo teaches wherein each of the one or more initial Laplacian operators is a normalized Laplacian operator or a random walk Laplacian operator (since we use normalized Laplacian, pg. 3296, right col, last para.).

	Regarding claim 15, Modified Guo teaches the method of claim 10, and Guo teaches wherein the optimized DSP further comprises a discriminant layer (our approach can extract more discriminative representations (from discriminant layer), abstract).

	Regarding claim 16, Modified Guo teaches the method of claim 10, and Guo teaches wherein the objective function is a loss function or a reward function (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function).

	Regarding claim 23, Guo teaches a computer program product for generating and training (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation [TensorFlow is configured to run on either CPUs or GPUs]; the entire network is trained end-to-end in an unsupervised way to learn the low-dimensional representations for the input brain imaging data, pg. 3296, left col, first para.)
	a digital signal processor (DSP) to evaluate graph data, (Our proposed networks use ConvNets on graph to compute rich features for the input graph signals, pg. 3296, right col, 2.2. Model Structure). 
	the computer program product comprises at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising executable portions configured to: (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation; we use TensorFlow code (configured to run on either CPUs or GPUs) to implement our networks, pg. 3297, left col, first para.)
	receive, by a processor, (the k-th network layer takes as input a graph signal, pg. 3297, second-to-last para.)
	 known graph data that includes irregular grid graph data; (MEG signal datasets, pg. 3297, 3. Experiment)
	split, by the processor, (into training data and cross-validation data, pg. 3298, left col, 3.2. Implementation)
	 the known graph data into a set of training graph data and a set of cross-validation graph data; (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	construct, by the processor, a constructed set of filters using the training graph data, (The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network, pg. 3296, 2.2.1. ConvNets on graph; Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	wherein: (i) each constructed filter in the set of constructed filters is generated based at least in part on a corresponding initial Laplacian operator of one or more initial Laplacian operators and a corresponding filter type of a set of filter types, (Graph Laplacian, or combinatorial Laplacian, is defined as $L = D - W$, where $D$ is the diagonal degree matrix with diagonal element $D_{ii} = \sum_{j=1}^{N} W_{ij}$. Since $L$ is a symmetric matrix, it can be eigen-decomposed as $L = U \Lambda U^T$ and has a complete set of orthonormal eigenvectors, denoted as $u_l$, for $l = 0, 1, \ldots, N-1$, and sorted real associated eigenvalues $\lambda_l$, known as the frequencies. In other words, we have $L u_l = \lambda_l u_l$ for $l = 0, 1, \ldots, N-1$ and $0 \le \lambda_0 < \lambda_1 < \cdots < \lambda_{N-1}$. Normalized graph Laplacian, defined as $\tilde{L} = I - D^{-1/2} L D^{-1/2}$, is also widely used due to the property that all the eigenvalues of it lie in the interval [0, 2]. $\{u_l\}$ acts like the Fourier basis in analogy to the eigen-functions of Laplace operator in classical signal processing, pg. 3296, right col, first para.) and
	(ii) the set of filter types comprise a K-order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	a first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ (A is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval [0, 2], which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.) and
	formulate, by the processor, (a formulation of ConvNets on graph, pg. 3295, right col, second-to-last para.)
	an objective function for training; (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function) 
	generate, by the processor, an optimized DSP using the objective function, (Adam (an optimization solver for neural network training) is adopted to minimize the MSE (mean square error) with learning rate 0.001, pg. 3297, 3.2. Implementation)
	the constructed set of filters, (Chebyshev polynomial generated recursively. k is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	the training graph data and the cross-validation graph data, (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	wherein: the optimized DSP includes a set of hidden layers, (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	each hidden layer (kth and (k+1)th hidden layers respectively, Fig. 3) comprises 
	(ii) at least one filter associated with the K-order Chebyshev filter type, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	at least one filter associated with the first order renormalized filter type, (We also use a renormalization technique proposed, which converts $I_N + D^{-1/2} A D^{-1/2}$ (A is the adjacency matrix) into $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, where $\hat{A} = A + I_N$ and $\hat{D}$ is the corresponding degree matrix of $\hat{A}$. The reason for renormalization is that the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ are in the interval [0, 2], which makes training of this neural network unstable due to gradient explosion, pg. 3296, right col, last para.)
	and save, in a memory, a set of parameters defining the optimized DSP (we use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation. Examiner notes: TensorFlow is an open-source software library for machine learning that is configured to run on either CPUs or GPUs for processing and storing machine learning parameters.)
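For illustration only, the two Guo filter types mapped above can be sketched as follows: a K-order Chebyshev filter built with the recurrence $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$, and the first-order renormalized filter $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} X W$. The spectral rescaling $L_s = 2L/\lambda_{max} - I$, the default $\lambda_{max} = 2$ (the bound for a normalized Laplacian), and the names `chebyshev_filter` and `renormalized_filter` are assumptions of this sketch, not details taken from Guo.

```python
import numpy as np


def chebyshev_filter(lap: np.ndarray, x: np.ndarray, theta: np.ndarray,
                     lmax: float = 2.0) -> np.ndarray:
    """Sketch: y = sum_{k=0}^{K} theta[k] * T_k(L_s) @ x, with K = len(theta) - 1."""
    n = lap.shape[0]
    l_s = 2.0 * lap / lmax - np.eye(n)   # rescale the spectrum of lap into [-1, 1]
    t_prev = x                           # T_0(L_s) x = x
    y = theta[0] * t_prev
    if len(theta) == 1:
        return y
    t_cur = l_s @ x                      # T_1(L_s) x = L_s x
    y = y + theta[1] * t_cur
    for k in range(2, len(theta)):
        # Recurrence: T_k(L_s) x = 2 L_s T_{k-1}(L_s) x - T_{k-2}(L_s) x
        t_prev, t_cur = t_cur, 2.0 * (l_s @ t_cur) - t_prev
        y = y + theta[k] * t_cur
    return y


def renormalized_filter(adj: np.ndarray, x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Sketch: D_hat^{-1/2} A_hat D_hat^{-1/2} X W, where A_hat = A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = a_hat.sum(axis=1) ** -0.5
    propagation = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return propagation @ x @ weight
```

Because the renormalized operator includes self-loops, its eigenvalues lie in (-1, 1] rather than [0, 2], which is consistent with the stability rationale quoted from Guo.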
	Guo does not explicitly teach a set of heterogeneous kernels (HKs), each HK is associated with a corresponding set of filters that: (i) is selected from the constructed set of filters, and each HK is generated based at least in part on a weighted combination of the corresponding set of filters associated with the HK; and a K-order topology adaptive filter type.
          Price teaches a set of heterogeneous kernels (HKs) (The final experiment reported herein is with respect to heterogeneous kernels. For this experiment, each iECO descriptor’s top 5 individuals and the pre-screener score are concatenated to form a single feature vector, of which kernels were applied to and weights derived. Specifically, we used a single RBF, two polynomials (of degree two and three), and the dot product kernels, of which no normalization was implemented. Results are presented in Table 5.4. All methods performance dropped when using heterogeneous kernels, however our kernel matrix-based method degraded the most gracefully, with performance dropping only approximately 2-4% for all folds whereas MKLGL, ... As with the optimization function case, in theory, we should be able to find weights that would address the heterogeneous kernels through the weights “feature shrinkage” and “importance weight” for each kernel. We should be able to get the same results that we obtained in our previous experiment, as our heterogeneous case involved a RBF and the system could have defaulted back to simply learning that solution. pg. 78-79)
        	each HK is associated with a corresponding set of filters that: (i) is selected from the constructed set of filters, (This pre-screener is an ensemble of trainable size-contrast filters (local dual sliding window detectors). Each size-contrast filter has seven parameters; the inner window height and width, the pad height and width (which determine the size of the outer window), a Bhattacharyya distance threshold, a squared difference between the mean values threshold, and three state parameters, referred to as DType (which determines whether the detector will trigger only on bright on dark regions, dark on bright regions, or both), pg. 38; specifically, iECO is comprised of three steps. Step one is the learning of a composition of filters in the context of different feature descriptors. Step two is the fusion of these learned descriptors. Step three is classification. In comparison, a CNN is typically two parts, one part filter and feature learning and one part classification (e.g., MLP), pg. 82-83, last para.)
	   each HK is generated based at least in part on weighted combination of the corresponding set of filters associated with the HK (we assume that the kernel K is composed of a weighted combination of pre-computed base kernel matrices by $K = \sum_{k=1}^{m} \sigma_k K_k$, where there are m kernels and $\sigma_k$ is the weight applied to the kth kernel. The above operation is valid because we can add and multiply (by scalars greater than 0) Gram matrices and are guaranteed to produce a Gram matrix. Specifically, the above falls into the category of linear convex sum (LCS) as $0 \le \sigma_k \le 1$ and $\sum_{k=1}^{m} \sigma_k = 1$, pg. 104)
	          It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guo to incorporate the teachings of Price for the benefit of learning a composition of filters in the context of different feature descriptors (Price, pg. 82).
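The weighted combination quoted from Price can be illustrated with the hedged sketch below. It builds the four base Gram matrices named in Price's heterogeneous-kernel experiment (an RBF, polynomials of degree two and three, and the dot product) and combines them under the LCS constraint $\sigma_k \ge 0$, $\sum_k \sigma_k = 1$. The RBF width `gamma`, the polynomial offset of 1, and the function names are assumptions for illustration, not parameters reported by Price.

```python
import numpy as np


def base_kernels(x: np.ndarray, gamma: float = 0.5) -> list:
    """Gram matrices for an RBF, degree-2 and degree-3 polynomial, and dot-product kernel."""
    dot = x @ x.T
    sq_norms = np.diag(dot)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * dot
    rbf = np.exp(-gamma * sq_dists)
    poly2 = (dot + 1.0) ** 2   # offset of 1 is an assumption
    poly3 = (dot + 1.0) ** 3
    return [rbf, poly2, poly3, dot]


def combine_kernels(kernels: list, sigma) -> np.ndarray:
    """Linear convex sum K = sum_k sigma_k K_k with sigma_k >= 0 and sum_k sigma_k = 1."""
    sigma = np.asarray(sigma, dtype=float)
    if len(sigma) != len(kernels):
        raise ValueError("need one weight per base kernel")
    if np.any(sigma < 0) or not np.isclose(sigma.sum(), 1.0):
        raise ValueError("weights must satisfy the LCS constraint")
    return sum(s * k for s, k in zip(sigma, kernels))
```

Because each base matrix is positive semidefinite and the weights are non-negative, the combined matrix is also a valid Gram matrix, as the quoted passage states.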
	Du teaches a K-order topology adaptive filter type; (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p_{j,i}^{k})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand-side) to vertex 2 (figure on the right-hand-side). The filter is first placed at vertex 1. Since paths (1, 2, 1), (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y_f^{(\ell)}(i)$ is the weighted sum of the input features of vertices in $x_c^{(\ell)}$ that are within k-paths away to vertex i, for k = 0, 1, ..., K, with weights given by the products of components of $A^k$ and $g_{c,f,k}^{(\ell)}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g_{c,f,k}^{(\ell)}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	     at least one filter associated with the K-order topology adaptive filter type, (The above equation shows that each neuron in the graph convolutional layer is connected only to a local region (local vertices and edges) in the vertex domain of the input data volume, which is adaptive to the graph topology. The strength of correlation is explicitly utilized in $\omega(p_{j,i}^{k})$. We refer to this method as topology adaptive graph convolutional network (TAGCN). In Fig. 2, we show TAGCN with an example of 2-size filter sliding from vertex 1 (figure on the left-hand-side) to vertex 2 (figure on the right-hand-side). The filter is first placed at vertex 1. Since paths (1, 2, 1), (5, 4, 1) and so on (paths with red glow) are all 2-length paths to vertex 1, they are covered by this 2-size filter. Since paths (2, 3) and (7, 3) are not on any 2-length path to vertex 1, they are not covered by this filter. Further, when this 2-size filter moves to vertex 2, paths (1, 5), (4, 5) and (6, 5) are no longer covered, but paths (2, 3) and (7, 3) are first time covered and contribute to the convolution with output at vertex 2. Further, $y_f^{(\ell)}(i)$ is the weighted sum of the input features of vertices in $x_c^{(\ell)}$ that are within k-paths away to vertex i, for k = 0, 1, ..., K, with weights given by the products of components of $A^k$ and $g_{c,f,k}^{(\ell)}$. Thus the output is the weighted sum of the feature map given by the filtered results from 1-size up to K-size filters. It is evident that the vertex convolution on the graph using Kth order polynomials is K-paths localized. Moreover, different vertices on the graph share $g_{c,f,k}^{(\ell)}$. The above local convolution and weight sharing properties of the convolution (5) on a graph are very similar to those in traditional CNN, pg. 6, second para.)
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Guo to incorporate the teachings of Du for the benefit of a topology adaptive graph convolutional network (TAGCN), which uses learnable filters to perform convolutions on graphs, whose filter topologies adapt to the topology of the graph as the filters scan the graph to perform convolution, and which exhibits better performance and is computationally simpler than other recent methods (Du, abstract).

6.	Claims 8, 9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al ("Deep neural networks on graph signals for brain imaging analysis." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.) in view of Price (“Fusion of evolution constructed features for computer vision.” A Dissertation Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering in the Department of Electrical and Computer Engineering, May 2018) in view of Du et al. ("Topology adaptive graph convolutional networks." arXiv preprint arXiv:1710.10370 (2017)) and further in view of Van Seijen et al (US20180165603).

	Regarding claim 8, Modified Guo teaches the apparatus of claim 7, but does not explicitly teach that, in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized.
	Van Seijen teaches in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized (by minimizing this loss function, the different heads of HRA approximate the optimal action-value functions under the different reward functions [0244])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Modified Guo to incorporate the teachings of Van Seijen for the benefit of modelling the current estimate of the optimal value function with a deep neural network (Van Seijen, [0228]).

	Regarding claim 9, Modified Guo teaches the apparatus of claim 7, but does not explicitly teach that, in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.
	Van Seijen teaches in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.  (each agent can have its own reward function that maximizes the return based on these functions [0081])
	The same motivation to combine as for dependent claim 8 applies here.

	Regarding claim 17, Modified Guo teaches the method of claim 16, but does not explicitly teach that, in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized.
	Van Seijen teaches in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized (By minimizing this loss function, the different heads of HRA approximate the optimal action-value functions under the different reward functions [0244])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Guo to incorporate the teachings of Van Seijen for the benefit of modelling the current estimate of the optimal value function with a deep neural network (Van Seijen, [0228]).

	Regarding claim 18, Modified Guo teaches the method of claim 16, but does not explicitly teach that, in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.
	Van Seijen teaches in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.  (each agent can have its own reward function that maximizes the return based on these functions [0081])
	The same motivation to combine as for dependent claim 17 applies here.

7.	Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al ("Deep neural networks on graph signals for brain imaging analysis." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.) in view of Price (“Fusion of evolution constructed features for computer vision.” A Dissertation Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering in the Department of Electrical and Computer Engineering, May 2018) in view of Du et al. ("Topology adaptive graph convolutional networks." arXiv preprint arXiv:1710.10370 (2017)) and further in view of Liu et al ("Random walk graph Laplacian-based smoothness prior for soft decoding of JPEG images." IEEE Transactions on Image Processing 26.2 (2016): 509-524).

	Regarding claim 21, Modified Guo teaches the apparatus of Claim 1, and Guo further teaches wherein the one or more initial Laplacian operators comprise a normalized Laplacian operator (since we use normalized Laplacian, pg. 3296, right col, last para.) and
	Modified Guo does not explicitly teach a random walk Laplacian operator.  
	Liu teaches a random walk Laplacian operator (We first show that low graph frequencies for the normalized graph Laplacian Ln can be interpreted as relaxed solutions for spectral clustering, which are PWS if the underlying graph has distinct clusters. We then argue that to induce desirable filtering properties for Ln, a similarity transformation is required, resulting in the random walk Laplacian Lr. Finally, we show that a more appropriate smoothness prior than (25) can be defined using Lr, with many desirable filtering properties, pg. 515, right col, first para)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Modified Guo to incorporate the teachings of Liu for the benefit of achieving better objective and subjective restoration quality in a soft decoding approach for the restoration of JPEG-compressed images (Liu, conclusion, pg. 523, left col).
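Liu's statement that a similarity transformation of the normalized Laplacian $L_n$ yields the random walk Laplacian $L_r$ can be checked numerically. The sketch below is illustrative only and assumes the standard definitions $L_n = I - D^{-1/2} W D^{-1/2}$ and $L_r = D^{-1/2} L_n D^{1/2} = I - D^{-1} W$, a symmetric non-negative W, and no isolated vertices; the function name `random_walk_laplacian` is hypothetical.

```python
import numpy as np


def random_walk_laplacian(w: np.ndarray) -> np.ndarray:
    """Obtain L_r from L_n by the similarity transform L_r = D^{-1/2} L_n D^{1/2}."""
    d = w.sum(axis=1)
    d_inv_sqrt = np.diag(d ** -0.5)
    d_sqrt = np.diag(d ** 0.5)
    l_n = np.eye(w.shape[0]) - d_inv_sqrt @ w @ d_inv_sqrt   # normalized Laplacian
    return d_inv_sqrt @ l_n @ d_sqrt                          # equals I - D^{-1} W
```

Since the transformation is a similarity, $L_r$ has the same eigenvalues as $L_n$, and because each row of $L_r$ sums to zero it maps constant signals to zero.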

	Regarding claim 22, Modified Guo teaches the apparatus of Claim 21, wherein each corresponding set of filters for a particular HK (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.) comprises
	at least one filter generated based at least in part on the normalized Laplacian operator (since we use normalized Laplacian, pg. 3296, right col, last para.) and
	Modified Guo does not explicitly teach at least one filter generated based at least in part on the random walk Laplacian operator.  
	Liu teaches at least one filter generated based at least in part on the random walk Laplacian operator. (We first show that low graph frequencies for the normalized graph Laplacian Ln can be interpreted as relaxed solutions for spectral clustering, which are PWS if the underlying graph has distinct clusters. We then argue that to induce desirable filtering properties for Ln, a similarity transformation is required, resulting in the random walk Laplacian Lr. Finally, we show that a more appropriate smoothness prior than (25) can be defined using Lr, with many desirable filtering properties, pg. 515, right col, first para)
	The same motivation to combine as for dependent claim 21 applies here.

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.G./Examiner, Art Unit 2121                                                                                                                                                                                                    


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121