DETAILED ACTION
1.	This office action is in response to the Application No. 16052936 filed on 8/02/2018. Claims 1-20 are presented for examination and are currently pending.	

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	
Election/Restrictions
3.	Restriction to one of the following inventions is required under 35 U.S.C. 121:
I. Claims 1-18 are drawn to generating and training a digital signal processor to evaluate graph data, classified in G06N 3/08.
II. Claims 19 and 20 are drawn to generating a predicted result using graph data and a digital signal processor, classified in G06N 5/04.

4.	The inventions are independent or distinct, each from the other because:
Inventions I and II are related as combination and subcombination.  Inventions in this relationship are distinct if it can be shown that (1) the combination as claimed does not require the particulars of the subcombination as claimed for patentability, and (2) that the subcombination has utility by itself or in other combinations (MPEP § 806.05(c)).  In the instant case, the combination as claimed does not require the particulars of the subcombination as claimed because the generating and training of a 
The examiner has required restriction between combination and subcombination inventions. Where applicant elects a subcombination, and claims thereto are subsequently found allowable, any claim(s) depending from or otherwise requiring all the limitations of the allowable subcombination will be examined for patentability in accordance with 37 CFR 1.104.  See MPEP § 821.04(a).  Applicant is advised that if any claim presented in a continuation or divisional application is anticipated by, or includes all the limitations of, a claim that is allowable in the present application, such claim may be subject to provisional statutory and/or nonstatutory double patenting rejections over the claims of the instant application. 
	Because these inventions are distinct for the reasons given above and have acquired a separate status in the art as shown by their different classification, restriction for examination purposes as indicated is proper.
	During a telephone conversation with Dane A. Baltich on 11/04/2021, a provisional election was made without traverse to prosecute the invention of Group I, claims 1-18.  Affirmation of this election must be made by applicant in replying to this Office action.  Claims 19 and 20 are withdrawn from further consideration by the examiner, 37 CFR 1.142(b), as being drawn to a non-elected invention.
	Applicant is reminded that upon the cancellation of claims to a non-elected invention, the inventorship must be corrected in compliance with  37 CFR 1.48(a) if one or more of the currently named inventors is no longer an inventor of at least one claim remaining in the application. A request to correct inventorship under 37 CFR 1.48(a) 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-7 and 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al ("Deep neural networks on graph signals for brain imaging analysis." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.) in view of Evans et al (US20040158569).
	Regarding claim 1, Guo teaches an apparatus for generating and training (we use TensorFlow to implement our networks (which are configured to run on either CPUs or GPUs), pg. 3297, 3.2. Implementation; the entire network is trained end-to-end in an unsupervised way to learn the low-dimensional representations for the input brain imaging data, pg. 3296, left col, first para.)
	a digital signal processor (DSP) to evaluate graph data (Our proposed networks use ConvNets on graph to compute rich features for the input graph signals, pg. 3296, right col, 2.2. Model Structure). 
	the apparatus comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: (We use TensorFlow to implement our networks, pg. 3297, 3.2. Implementation; we use TensorFlow code (configured to run on either CPUs or GPUs) to implement our networks, pg. 3297, left col, first para.)
	receive, by a processor, (the k-th network layer takes as input a graph signal, pg.3297, second to the last para.)
	known graph data that includes irregular grid graph data; (MEG signal datasets, pg. 3297, 3. Experiment)
	split, by the processor (into training data and cross-validation data, pg. 3298, left col, 3.2. Implementation)
	the known graph data into a set of training graph data and a set of cross-validation graph data (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	construct, by the processor, a set of filters using the training graph data; (The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network. (pg. 3296, 2.2.1. ConvNets on graph) Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	formulate, by the processor, (a formulation of ConvNets on graph, pg. 3295, right col second to the last para.)
	an objective function for training; (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function)
	generate, by the processor, an optimized DSP using the objective function, (Adam (as optimization solver for a neural network algorithm) is adopted to minimize the MSE (mean square error) with learning rate 0.001, pg. 3297, 3.2. Implementation)
	the constructed set of filters, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3296, right col, third para.)
	the training graph data and the cross-validation graph data; (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	wherein the optimized DSP includes a set of hidden layers, (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	wherein each hidden layer (kth and (k+1)th hidden layers respectively, Fig. 3)
	of the set of hidden layers (fully connected layers, pg. 3297, right col, first para.) 
	comprises, wherein each HK of the set of HKs includes a corresponding set of filters (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3296, right col, Equation 2)
	selected from the constructed set of filters (polynomial filters, pg. 3296, right col, Equation 2). The applicant discloses in the instant specification that “Each HK comprises an aggregated set of filters comprising one or more of a K-order Chebyshev filter”, instant specification ([0026])
	and associated with one or more initial Laplacian operators (eigen-functions of Laplace operator, pg. 3296, right col, first para.)
	and corresponding initial filter parameters; (parameters are learned by back propagation pg. 3296, 2.1. GSP and convolution on graph)
	Guo does not explicitly teach a set of heterogeneous kernels (HKs), and save, in a memory a set of parameters defining the optimized DSP.  
	Evans teaches a set of heterogeneous kernels (HKs), (the component filters can be composed of heterogeneous combinations of filter types. [0115])
	and save, in a memory, a set of parameters defining the optimized DSP (as the size of the filter (model) as it is stored in memory, [0061])
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Guo to incorporate the teachings of Evans for the benefit of an alternative modelling strategy using stacked filters in a neural network as a means for creating an aggregation function (Evans, [0133]).
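
For reference, the 10-fold cross-validation cited from Guo (pg. 3298, 3.2. Implementation) partitions the known data so that each sample serves as validation data exactly once. The following is a minimal illustrative sketch only, not code from Guo or from the instant application; the function name `k_fold_splits`, the sample count, and the seed are hypothetical choices made for illustration.

```python
import random


def k_fold_splits(num_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration; the cited reference does not
    describe how its folds were drawn.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Example: 25 samples split into 10 folds.
for train, validation in k_fold_splits(num_samples=25, k=10):
    print(len(train), len(validation))
```

Each sample appears in exactly one validation fold and in the training set of the remaining folds.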

	Regarding claim 2, Guo modified by Evans teaches the apparatus of claim 1, 
Guo teaches generate, by the processor, the set of hidden layers, (kth and (k+1)th hidden layers respectively, Fig. 3)
	wherein each hidden layer of the set of hidden layers, (fully connected layers, pg. 3297, right col, first para.) 
	and wherein each hidden layer (fully connected layers, pg. 3297, right col, first para.) 
	is associated with an initial HK number representing a total number of HKs in each hidden layer; (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3296, right col, Equation 2)
	generate, by the processor, an initial DSP (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	based on the set of hidden layers, (fully connected layers, pg. 3297, right col, first para.) 
	wherein the initial DSP is associated with an initial hidden layer number representing a total number of hidden layers in the initial DSP; (two-layer ConvNets on graph, pg. 3297, 3.2. Implementation)
	and update, by the processor, the one or more initial Laplacian operators, (we use normalized Laplacian and all the eigenvalues of it are in the interval [0, 2], pg. 3296, 2.2.1. ConvNets on graph)
	the corresponding initial filter parameters, (filters are polynomial chebyshev expansions where the polynomial coefficients are the parameters to be learned, pg. 3295, right col, second para.)
	the initial filter number, (filter h, pg. 3296, left col, first para.)
	the initial HK parameter, (in our work, the neural networks instead learn the weights, pg. 3297, right col, first para.)
	the initial HK number, (neural networks can also expand or compress number of the channels with 1 x 1 convolution, pg. 3297, right col, second para.)
	and the initial hidden layer number (k-th and (k+1)-th layers respectively, pg. 3297, Fig. 3)
	associated with the initial DSP in an iterative manner using the training graph data (after training all the networks for 300 epochs, pg. 3298, left col, 3.2. Implementation)
	and the cross-validation graph data until the objective function is optimized for defining the optimized DSP (we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	Evans teaches wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to generate the optimized DSP by causing the apparatus to: (Computer system 1300 also includes a main memory 1306, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1302 for storing information and instructions to be executed by processor 1304. [0281])
	generate, by the processor, the set of HKs, wherein each HK of the set of HKs is generated based on a weighted combination of the corresponding set of filters
(measure of filter complexity could relate to the size of the filter in terms of bytes used to store the filter. These measures could be combined using a weighted sum, [0212])
	and is associated with an initial filter number representing a total number of filters in the corresponding set of filters; (an aggregation filter is based upon building a filter profile based upon features derived from the component filters, …. the aggregation filter in this case is more general than the previous filter, consisting of a threshold value and a collection of features, where each feature has value and a weight associated with it, [0114])
	is generated based on a weighted combination of the set of HKs (as yet another variant, each component filter could generate both a binary output (+1 or -1) and the actual score, which is weight-Summed to yield an overall Sum that is thresholded using 0. [0113])
	wherein each HK of the set of HKs is associated with an initial HK parameter, (the component filters can be composed of heterogeneous combinations of filter types. [0115])
	The same motivation to combine as the independent claim 1 applies here.

	Regarding claim 3, Guo modified by Evans teaches the apparatus of Claim 2, Evans teaches wherein the set of parameters defining the optimized DSP (connection parameters which are determined using a variety of methods of optimization, abstract)
	includes the corresponding initial filter parameters, the initial filter number, the initial HK parameter, the initial HK number, or the initial hidden layer number saved to the memory after the objective function is optimized. (filters are polynomial chebyshev expansions where the polynomial coefficients are the parameters to be learned, pg. 3295, right col, second para.)

	Regarding claim 4, Guo modified by Evans teaches the apparatus of claim 1, Guo teaches wherein each of the one or more initial Laplacian operators is a normalized Laplacian operator or a random walk Laplacian operator. (Since we use normalized Laplacian, pg. 3296, right col, last para.)
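
For reference, the two operators named in the claim have standard definitions: the normalized Laplacian is I - D^{-1/2} A D^{-1/2} and the random walk Laplacian is I - D^{-1} A, where A is the adjacency matrix and D the diagonal degree matrix. The sketch below is illustrative only and is not taken from Guo or from the instant specification; the function names and the three-node path graph are hypothetical, and the code assumes every node has at least one edge.

```python
import math


def degrees(adj):
    """Return the degree (row sum) of each node in an adjacency matrix."""
    return [sum(row) for row in adj]


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}."""
    d = degrees(adj)
    n = len(adj)
    return [
        [(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(d[i] * d[j])
         for j in range(n)]
        for i in range(n)
    ]


def random_walk_laplacian(adj):
    """Random walk Laplacian: I - D^{-1} A."""
    d = degrees(adj)
    n = len(adj)
    return [
        [(1.0 if i == j else 0.0) - adj[i][j] / d[i] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: an undirected path graph 0 - 1 - 2.
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
L_SYM = normalized_laplacian(ADJ)
L_RW = random_walk_laplacian(ADJ)
```

The normalized form is symmetric for an undirected graph, while each row of the random walk form sums to zero.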
   
	Regarding claim 5, Guo modified by Evans teaches the apparatus of claim 1, Guo teaches wherein each filter of the constructed set of filters is a K-order Chebyshev filter, a first-order renormalized filter, or a K-order topology adaptive filter. (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the K filter is K-hop localized, pg. 3296, right col, third para.)
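
For reference, a K-order Chebyshev filter of the kind cited above applies sum_k theta_k T_k(L~) x, with the terms generated by the recursion T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x. The sketch below is illustrative only and is not Guo's TensorFlow implementation; the function names, the four-node path graph, and the coefficients are hypothetical. It also assumes the common simplification L~ = L - I for a normalized Laplacian L whose eigenvalues lie in [0, 2].

```python
import math


def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def chebyshev_filter(l_scaled, x, theta):
    """Return sum_k theta[k] * T_k(l_scaled) x using the Chebyshev recursion.

    len(theta) - 1 is the filter order K; theta holds the coefficients
    that a network would learn.
    """
    t_prev = list(x)                      # T_0 x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = mat_vec(l_scaled, x)         # T_1 x = L~ x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        # T_k x = 2 L~ T_{k-1} x - T_{k-2} x
        lt = mat_vec(l_scaled, t_curr)
        t_next = [2 * a - b for a, b in zip(lt, t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Hypothetical example: path graph 0 - 1 - 2 - 3 with L~ = L - I,
# i.e. -D^{-1/2} A D^{-1/2} for node degrees [1, 2, 2, 1].
S = 1 / math.sqrt(2)
L_SCALED = [[0.0, -S, 0.0, 0.0],
            [-S, 0.0, -0.5, 0.0],
            [0.0, -0.5, 0.0, -S],
            [0.0, 0.0, -S, 0.0]]
impulse = [1.0, 0.0, 0.0, 0.0]            # signal concentrated on node 0
y = chebyshev_filter(L_SCALED, impulse, theta=[0.5, 0.3, 0.2])  # order K = 2
```

With an impulse on node 0 and K = 2, the response at node 3 (three hops away) is zero, which illustrates the K-hop localization noted in the citation.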

	Regarding claim 6, Guo modified by Evans teaches the apparatus of claim 1, Guo teaches wherein the optimized DSP further comprises a discriminant layer (our approach can extract more discriminative representations (from discriminant layer), abstract)

	Regarding claim 7, Guo modified by Evans teaches the apparatus of claim 1, Guo teaches wherein the objective function is a loss function or a reward function. (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function) 

	Regarding claim 10, Guo teaches a method for generating and training (in the proposed method, brain imaging data is modelled as signals residing on connectivity graphs estimated with causality analysis, pg. 3295, right col, last para.; the entire network is trained end-to-end in an unsupervised way to learn the low-dimensional representations for the input brain imaging data, pg. 3296, left col, first para.)
	a digital signal processor (DSP) to evaluate graph data (Our proposed networks use ConvNets on graph to compute rich features for the input graph signals, pg. 3296, right col, 2.2. Model Structure). 
	the method comprising: receiving, by a processor, (the k-th network layer takes as input a graph signal, pg.3297, second to the last para.)
	known graph data that includes irregular grid graph data; (MEG signal datasets, pg. 3297, 3. Experiment)
	splitting, by the processor (into training data and cross-validation data, pg. 3298, left col, 3.2. Implementation)
	the known graph data into a set of training graph data and a set of cross-validation graph data (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	constructing, by the processor, a set of filters using the training graph data;  
(The structure of the ConvNets on graph is shown in Figure 3, which integrates the graph information into the neural network. (pg. 3296, 2.2.1. ConvNets on graph) Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is k-hop localized, pg. 3296, right col, third para.)
	formulating, by the processor, (a formulation of ConvNets on graph, pg. 3295, right col second to the last para.)
	an objective function for training; (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function)
	generating, by the processor, an optimized DSP using the objective function, (Adam (as optimization solver for a neural network algorithm) is adopted to minimize the MSE (mean square error) with learning rate 0.001, pg. 3297, 3.2. Implementation)
	the constructed set of filters, (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3296, right col, third para.)
	the training graph data and the cross-validation graph data, (after training all the networks for 300 epochs, we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	wherein the optimized DSP includes a set of hidden layers, (fully connected layers, pg. 3298, left col, 3.2. Implementation)
	wherein each hidden layer (kth and (k+1)th hidden layers respectively, Fig. 3)
	of the set of hidden layers (fully connected layers, pg. 3297, right col, first para.) 
	comprises and wherein each HK of the set of HKs includes a corresponding set of filters (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3297, right col, Equation 2)
	selected from the constructed set of filters (polynomial filters, pg. 3297, right col, Equation 2). The applicant discloses in the instant specification that “Each HK comprises an aggregated set of filters comprising one or more of a K-order Chebyshev filter”, instant specification ([0026])
	and associated with one or more initial Laplacian operators (eigen-functions of Laplace operator, pg. 3296, right col, first para.)
	and corresponding initial filter parameters; (parameters are learned by back propagation pg. 3296, 2.1. GSP and convolution on graph)
	Guo does not explicitly teach a set of heterogeneous kernels (HKs), and save, in a memory a set of parameters defining the optimized DSP.  
	Evans teaches a set of heterogeneous kernels (HKs), (the component filters can be composed of heterogeneous combinations of filter types. [0115])
	and save, in a memory, a set of parameters defining the optimized DSP (as the size of the filter (model) as it is stored in memory, [0061])
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guo to incorporate the teachings of Evans for the benefit of an alternative modelling strategy using stacked filters in a neural network as a means for creating an aggregation function (Evans, [0133]).

	Regarding claim 11, Guo modified by Evans teaches the method of Claim 10, 
	Guo teaches wherein each hidden layer (fully connected layers, pg. 3297, right col, first para.) 
	is associated with an initial HK number representing a total number of HKs in each hidden layer; (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the filter is K-hop localized, pg. 3296, right col, Equation 2)
	generating, by the processor, an initial DSP, (kth and (k+1)th hidden layers respectively, Fig. 3)
	based on the set of hidden layers, (fully connected layers, pg. 3297, right col, first para.) 
	wherein the initial DSP is associated with an initial hidden layer number representing a total number of hidden layers in the initial DSP; (two-layer ConvNets on graph, pg. 3297, 3.2. Implementation)
	and updating, by the processor, the one or more initial Laplacian operators, (we use normalized Laplacian and all the eigenvalues of it are in the interval [0, 2], pg. 3296, 2.2.1. ConvNets on graph)
	the corresponding initial filter parameters, (filters are polynomial chebyshev expansions where the polynomial coefficients are the parameters to be learned, pg. 3295, right col, second para.)
	the initial filter number, (filter h, pg. 3296, left col, first para.)
	the initial HK parameter, (in our work, the neural networks instead learn the weights, pg. 3297, right col, first para.)
	the initial HK number, (neural networks can also expand or compress number of the channels with 1 x 1 convolution, pg. 3297, right col, second para.)
	and the initial hidden layer number (k-th and (k+1)-th layers respectively, pg. 3297, Fig. 3)
	associated with the initial DSP in an iterative manner using the training graph data (after training all the networks for 300 epochs, pg. 3298, left col, 3.2. Implementation)
	and the cross-validation graph data until the objective function is optimized for defining the optimized DSP (we use 10-fold cross-validation, pg. 3298, left col, 3.2. Implementation)
	Evans teaches wherein generating the optimized DSP further comprises: (Computer system 1300 also includes a main memory 1306, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1302 for storing information and instructions to be executed by processor 1304. [0281])
	generating, by the processor, the set of HKs, wherein each HK of the set of HKs is generated based on a weighted combination of the corresponding set of filters
(measure of filter complexity could relate to the size of the filter in terms of bytes used to store the filter. These measures could be combined using a weighted sum, [0212])
	and is associated with an initial filter number representing a total number of filters in the corresponding set of filters; (an aggregation filter is based upon building a filter profile based upon features derived from the component filters, …. the aggregation filter in this case is more general than the previous filter, consisting of a threshold value and a collection of features, where each feature has value and a weight associated with it, [0114])
	generating, by the processor, the set of hidden layers, wherein each hidden layer of the set of hidden layers is generated based on a weighted combination of the set of HKs, (as yet another variant, each component filter could generate both a binary output (+1 or -1) and the actual score, which is weight-Summed to yield an overall Sum that is thresholded using 0. [0113])
	wherein each HK of the set of HKs is associated with an initial HK parameter, (the component filters can be composed of heterogeneous combinations of filter types. [0115])
	The same motivation to combine as the independent claim 10 applies here.

	Regarding claim 12, Guo modified by Evans teaches the method of Claim 11, Evans teaches wherein the set of parameters defining the optimized DSP (connection parameters which are determined using a variety of methods of optimization, abstract)
	includes the corresponding initial filter parameters, the initial filter number, the initial HK parameter, the initial HK number, or the initial hidden layer number saved to the memory after the objective function is optimized (filters are polynomial chebyshev expansions where the polynomial coefficients are the parameters to be learned, pg. 3295, right col, second para.)
	The same motivation to combine as the independent claim 10 applies here.

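	Regarding claim 13, Guo modified by Evans teaches the method of claim 10, Guo teaches wherein each of the one or more initial Laplacian operators is a normalized Laplacian operator or a random walk Laplacian operator. (Since we use normalized Laplacian, pg. 3296, right col, last para.)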

	Regarding claim 14, Guo modified by Evans teaches the method of claim 10, Guo teaches wherein each filter of the constructed set of filters is a K-order Chebyshev filter, a first-order renormalized filter, or a K-order topology adaptive filter. (Chebyshev polynomial generated recursively. K is the order of the polynomial, which means that the K filter is K-hop localized, pg. 3296, right col, third para.)

	Regarding claim 15, Guo modified by Evans teaches the method of claim 10, Guo teaches wherein the optimized DSP further comprises a discriminant layer (our approach can extract more discriminative representations (from discriminant layer), abstract)

	Regarding claim 16, Guo modified by Evans teaches the method of claim 10, Guo teaches wherein the objective function is a loss function or a reward function. (training of the entire network is end-to-end by minimizing mean square error (loss function) between input x and y, pg. 3297, right col, 2.2.2. Fully connected layers and loss function) 

Claims 8, 9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al ("Deep neural networks on graph signals for brain imaging analysis." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.) in view of Evans et al (US20040158569) and further in view of Van Seijen et al (US20180165603).

	Regarding claim 8, Guo modified by Evans teaches the apparatus of claim 7, but they do not explicitly teach in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized.  
	Van Seijen teaches in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized (by minimizing this loss function, the different heads of HRA approximate the optimal action-value functions under the different reward functions [0244])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Guo modified by Evans to incorporate the teachings of Van Seijen for the benefit of modelling the current estimate of the optimal value function with a deep neural network (Van Seijen, [0228]).

	Regarding claim 9, Guo modified by Evans teaches the apparatus of claim 7, but they do not explicitly teach in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.  
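	Van Seijen teaches in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized (each agent can have its own reward function that maximizes the return based on these functions [0081])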
	The same motivation to combine as the dependent claim 8 applies here.

	Regarding claim 17, Guo modified by Evans teaches the method of claim 16, but they do not explicitly teach in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized.  
	Van Seijen teaches in circumstances where the objective function is the loss function, the objective function is optimized when the loss function is minimized (By minimizing this loss function, the different heads of HRA approximate the optimal action-value functions under the different reward functions [0244])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Guo modified by Evans to incorporate the teachings of Van Seijen for the benefit of modelling the current estimate of the optimal value function with a deep neural network (Van Seijen, [0228]).

	Regarding claim 18, Guo modified by Evans teaches the method of claim 16, but they do not explicitly teach in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized.  
	Van Seijen teaches in circumstances where the objective function is the reward function, the objective function is optimized when the reward function is maximized (each agent can have its own reward function that maximizes the return based on these functions [0081])
	The same motivation to combine as the dependent claim 17 applies here.

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance 

/M.G./Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121