DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2022-08-11 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2022-08-11 has been entered.  The status of claims is as follows: 
Claims 1-8, 23-25, 27-32, and 34 are pending.
Claims 1-5, 23-25, 27-28, 31-32, and 34 are amended.
Claims 9-22, 26, and 33 are canceled.
Claims 35 and 36 are not included in the application, and are assumed to be canceled.
 
Response to Amendments/Arguments
Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 101: 
The rejections of claims 1-8 and 23-25 under 35 U.S.C. § 101 in the previous non-final Office action are withdrawn, as sufficient details on the method of training have been recited in the amendments to Claim 1.
 
Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 112(a): 
The rejection of claim 23 under 35 U.S.C. § 112(a) in the previous final Office action is withdrawn in view of the removal of the unsupported language in the amended claim 23.

Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 112(b): 
The rejections of claims 1-8, 23-25, 28, and 31-34 under 35 U.S.C. § 112(b) in the previous final Office action are withdrawn in view of the amendments to claims 1, 25, and 31.
 
Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 102/103: 
Applicant’s arguments with respect to the rejections of claims 1-8, 23-25, 27-32, and 34 under 35 U.S.C. § 102/103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 23-25, 28-32, and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Kearnes et al. (“Molecular Graph Convolutions: Moving Beyond Fingerprints”; hereinafter “Kearnes”) in view of Heck et al. (“Supervised Machine Learning Methods Applied to Predict Ligand Binding Affinity”; hereinafter “Heck”) and Faber et al. (“Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error”; hereinafter “Faber”).
 
With respect to claim 1, Kearnes teaches:
A method performed by one or more computers for training a graph neural network having a plurality of weights, to perform molecular property prediction, the method comprising:  (Kearnes, p. 596, left-hand column, ¶ 2: “Here we describe molecular graph convolutions, a deep learning system using a representation of small molecules as undirected graphs of atoms.” p. 597, left-hand column, ¶ 1: “Neural networks are directed graphs of simulated ‘neurons’.” ¶ 5: “At the ‘top’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g.[,] the probability that this molecule binds to a target or the binding afﬁnity). Many output nodes for different tasks can be added and this is commonly done [17, 29].” p. 600, right-hand column, ¶ 2: “Although our featurization includes space for hydrogen atoms, we did not use explicit hydrogens in any of our experiments in order to conserve memory and emphasize contributions from heavy atoms.”  Kearnes, p. 597, right-hand column, last paragraph: “In order to train the network, you ﬁrst have to choose a loss function describing the penalty for the network producing a set of outputs which differ from the outputs in the training example.”))
 
performing a first set of graph convolutions, by the graph neural network and in accordance with the plurality of weights of the graph neural network, on a first graph representation of the set of one or more molecules, (Kearnes, p. 596, last paragraph: “Here we describe molecular graph convolutions, a deep learning system using a representation of small molecules as undirected graphs of atoms.”  P. 597, right-hand column, ¶ 5: “The first basic unit of representation is an atom layer which contains an n-dimensional vector associated with each atom. Therefore the atom layer is a 2 dimensional matrix indexed first by atom.” P. 598, left-hand column, ¶ 4, “Invariant-Preserving Operations”: “Since we apply the same function for every atom/pair, we refer to this as a convolution.  All the transformations we develop below will have this convolution nature of applying the same operation to every atom/pair, maintaining Property 2.”  “Invariant-Preserving Operations” further teaches four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, A → A in Eq. 2.  P. 597, left-hand column, “Methods”, ¶ 1: “Neural networks are directed graphs of simulated ‘neurons’. Each neuron has a set of inputs and computes an output. The neurons in early neural nets were inspired by biological neurons and computed an affine combination of the inputs followed by a non-linear activation function. Mathematically, if the inputs are x1 . . . xN, weights w1 . . .wN and bias b are parameters, and f is the activation function, the output is [Eq. (1) (reproduction omitted)]”
The examiner notes that Kearnes’ representation of molecules as undirected graphs of atoms such as an atom layer upon which a graph convolutional transformation (e.g., A → P) operates teaches a first graph representation, and that Kearnes’ performing a graph convolutional transformation (e.g., A → P where A denotes atoms, and P denotes pairs) using weights (e.g., w1 . . .wN cited above) on the first graph representation teaches the above limitation.)
 
wherein the first set of graph convolutions are based at least in part on bond data defining bonds between pairs of atoms in the set of one or more molecules; (Kearnes, p. 597, right-hand column, Desired Invariants of A Model: “The output of the model should be invariant to the order that the atom and bond information is encoded in the input”. Table 2 lists “Hydrogen bonding” “Whether this atom is a hydrogen bond donor and/or acceptor” as an example input atom feature. 
The examiner notes that Kearnes’ encoding hydrogen bonding as well as atom and bond information in an input that is provided to its graph convolutions in, for example, its A → P transformation (where A denotes atoms, and P denotes pairs) teaches a first set of graph convolutions based on bond data that defines bonds between pairs of atoms because the hydrogen bonding is interpreted as occurring between an atom of a ligand molecule and another atom of a protein molecule. The examiner further notes that Kearnes’ graph convolutions in its P → P operation (where both Ps denote pairs) receive the “bond type” as an input and thus also teach a first set of graph convolutions based on bonds between the set of one or more molecules.)
 
performing a second set of graph convolutions, by the graph neural network and in accordance with the plurality of weights of the graph neural network, on a second graph representation of the set of one or more molecules, (Kearnes, p. 598, left-hand column, ¶ 4, “Invariant-Preserving Operations”: “Since we apply the same function for every atom/pair, we refer to this as a convolution.  All the transformations we develop below will have this convolution nature of applying the same operation to every atom/pair, maintaining Property 2.”  “Invariant-Preserving Operations” further teaches four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, A → A in Eq. 2.  P. 597, left-hand column, “Methods”, ¶ 1: “Neural networks are directed graphs of simulated ‘neurons’. Each neuron has a set of inputs and computes an output. The neurons in early neural nets were inspired by biological neurons and computed an affine combination of the inputs followed by a non-linear activation function. Mathematically, if the inputs are x1 . . . xN, weights w1 . . .wN and bias b are parameters, and f is the activation function, the output is [Eq. (1) (reproduction omitted)]” P. 597, right-hand column, ¶ 5: “The next basic unit of representation is a pair layer which contains an n-dimensional vector associated with each pair of atoms. Therefore, the pair layer is a 3 dimensional matrix where the first two dimensions are indexed by atom.”
The examiner notes that Kearnes’ representation of molecules as undirected graphs of atoms such as a pair layer upon which a graph convolutional transformation (e.g., P → A) operates teaches a second graph representation, and that Kearnes’ performing a graph convolutional transformation (e.g., P → A where P denotes pairs, and A denotes atoms) using weights (e.g., w1 . . .wN cited above) on the aforementioned second graph representation teaches the above limitation.)
 
wherein the second set of graph convolutions are based at least in part on the spatial distance data defining [3D spatial] distances between pairs of atoms in the set of one or more molecules; (Kearnes, p. 597, right-hand column, ¶ 5, Desired Invariants of A Model: “we will encode the graph distance (length of shortest path from one atom to the other) in the input pair layer”; Table 3 “Atom Pair Features” include “Graph distance” “whether the shortest path between the atoms in the pair is less than or equal to that number of bonds”.  The examiner notes that Kearnes’ graph convolutions in the P → A convolutional transformation receive the pair attribute “graph distance”, which denotes the length of the shortest path from one atom to another, as an input and thus teach that the second set of graph convolutions are based at least in part on spatial distance data between pairs of atoms in the set of one or more molecules.) *Note that applying graph convolutions to actual 3D spatial distances is taught below by other references.
 
performing a graph gather operation, by the graph neural network and in accordance with the plurality of weights of the graph neural network, to produce a feature vector; and (Kearnes, p. 598, FIG. 2 caption: “P → A operation. Px is a matrix containing features for atom pairs ab, ac, ad, etc. The vi are intermediate values obtained by applying f to features for a given atom pair. Applying g to the intermediate representations for all atom pairs involving a given atom (e.g. a) results in a new atom feature vector for that atom”.  P. 597, left-hand column, “Methods”, ¶ 1: “Neural networks are directed graphs of simulated ‘neurons’. Each neuron has a set of inputs and computes an output. The neurons in early neural nets were inspired by biological neurons and computed an affine combination of the inputs followed by a non-linear activation function. Mathematically, if the inputs are x1 . . . xN, weights w1 . . .wN and bias b are parameters, and f is the activation function, the output is [Eq. (1) (reproduction omitted)]”
The examiner notes that Kearnes’ graph neural network’s performing a graph convolutional transformation (e.g., P → A cited above) is based on and hence in accordance with weights (wi cited above), that Kearnes’ applying the activation function f to the features of atom pairs and applying the summation function g to the intermediate representations for all atom pairs in a graph structure or molecular structure (a spatial graph representation) teaches a graph gather, and that Kearnes’ applying f and g to generate a new feature vector for an atom teaches performing a graph gather with the spatial representation to produce a feature vector.)
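For illustration only, the P → A operation relied on above can be sketched in a few lines of Python. The sketch is not taken from Kearnes: the function names `f` and `pair_to_atom`, the ReLU activation, and the use of a plain sum for g are assumptions made here to show how per-pair values are gathered into one feature vector per atom.

```python
def f(pair_vec, weights, biases):
    """Assumed per-pair transform: affine combination followed by ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, pair_vec)) + b)
        for row, b in zip(weights, biases)
    ]


def pair_to_atom(pair_layer, weights, biases):
    """Hypothetical P -> A step.

    pair_layer[a][b] is the feature vector for atom pair (a, b).
    For each atom a, f is applied to every pair involving a, and the
    intermediate vectors are summed (the assumed choice of g).
    """
    n_atoms = len(pair_layer)
    atom_layer = []
    for a in range(n_atoms):
        intermediates = [
            f(pair_layer[a][b], weights, biases)
            for b in range(n_atoms)
            if b != a
        ]
        # Summation is order independent, so the result does not depend
        # on the order in which the pairs are visited.
        atom_layer.append([sum(col) for col in zip(*intermediates)])
    return atom_layer
```

Because the same function is applied to every pair and the results are combined by a sum, reordering the atoms only reorders the rows of the output, which matches the order-invariance property quoted from Kearnes.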
 
predicting, by the graph neural network and in accordance with the plurality of weights of the graph neural network, a set of one or more characteristics for the set of one or more molecules based on the feature vector.  (Kearnes, p. 597, left-hand column, ¶ 5: “At the ‘‘top’’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g. the probability that this molecule binds to a target or the binding affinity).” P. 597, right-hand column, ¶ 3 “Property 1”: “The output of the model should be invariant to the order that the atom and bond information is encoded in the input.” p. 597, right-hand column, ¶ 5: “The first basic unit of representation is an atom layer which contains an n-dimensional vector associated with each atom.”  P. 597, left-hand column, “Methods”, ¶ 1: “Neural networks are directed graphs of simulated ‘neurons’. Each neuron has a set of inputs and computes an output. The neurons in early neural nets were inspired by biological neurons and computed an afﬁne combination of the inputs followed by a non-linear activation function. Mathematically, if the inputs are x1 . . . xN, weights w1 . . .wN and bias b are parameters, and f is the activation function, the output is [Eq. (1) (reproduction omitted)]”
The examiner notes that Kearnes’ graph neural network’s performing graph convolutional transformation (e.g., P → A cited above) is based on and hence in accordance with weights (wi cited above), and that Kearnes’ predicting a probability of a molecule binding to a target or the binding affinity with a model that uses the atom layers respectively containing n-dimensional feature vectors associated with the atoms teaches this limitation.)
 
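determining gradients of a loss function that measures an error in the prediction, generated by the graph neural network, of the set of one or more characteristics for the set of one or more molecules   (Kearnes, p. 597, left-hand column, last paragraph - right-hand column, ¶ 1: “Training is done with the well known technique of back-propagation [32] and stochastic gradient descent.”  The examiner notes that back-propagation computes the gradients of the loss function with respect to the weights of the network, and that Kearnes’ training by back-propagation therefore teaches this limitation.)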

updating the plurality of weights of the graph neural network using the gradients of the loss function.   (Kearnes, p. 597, left-hand column, last paragraph - right-hand column, ¶ 1: “The objective of training is then to ﬁnd a set of parameters for the network that minimizes the loss function. Training is done with the well known technique of back-propagation [32] and stochastic gradient descent.” p. 607, left-hand column, ¶ 3: “As has been pointed out elsewhere [9], the ability to use backpropagation to tune parameters at every stage of the network provides greater representational power than traditional descriptors, which are inﬂexible in the features they encode from the initial representation.” The examiner notes that Kearnes’ tuning parameters of its graph neural network using backpropagation techniques to minimize a loss function during training teaches training comprises updating the graph neural network based on the loss.)
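For illustration only, the two training limitations above (determining gradients of a loss function, then updating the weights using those gradients) can be sketched for a single affine neuron. The squared-error loss, the learning rate, and the name `sgd_step` are assumptions chosen for brevity; Kearnes does not prescribe them.

```python
def sgd_step(weights, bias, inputs, target, learning_rate=0.1):
    """One stochastic-gradient-descent step for a single affine neuron.

    Assumed loss: 0.5 * (prediction - target) ** 2.
    Returns the updated weights, the updated bias, and the loss that was
    measured before the update.
    """
    prediction = sum(w * x for w, x in zip(weights, inputs)) + bias
    error = prediction - target
    loss = 0.5 * error ** 2

    # Gradients of the loss with respect to each parameter.
    grad_weights = [error * x for x in inputs]
    grad_bias = error

    # Move each parameter a small step against its gradient.
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_weights)]
    new_bias = bias - learning_rate * grad_bias
    return new_weights, new_bias, loss
```

Calling `sgd_step` repeatedly on the same training example drives the measured loss toward zero, which is the sense in which training "minimizes the loss function" in the passage quoted from Kearnes.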

Kearnes teaches multiple representations that encode 3D information, structural features, physical properties, activities, with emphasis on molecular shape and electrostatics (see e.g., Kearnes, pp. 596-597) but does not appear to explicitly teach:
obtaining spatial distance data for a set of one or more molecules, wherein the set of one or more molecules comprises a plurality of atoms, wherein the spatial distance data defines, for each pair of atoms from the plurality of atoms, a respective three-dimensional (3D) spatial distance between the pair of atoms in a 3D spatial configuration of the atoms in the set of one or more molecules; 
wherein the second set of graph convolutions are based at least in part on the spatial distance data defining 3D spatial distances between pairs of atoms in the set of one or more molecules
 
Heck does, however, teach:
obtaining spatial distance data for a set of one or more molecules, wherein the set of one or more molecules comprises a plurality of atoms, wherein the spatial distance data defines, for each pair of atoms from the plurality of atoms, a respective three-dimensional (3D) spatial distance between the pair of atoms in a 3D spatial configuration of the atoms in the set of one or more molecules; (Heck, Page 2460 Section 2, discloses:  “We face two major challenges in the computational prediction of ligand-binding affinity through SML techniques. Firstly, we need the three-dimensional structure of the protein for which we want to build a mathematical model to predict binding affinity.”  Here, Heck discloses obtaining 3-D information about the molecules.  Heck, Page 2462 Section 3.1 Below Eq. 2, discloses:  “In the Eq. (2), N1 and N2, are the numbers of nonhydrogen atoms in the ligand and protein, respectively. The term EPLP indicates the piecewise linear potential described elsewhere [28] and rij accounts for interatomic distance.”  Here, Heck discloses obtaining “interatomic distance” between atoms within the molecules.)
Kearnes and Heck are analogous art because they are both in the field of endeavor of using machine learning to predict molecular properties.
It would have been obvious before the effective filing date of the claimed invention to combine the graph convolutional network of Kearnes with the 3D spatial distance of Heck.  One of ordinary skill in the art would be motivated to do so in order to more accurately predict molecular interactions, which are known to take place within a certain distance cutoff, and thus incorporating the distance between atoms would increase the accuracy of the machine learning model (Heck, Page 2460 Section 2:  “Such integration of ligand-binding information and three-dimensional data is a favorable scenario for the development of ML regression models” and Heck, Page 2463 Section 3.5:  “Furthermore, each interaction represents the number of events indicating an intermolecular interaction involving a pair of atoms which occurs given a cutoff value for the interatomic distance.”)
However, the combination of Kearnes and Heck does not explicitly teach applying interatomic distances to graph convolutions, and thus does not explicitly teach wherein the second set of graph convolutions are based at least in part on the spatial distance data defining 3D spatial distances between pairs of atoms in the set of one or more molecules.
 Faber teaches wherein the second set of graph convolutions are based at least in part on the spatial distance data defining 3D spatial distances between pairs of atoms in the set of one or more molecules. (Faber, Page 5259 Section 2.5.5 “Graph Convolutions”, discloses:  “We used the GC model as described in Kearnes et al.,27 with several structural modifications and optimized hyperparameters. The graph convolution model is built on the concepts of “atom” layers (one real vector associated with each atom) and “pair” layers (one real vector associated with each pair of atoms).”  Faber continues on the next Page 5260:  “Since the model only uses the atom layer for the molecule level features, pair order invariance is not needed. Second, we used the Euclidean distance between atoms. In the (P → A) transformation, we divide the value from the convolution step by a series of distance exponentials.”  Here, Faber applies interatomic distances to a graph convolution model.)
Faber and the combination of Kearnes and Heck are analogous art because they are both in the field of endeavor of using machine learning to predict molecular properties.
It would have been obvious before the effective filing date of the claimed invention to combine the graph convolution network and 3-D distance cutoff of Kearnes and Heck with the incorporation of interatomic distances into graph convolutions of Faber.  Examiner notes that Faber expressly acknowledges the connection to Kearnes (Faber, Page 5259 Section 2.5.5 “Graph Convolutions”:  “We used the GC model as described in Kearnes et al.”).  One of ordinary skill in the art would be motivated to do so in order to, as described above, capture the proximity-dependent properties of interatomic interactions and thereby accurately predict those interactions (Heck, Page 2463 Section 3.5:  “Furthermore, each interaction represents the number of events indicating an intermolecular interaction involving a pair of atoms which occurs given a cutoff value for the interatomic distance.”), and Faber also acknowledges the success of its Graph Convolution (“GC”) approach, which includes distance (Faber, Page 5262:  “We have found that GC, GG, and KRR have the best performance across all properties… For intensive electronic properties (μ, HOMO/LUMO eigenvalues, and gap) we found MG/GC or MG/GG to yield the highest predictive power.”)
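For illustration only, one possible reading of Faber’s statement that the value from the convolution step is divided “by a series of distance exponentials” is sketched below. The quoted passage does not list the exponents or say how the scaled values are combined, so the default exponents, the concatenation, and the name `distance_scaled` are all assumptions.

```python
def distance_scaled(pair_value, distance, exponents=(1, 2, 3)):
    """Scale a per-pair convolution output by powers of the 3D distance.

    pair_value: vector produced by the convolution step for one atom pair.
    distance:   Euclidean distance between the two atoms (must be > 0).
    exponents:  assumed series of powers; not taken from the reference.
    The original vector and each scaled copy are concatenated.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    scaled = list(pair_value)
    for k in exponents:
        scaled.extend(v / distance ** k for v in pair_value)
    return scaled
```

Under this reading, pairs of atoms that are far apart in 3D space contribute smaller values to the subsequent P → A sum than pairs that are close together.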
 
With respect to claim 2, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
building the second graph representation of the set of one or more molecules, (¶ 5, Desired Invariants of A Model, right-hand column, p. 597: “The first basic unit of representation is an atom layer which contains an n-dimensional vector associated with each atom”; and “The next basic unit of representation is a pair layer which contains an n-dimensional vector associated with each pair of atoms.”  The examiner first notes that Kearnes’ first basic unit of representation and the next basic unit of representation respectively teach a first graph representation and a second graph representation. The examiner further notes that Kearnes’ generating a representation with at least the next basic unit (e.g., Kearnes’ pair layer) teaches this limitation.)
 
Kearnes does not appear to explicitly teach:
wherein building the second graph representation comprises generating a distance matrix and an adjacency tensor,
wherein the distance matrix denotes distances between pairs of atoms in the set of one or more molecules and the adjacency tensor indicates a plurality of different edge types between atoms in the set of one or more molecules.  
 
Faber does, however, teach:
wherein building the second graph representation comprises generating a distance matrix and an adjacency tensor, (Faber, Page 5259 Section 2.5.5 “Graph Convolutions”, discloses:  “Second, we used the Euclidean distance between atoms.”  Examiner notes that one of ordinary skill in the art will appreciate that pairwise distances between a plurality of objects form a matrix of values (also see Heck in Claim 1 above, using a 2-subscript matrix entry notation:  “rij accounts for interatomic distance.”).  Faber, Page 5259 Section 2.5.6 “Gated Graph Neural Networks” discloses:  “We used the GG neural networks model (GG) as described in Li et al.28 Similar to the GC model, GG is a deep neural network whose input is a set of node features {xv, v ∈ G} and an adjacency matrix A with entries in a discrete set S = {0,1,···,k} to indicate different edge types.”  Here, Faber discloses that “Similar to the GC model”, they use an “adjacency matrix A”.  One of ordinary skill in the art will appreciate that a matrix is a two-dimensional version of a tensor.)
wherein the distance matrix denotes distances between pairs of atoms in the set of one or more molecules and the adjacency tensor indicates a plurality of different edge types between atoms in the set of one or more molecules.  (Faber, Page 5259 Section 2.5.5 “Graph Convolutions”, discloses:  “Second, we used the Euclidean distance between atoms.”  Faber, Page 5259 Section 2.5.6 “Gated Graph Neural Networks” discloses:  “We used the GG neural networks model (GG) as described in Li et al.28 Similar to the GC model, GG is a deep neural network whose input is a set of node features {xv, v ∈ G} and an adjacency matrix A with entries in a discrete set S = {0,1,···,k} to indicate different edge types.”  Here, Faber discloses that the adjacency matrix represents “different edge types”.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Faber with Kearnes and Heck. 
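For illustration only, the two data structures recited in claim 2 can be sketched as follows. The function names, the coordinate tuples, and the one-slice-per-edge-type layout of the adjacency tensor are assumptions made for this example and are not drawn from Faber or from the application.

```python
import math


def distance_matrix(coords):
    """Entry [i][j] is the Euclidean distance between atoms i and j."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def adjacency_tensor(n_atoms, edges, n_edge_types):
    """Assumed layout: tensor[t][i][j] == 1 when atoms i and j are joined
    by an edge of type t (for example 0 = single bond, 1 = double bond).

    edges is an iterable of (i, j, edge_type) triples.
    """
    tensor = [
        [[0] * n_atoms for _ in range(n_atoms)]
        for _ in range(n_edge_types)
    ]
    for i, j, edge_type in edges:
        # Molecular graphs are undirected, so each slice is kept symmetric.
        tensor[edge_type][i][j] = 1
        tensor[edge_type][j][i] = 1
    return tensor
```

An adjacency matrix with entries in {0, 1, ..., k}, as quoted from Faber, carries the same information as such a tensor: each nonzero entry value selects the slice in which the corresponding 1 appears.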
 
With respect to claim 3, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches wherein the set of one or more molecules comprises a ligand molecule and a target molecule. (Kearnes, Abstract “Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.” ¶ 5, Methods – Deep Neural Networks, left-hand-column, p. 597: “At the ‘‘top’’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g. the probability that this molecule binds to a target or the binding affinity).”) 
 
With respect to claim 4, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes also teaches:
wherein the second set of graph convolutions are further based on the bond data defining bonds between pairs of atoms in the set of one or more molecules.  (Kearnes, p. 598, left-hand column, ¶ 4, “Invariant-Preserving Operations”: “Since we apply the same function for every atom/pair, we refer to this as a convolution.  All the transformations we develop below will have this convolution nature of applying the same operation to every atom/pair, maintaining Property 2.”  “Invariant-Preserving Operations” further teaches four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, A → A in Eq. 2.  Table 3 “Atom Pair Features” include “Bond type” including “Single, double, triple, or aromatic (one-hot or null)”.  The examiner notes that Kearnes’ graph convolutions in the P → A convolutional transformation (where P denotes pairs, and A denotes atoms) receive the feature “bond type” as an input and thus teach performing a second set of graph convolutions based on at least bonds between ligands and targets and hence between a set of molecules.)
 
With respect to claim 5, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
wherein the first set of graph convolutions is based on a first set of bonds between the set of one or more molecules and (Kearnes, last paragraph, Introduction, p. 596: “Here we describe molecular graph convolutions, a deep learning system using a representation of small molecules as undirected graphs of atoms.”  Property 1, Desired Invariants of A Model, right-hand column, p. 597: “The output of the model should be invariant to the order that the atom and bond information is encoded in the input”. Table 2 lists “Hydrogen bonding” “Whether this atom is a hydrogen bond donor and/or acceptor” as an example input atom feature.  ¶ 5, left-hand column, p. 597: “At the ‘top’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g.[,] the probability that this molecule binds to a target or the binding affinity).”
The examiner notes that Kearnes’ encoding hydrogen bonding and/or bond information in an input for its graph convolutions in its A → P and/or P → P transformation (where A denotes atoms and P denotes pairs) teaches a first set of graph convolutions based on a first set of bonds between a set of molecules because the hydrogen bonding and bond information are interpreted as occurring between a ligand molecule and a target protein molecule.)
 
the second set of graph convolutions is based at least in part on a second set of bonds between the set of one or more molecules.  (Kearnes, p. 598, “Invariant-Preserving Operations” further teaches four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, A → A in Eq. 2.  Table 3, p. 601: “Atom Pair Features” include “Bond type” including “Single, double, triple, or aromatic (one-hot or null)”. 
The examiner notes that the aforementioned bond types such as single, double, triple, and aromatic bonds constitute a second set of bonds between a ligand molecule and a target molecule (e.g., a protein molecule) and are different from hydrogen bonding which is not recognized as a chemical bond between atoms.  The examiner further notes that Kearnes’ graph convolutions in the P → A convolutional transformation receive the feature “bond type” as an input for the convolutional transformation and thus teach performing a second set of graph convolutions based on a second set of the bonds between ligands and targets and hence between a set of molecules.)
 
With respect to claim 23, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
generating the second graph representation of the set of one or more molecules prior to performing the second set of graph convolutions, (Kearnes, p. 598, left-hand column, ¶ 4, “Invariant-Preserving Operations”: “Since we apply the same function for every atom/pair, we refer to this as a convolution.  All the transformations we develop below will have this convolution nature of applying the same operation to every atom/pair, maintaining Property 2.”  “Invariant-Preserving Operations” further teaches four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, A → A in Eq. 2.  P. 597, right-hand column, ¶ 5: “The next basic unit of representation is a pair layer which contains an n-dimensional vector associated with each pair of atoms. Therefore, the pair layer is a 3 dimensional matrix where the first two dimensions are indexed by atom.”
The examiner notes that Kearnes’ performing the second set of graph convolutions on the second graph representation (e.g., the aforementioned next basic unit) requires the second graph representation be available. Therefore, the second graph representation is generated prior to performing the second set of graph convolutions.)
wherein generating the second graph representation of the set of one or more molecules comprises: determining, for each pair of atoms that are separated by a spatial distance that is below a threshold value, that the pair of atoms are neighbors in the second graph representation of the set of one or more molecules.  (Kearnes, p. 605, left-hand column, ¶ 2: “Using a consistent base architecture with two Weave modules and a maximum atom pair distance of 2, we compared these traditional reduction strategies with our Gaussian histogram approach.”
The examiner notes that Kearnes’ limiting the maximum distance between a pair of atoms teaches a distance that is below a threshold value, and that Kearnes’ accommodating pair distances smaller than or equal to 2 in its graph representations (e.g., the second graph representation such as Kearnes’ next basic unit of representation cited above) renders obvious that the atoms in a pair are neighbors of each other where the maximum distance defines what constitutes a neighbor for an atom.)
Kearnes does not appear to explicitly teach the distance is a spatial distance that defines 3D spatial distances although Kearnes does teach many representations encode 3D information, structural features, physical properties, activities, with emphasis on molecular shape and electrostatics on pp. 596-597. 
Heck does, however, teach:
a spatial distance (Heck, Page 2462 Section 3.1 Below Eq. 2, discloses:  “In the Eq. (2), N1 and N2, are the numbers of nonhydrogen atoms in the ligand and protein, respectively. The term EPLP indicates the piecewise linear potential described elsewhere [28] and rij accounts for interatomic distance.”)
that is below a threshold value (Heck, Page 2463 Section 3.5, discloses:  “Furthermore, each interaction represents the number of events indicating an intermolecular interaction involving a pair of atoms which occurs given a cutoff value for the interatomic distance.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Heck with Kearnes and Faber.
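For illustration only, the neighbor determination recited above (a pair of atoms is treated as neighbors when their spatial distance is below a threshold value) might be sketched as follows. The coordinates, the cutoff of 2.0, and the function name `spatial_neighbors` are hypothetical and are not taken from the application, Kearnes, Heck, or Faber.

```python
import math
from itertools import combinations

def spatial_neighbors(coords, threshold):
    """Return the set of index pairs (i, j), i < j, whose Euclidean
    distance is strictly below `threshold` (a hypothetical cutoff)."""
    neighbors = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) < threshold:
            neighbors.add((i, j))
    return neighbors

# Hypothetical 3D coordinates for four atoms (arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (5.0, 5.0, 5.0)]
pairs = spatial_neighbors(atoms, threshold=2.0)
```

In this sketch the first three atoms are mutual neighbors, while the fourth atom lies beyond the cutoff from every other atom and therefore has no neighbors.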
 
With respect to claim 24, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
wherein the set of one or more characteristics for the set of one or more molecules comprises one or more of: a toxicity of the set of one or more molecules, a solubility of the set of one or more molecules, a binding affinity of the set of one or more molecules, or quantum properties of the set of one or more molecules.  (Kearnes, p. 597, left-hand column, ¶ 5: “At the ‘top’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g. the probability that this molecule binds to a target or the binding affinity).” The examiner notes that Kearnes’ probability that a molecule binds to a target and/or its binding affinity teaches a binding affinity of the set of one or more molecules. The examiner further notes that claim 24 recites the listed characteristics in the alternative (“one or more of”). Therefore, Kearnes’ teaching of a binding affinity satisfies the limitation of claim 24.)
 
With respect to claim 25, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
providing data defining the set of one or more characteristics for the set of one or more molecules for use in performing drug discovery.  (Kearnes, p. 597, right-hand column, ¶¶ 3-4: “For a deep learning architecture taking a molecular graph as input, some arbitrary choice must be made for the order that the various atoms and bonds are presented to the model. Since that choice is arbitrary, we want: Property 1 (Order invariance) The output of the model should be invariant to the order that the atom and bond information is encoded in the input.” p. 597, left-hand column, “Deep neural networks” ¶ 5: “At the ‘top’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g.[,] the probability that this molecule binds to a target or the binding affinity).” p. 607, right-hand column, ¶ 4: “Looking forward, graph convolutions (and related graph-based methods; see ‘Related work’ section) present a ‘new hill to climb’ in computer-aided drug design and cheminformatics. Although our current graph convolution models do not consistently outperform state-of-the-art fingerprint-based models, we emphasize their flexibility and potential for further optimization and development.”)
 
With respect to claim 28, Kearnes modified by Heck and Faber teaches the method of claim 1, and Kearnes further teaches:
wherein the set of one or more molecules comprises a plurality of molecules.  (Kearnes, p. 595, Abstract: “We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph – atoms, bonds, distances, etc. – which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.”)
 
With respect to claim 29, Kearnes modified by Heck and Faber teaches the method of claim 28, and Kearnes further teaches:
wherein the plurality of molecules comprises a ligand molecule and a target molecule.  (Kearnes, p. 595, Abstract: “Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.” ¶ 5, Methods – Deep Neural Networks, left-hand column, p. 597: “At the ‘top’ of the neural network you have node(s) whose output is the value you are trying to predict (e.g. the probability that this molecule binds to a target or the binding affinity).”)
 
With respect to claim 30, Kearnes modified by Heck and Faber teaches the method of claim 29, and Kearnes further teaches:
wherein the graph gather operation is performed solely with respect to the ligand molecule.  (Kearnes, p. 595, Abstract: “We describe molecular graph convolutions” that “represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement” for “drug discovery applications”.  ¶ 5, right-hand column, p. 599: “Throughout this paper, we construct the molecule-level features ONLY from the top-level atom features and not the pair features. This is to restrict the total number of feature vectors that must be summarized while still providing information about the entire molecule.”  FIG. 2 illustrates the “P → A operation. Px is a matrix containing features for atom pairs ab, ac, ad, etc. The vi are intermediate values obtained by applying f to features for a given atom pair. Applying g to the intermediate representations for all atom pairs involving a given atom (e.g. a) results in a new atom feature vector for that atom”.  The examiner notes that Kearnes’ constructing the molecule-level features ONLY from the top-level atom features for ligand-based screening, and not from pair features between a ligand and a target molecule, teaches performing the graph gather solely with respect to the ligand molecule.)

With respect to claim 31, this is a system claim corresponding to method claim 1.  The difference is that it recites one or more computers and a non-transitory memory.  Kearnes, p. 602, left-hand column, ¶ 1: “Training was parallelized over 96 CPUs (or 96 GPUs in the case of the W4N2 model)”. The examiner notes that Kearnes’ 96 CPUs or 96 GPUs teach one or more computers.  Kearnes, Page 6 Right Column, discloses:  “Although our featurization includes space for hydrogen atoms, we did not use explicit hydrogens in any of our experiments in order to conserve memory.”  Here, Kearnes recites memory.  Claim 31 recites limitations substantially similar to those of claim 1 and is thus rejected accordingly, the same citations and rationale applying.
 
With respect to claim 32, it recites substantially similar claimed limitations as claim 23 and is thus rejected accordingly, the same citations and rationale applying.
 
With respect to claim 34, this is a non-transitory memory claim corresponding to method claim 1.  The difference is that it recites one or more computers and a non-transitory memory.  Kearnes, p. 602, left-hand column, ¶ 1: “Training was parallelized over 96 CPUs (or 96 GPUs in the case of the W4N2 model)”. The examiner notes that Kearnes’ 96 CPUs or 96 GPUs teach one or more computers.  Kearnes, Page 6 Right Column, discloses:  “Although our featurization includes space for hydrogen atoms, we did not use explicit hydrogens in any of our experiments in order to conserve memory.”  Here, Kearnes recites memory.  Claim 34 recites limitations substantially similar to those of claim 1 and is thus rejected accordingly, the same citations and rationale applying.
 
Claims 6-8 stand rejected under 35 U.S.C. § 103 as being unpatentable over Kearnes in view of Heck and Faber and further in view of Merkwirth et al. (“Automatic Generation of Complementary Descriptors with Molecular Graph Networks” (2005); hereinafter “Merkwirth”).
 
With respect to claim 6, Kearnes modified by Heck and Faber teaches the method of claim 1 but does not appear to explicitly teach:
wherein performing the first set of graph convolutions comprises utilizing a first plurality of neural networks, wherein each neural network of the first plurality of neural networks is used for a different bond type.  
 
Merkwirth does, however, teach:
wherein performing the first set of graph convolutions comprises utilizing a first plurality of neural networks, wherein each neural network of the first plurality of neural networks is used for a different bond type.  (Merkwirth, p. 1160, § 2, last paragraph: “The weights governing the dynamic evolution of a feature net do not pertain to a specific position within the network; instead, element and bond type of the node determine which weights are taken from several common weight tables. The tables constitute the adjustable parameters of a feature net.”  FIG. 1 (an annotated version of which is reproduced immediately below) shows a molecular graph of dichloromethane, which has four separate bonds, input into four separate neural networks (F1 through F4). The networks respectively generate outputs that are then multiplied by the respective weights (selected according to the bond types) to produce the final output.
The examiner notes that Merkwirth’s four separate neural networks teach a first plurality of neural networks.  The examiner further notes that Merkwirth’s inputting the molecular graph of dichloromethane, with its four separate bonds, into each of the four separate neural networks (e.g., F1 through F4 in FIG. 1), whose respective outputs are then multiplied by the respective weights (selected according to the bond types) to produce the final output, teaches that each neural network is used for a different bond type.)

[Annotated version of Merkwirth FIG. 1]

Kearnes, Heck, Faber, and Merkwirth are analogous art because all four references pertain to predicting molecular activities by using graph networks for molecular learning architectures.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Kearnes in view of Heck and Faber to incorporate Merkwirth’s use of different neural networks for different bond types (Merkwirth, supra). The modification, which uses multiple neural networks (feature nets), establishes a functional relationship between a graph and a property of interest (e.g., bonding) through the dynamic evolution of the states of nodes in the graph, within a combined neural network comprising the aforementioned multiple neural networks and a supervisor neural network (Merkwirth, p. 1159, left-hand column, § 1, ¶ 2: “The method we propose directly establishes a functional relationship between the molecular graph representing a molecule and the property of interest, for example, therapeutic activity.” p. 1159 right-hand column, § 2, ¶ 1: “The method introduces a new statistical model called the molecular graph network (MGN), which makes use of feature nets. In each feature net, the dynamic evolution of node states of the molecular graph is used in order to establish a functional relationship between the molecular graph and a scalar output value.”).
 
With respect to claim 7, Kearnes modified by Heck, Faber, and Merkwirth teaches the method of claim 6, and Merkwirth further teaches:
wherein performing the second set of graph convolutions comprises utilizing a second plurality of neural networks, wherein weights for the first plurality of neural networks are shared with the second plurality of neural networks.  (Merkwirth, ¶ 3, right-hand column, p. 1160: “The weights governing the dynamic evolution of a feature net do not pertain to a specific position within the network; instead, element and bond type of the node determine which weights are taken from several common weight tables. The tables constitute the adjustable parameters of a feature net.” 
The examiner notes that Merkwirth’s “several common weight tables,” which do not pertain to specific positions within Merkwirth’s neural network, teach that Merkwirth determines which weights to use by looking them up in these common tables, and thus teach sharing weights among multiple neural networks such as F1 through F4 in Figure 1. The examiner further notes that Merkwirth’s performing convolutions with the aforementioned shared weights in the neural networks (e.g., F1 through F4 in Figure 1) teaches performing the second set of graph convolutions utilizing a second plurality of neural networks.)
It would have been obvious to one of ordinary skill in the art to combine the teachings of Merkwirth with Kearnes, Heck, and Faber for at least the reasons given above with respect to claim 6.
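As a minimal sketch of the weight sharing discussed above, assuming scalar atom states and a single scalar weight per bond type, a common weight table indexed by bond type could be reused across two rounds of graph convolution as shown below. The table values, the bond labels, and the function `convolve` are hypothetical and greatly simplify Merkwirth’s feature nets.

```python
# Hypothetical shared weight table: one weight per bond type.
WEIGHTS = {"single": 0.5, "double": 1.0}

def convolve(states, bonds, weights):
    """One message-passing step: each atom adds the states of its bonded
    neighbors, scaled by the weight looked up for the bond type."""
    new_states = list(states)
    for i, j, bond_type in bonds:
        w = weights[bond_type]
        new_states[i] += w * states[j]
        new_states[j] += w * states[i]
    return new_states

# Hypothetical three-atom chain: atom 0 =(double)= atom 1 -(single)- atom 2.
bonds = [(0, 1, "double"), (1, 2, "single")]
h0 = [1.0, 2.0, 4.0]
h1 = convolve(h0, bonds, WEIGHTS)  # first set of convolutions
h2 = convolve(h1, bonds, WEIGHTS)  # second set reuses the same table
```

Because both calls look up weights in the same table, no weight belongs to a particular position or round; only the bond type selects it.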
 
With respect to claim 8, Kearnes modified by Heck, Faber, and Merkwirth teaches the method of claim 6, and Kearnes further teaches:
wherein performing the second set of graph convolutions comprises utilizing a second plurality of neural networks, wherein the neural networks of the second plurality of neural networks utilize the spatial distance data.  (Kearnes, ¶ 4, “Invariant-Preserving Operations”, left-hand column, p. 596: the “Invariant-Preserving Operations” further teach four types of graph convolutions: A → P in Eq. 4, P → A in Eq. 5, P → P in Eq. 3, and A → A in Eq. 2.  ¶ 5, right-hand column, “Desired Invariants of a Model”: “we will encode the graph distance (length of shortest path from one atom to the other) in the input pair layer”; Table 3, “Atom Pair Features”, includes “Graph distance”: “whether the shortest path between the atoms in the pair is less than or equal to that number of bonds”.
The examiner notes that Kearnes’ networks (e.g., those that perform graph convolutions for the P → A (pair to atom) convolutional transformation) teach a second plurality of neural networks, and that Kearnes’ graph distance, when modified by Heck’s interatomic distance between an atom of a ligand molecule and another atom of a target molecule, teaches spatial distance data (see citations and rationale for claim 1, supra) between a pair of an atom of the ligand and another atom of a target molecule.  The examiner further notes that Kearnes’ performing the second set of graph convolutions, when modified by Merkwirth’s multiple neural networks (see citations and rationale for base claim 6, supra), teaches performing the second set of graph convolutions by utilizing a plurality of neural networks. The examiner thus notes that the aforementioned transformation receiving the pair attribute “graph distance” as an input for graph convolutions teaches the above limitations.)
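The graph distance quoted from Kearnes (the length of the shortest path between two atoms, counted in bonds) can be illustrated with a breadth-first search. This is a sketch only: the adjacency list, the range of bond counts, and the function name `graph_distance` are hypothetical, and the code does not reproduce Kearnes’ featurization.

```python
from collections import deque

def graph_distance(adjacency, source, target):
    """Number of bonds on the shortest path from source to target,
    or None if the atoms are not connected."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == target:
            return dist
        for nxt in adjacency[atom]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical four-atom chain 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d = graph_distance(chain, 0, 3)
# Binary features in the style of the quoted Table 3 entry: is the
# shortest path at most k bonds, for k = 1..4 (the range is an assumption)?
within = [d <= k for k in range(1, 5)]
```

Unlike a spatial distance, this quantity depends only on bond connectivity, which is why the rejection relies on a secondary reference for distances in 3D space.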
  
Claim 27 stands rejected under 35 U.S.C. § 103 as being unpatentable over Kearnes in view of Heck and Faber and further in view of Zeiler et al. (“Stochastic Pooling for Regularization of Deep Convolutional Neural Networks”; hereinafter “Zeiler”).
 
With respect to claim 27, Kearnes modified by Heck and Faber teaches the method of claim 26, but does not appear to explicitly teach:
wherein the loss comprises a cross-entropy loss.  
Zeiler does, however, teach:
wherein the loss comprises a cross-entropy loss.  (Zeiler, p. 4, § 4.1, ¶ 1: “We compare our method to average and max pooling on a variety of image classification tasks. In all experiments we use mini-batch gradient descent with momentum to optimize the cross entropy between our network’s prediction of the class and the ground truth labels.”)
Kearnes, Heck, Faber, and Zeiler are analogous art because all four references pertain to training neural networks using gradient descent and backpropagation techniques.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Kearnes in view of Heck and Faber to further incorporate Zeiler’s use of cross-entropy loss (Zeiler, supra). The modification enables the determination of the difference (loss) between two probability distributions (e.g., between Kearnes’ predicted probability that a molecule binds to a specific target and the class probability in the training data) and further provides an explicit way to compute the weight adjustment for a parameter of interest at a specific time point by using the gradient of the cross-entropy loss function (For a given parameter x at time t the weight updates added to the parameters, Δx_t are Δx_t = 0.9Δx_{t-1} − εg_t where g_t is the gradient of the cost function with respect to that parameter at time t averaged over the batch and ε is a learning rate set by hand.)
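As a hedged illustration of the rationale above, the sketch below computes a binary cross-entropy loss between a predicted binding probability and a label, then applies a momentum update of the form Δx_t = 0.9Δx_{t-1} − εg_t. The single-parameter sigmoid model, the learning rate of 0.1, and all names are hypothetical and appear in neither Kearnes nor Zeiler.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy between predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def momentum_step(x, delta_prev, grad, lr=0.1, momentum=0.9):
    """Compute delta_t = momentum * delta_{t-1} - lr * grad and
    return the updated parameter together with delta_t."""
    delta = momentum * delta_prev - lr * grad
    return x + delta, delta

x, delta, y = 0.0, 0.0, 1  # hypothetical parameter, initial update, label
losses = []
for _ in range(5):
    p = sigmoid(x)                 # predicted probability of binding
    losses.append(cross_entropy(p, y))
    grad = p - y                   # d(loss)/dx for a sigmoid output
    x, delta = momentum_step(x, delta, grad)
```

With a positive label, each step moves the parameter so that the predicted probability rises and the recorded loss falls.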

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Fan et al. (US 10,923,214 B2) discloses using graph convolutions to predict molecular properties.
Riley et al. (US 11,205,113 B2) discloses in Col 1 Lines 46-52: “Processing the order-invariant features through one or more neural network layers to generate a classification of the graph data. The graph data represents a molecule, and wherein each vertex in the graph is an atomic element in the molecule and each edge is a type of bond between two atomic elements in the molecule.”
Cang et al. (“TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions”), Bottom of Page 14, discloses:  “where d(·, ·) is Euclidean distance between two atoms and A(·) denotes the affiliation of an atom which is either a protein or a ligand.”
Artemenko (“Distance dependent scoring function for describing protein-ligand intermolecular interactions”), Page 570, discloses:  “dij is the distance between the ith atom in the ligand and the jth atom in the protein.”
Ballester (“Does a more precise chemical description of protein–ligand complexes lead to more accurate prediction of binding affinity?”), Page 949 Left Column, discloses:  “The strength of the interatomic interactions that collectively form the noncovalent intermolecular bond depends on the separation between the interacting atoms. Therefore, it is reasonable to think that partitioning the descriptors into a number of interatomic distance bins should lead to a model with more predictivity.”
Keil (“Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network”), Page 781 Right Column Para 2, discloses:  “A point of the protein surface is defined as part of a binding site if another molecule is found in a distance less than the given cutoff of 1.5 Å.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126