DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .


Information Disclosure Statement
The information disclosure statements (IDS) submitted on 09/26/2018, 02/28/2019 and 12/28/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 10-18 are objected to because of the following informalities:
Independent claim 10 recites “training a GCNN to classify structural relationships…”. Because this is the first recitation of GCNN in an independent claim, the Examiner suggests writing out the long form of the abbreviation. Appropriate correction is required.
Claims 11-15 are objected to as being dependent on independent claim 10.
Independent claim 17 recites “using a GCNN to determine a plurality of functional scores…”. Because this is the first recitation of GCNN in an independent claim, the Examiner suggests writing out the long form of the abbreviation. Appropriate correction is required.
Claim 18 is objected to as being dependent on independent claim 17.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 17 and 18 both recite “functionally equivalent”, and it is unclear how one would know which functions matter for equivalency. The term “functionally equivalent” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50-57 (January 7, 2019) (“2019 PEG”).

Claims 17-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50-57 (January 7, 2019) (“2019 PEG”).
Regarding claim 17
Step 1:  The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:  The claim recites multiple mental processes, as explained below.  The claim recites, inter alia:
“A computer-implemented method for inferring functionality of a system: generating a knowledge graph representative of the system, where the knowledge graph comprises a plurality of subgraphs corresponding to subsystems of the system; using a …to determine a plurality of functional scores for the system, each functional score corresponding to one of a plurality of functional labels describing the subsystems; and based on the functional scores, identifying a functionally equivalent alternative subsystem for at least one of the subsystems of the system.” 
Under its broadest reasonable interpretation in light of the specification, this limitation covers steps that a human could perform. A human could generate knowledge graphs and subgraphs using pen and paper or in his/her mind. In addition, a human could determine scores based on the plurality of labels using observation and evaluation. A human could also identify a functionally equivalent alternative subsystem using observation and evaluation.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “GCNN”, as drafted, recites a generic computer component. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible. 

Regarding claim 18
Step 1:  The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:  The claim recites multiple mental processes, as explained below.  The claim recites, inter alia:
“presenting the functionally equivalent alternative subsystem to a user along with a measurement of how similar the functionally equivalent alternative subsystem is to the system.”
Under its broadest reasonable interpretation in light of the specification, this limitation covers a determination that a human could make, using observation and evaluation, of how similar the functionally equivalent alternative subsystem is to the system.
Step 2A Prong 2: The claim does not appear to recite additional elements that might integrate the judicial exception into a practical application.
	Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined whether the claim contains any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception. In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-2, 4 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Kipf et al. (“Semi-Supervised Classification with Graph Convolutional Networks”, hereinafter: Kipf).
Regarding claim 1
Hamilton teaches a computer-implemented method for learning structural relationships between nodes of a graph, (Figure 8 “This structure can be exploited by learning embeddings at various levels of the hierarchy, and only applying the regularization between layers that are in a parent-child relationship. C-E, Example application of multi-layer graph embedding to protein-protein interaction graphs derived from different brain tissues;”)
the method comprising: generating a knowledge graph comprising a plurality of nodes representing a system; (Figure 2 “Graph structure of the Zachary Karate Club social network, where nodes are connected if the corresponding individuals are friends. The nodes are colored according to the different communities that exist in the network. B, Two-dimensional visualization of node embeddings generated from this graph using the DeepWalk method (Section 2.2.2) [46]. The distances between nodes in the embedding space reflect proximity in the original graph, and the node embeddings are spatially clustered according to the different color-coded communities.”)
…
wherein the GCNN comprises: a graph feature compression layer configured to learn a plurality of subgraphs representing embeddings of the nodes of the knowledge graph into a vector space, (pg. 3 section 1.1 “We also assume that the methods can make use of a real-valued matrix of node attributes X ∈ R^{m×|V|} (e.g., representing text or metadata associated with nodes). The goal is to use the information contained in A and X to map each node, or a subgraph, to a vector z ∈ R^d, where d << |V|. Most of the methods we review will optimize this mapping in an unsupervised manner, making use of only information in A and X, without knowledge of the downstream machine learning task. However, we will also discuss some approaches for supervised representation learning, where the models make use of classification or regression labels in order to optimize the embeddings.”)
a neighbor nodes aggregation layer configured to (i) derive a plurality of neighbor node feature vectors for each subgraph (pg. 4 section 2 “We begin with a discussion of methods for node embedding, where the goal is to encode nodes as low-dimensional vectors that summarize their graph position and the structure of their local graph neighborhood. These low-dimensional embeddings can be viewed as encoding, or projecting, nodes into a latent space, where geometric relations in this latent space correspond to interactions (e.g., edges) in the original graph [32]. Figure 2 visualizes an example embedding of the famous Zachary Karate Club social network [46], where two dimensional node embeddings capture the community structure implicit in the social network.”)
and (ii) aggregate the neighbor node feature vectors with their corresponding subgraphs to yield a plurality of aggregated subgraphs, (Figure 7 “Overview of the neighborhood aggregation methods. To generate an embedding for a node, these methods first collect the node’s k-hop neighborhood (occasionally sub-sampling the full neighborhood for efficiency). In the next step, these methods aggregate the attributes of node’s neighbors, using neural network aggregators. This aggregated neighborhood information is used to generate an embedding, which is then fed to the decoder.”)
a subgraph convolution layer configured to generate the plurality of feature vectors based on the aggregated subgraphs; (Figure 7 “Overview of the neighborhood aggregation methods. To generate an embedding for a node, these methods first collect the node’s k-hop neighborhood (occasionally sub-sampling the full neighborhood for efficiency). In the next step, these methods aggregate the attributes of node’s neighbors, using neural network aggregators.” Also see pg. 12 first paragraph “First, the node embeddings are initialized to be equal to the input node attributes. Then at each iteration of the encoder algorithm, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors. After this aggregation, every node is assigned a new embedding, equal to its aggregated neighborhood vector combined with its previous embedding from the last iteration.”)
and identifying functional groups of components included in the system based on the plurality of feature vectors (pg. 13 section 2.4 “Assume that we have a binary classification label, y_i ∈ Z, associated with each node. To learn to map nodes to their labels, we can feed our embedding vectors, z_i, through a logistic, or sigmoid, function ŷ_i = σ(z_i^⊤ θ), where θ is a trainable parameter vector.”);
Hamilton does not teach applying a graph-based convolutional neural network (GCNN) to the knowledge graph to generate a plurality of feature vectors describing structural relationships between the nodes.
Kipf teaches applying a graph-based convolutional neural network (GCNN) to the knowledge graph to generate a plurality of feature vectors describing structural relationships between the nodes. (Figure 1 “Left: Schematic depiction of multi-layer Graph Convolutional Network (GCN) for semisupervised learning with C input channels and F feature maps in the output layer. The graph structure (edges shown as black lines) is shared over layers, labels are denoted by Yi.” Also see pg. 6 second paragraph “A knowledge graph is a set of entities connected with directed, labeled edges (relations). We follow the pre-processing scheme as described in Yang et al. (2016). We assign separate relation nodes r1 and r2 for each entity pair (e1, r, e2) as (e1, r1) and (e2, r2). Entity nodes are described by sparse feature vectors. We extend the number of features in NELL by assigning a unique one-hot representation for every relation node, effectively resulting in a 61,278-dim sparse feature vector per node. The semi-supervised task here considers the extreme case of only a single labeled example per class in the training set.”)
Hamilton and Kipf are analogous art because they are both directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for learning graphs with a graph convolutional network of Hamilton to include applying the graph-based convolutional neural network (GCNN) of Kipf to the knowledge graph to generate a plurality of feature vectors, in order to accurately classify the relationship between feature vectors as disclosed by Kipf (pg. 6 second paragraph “A knowledge graph is a set of entities connected with directed, labeled edges (relations). We follow the pre-processing scheme as described in Yang et al. (2016). We assign separate relation nodes r1 and r2 for each entity pair (e1, r, e2) as (e1, r1) and (e2, r2). Entity nodes are described by sparse feature vectors. We extend the number of features in NELL by assigning a unique one-hot representation for every relation node, effectively resulting in a 61,278-dim sparse feature vector per node. The semi-supervised task here considers the extreme case of only a single labeled example per class in the training set.”).
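For orientation only, the following is a minimal sketch of a single graph convolutional layer of the kind depicted in Kipf's Figure 1. It assumes the commonly cited propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the function names `matmul` and `gcn_layer` and the toy inputs are hypothetical and are not taken from the cited references or from the application.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weights):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric normalisation by the degree of each node (self-loop included).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
              for i in range(n)]
    hidden = matmul(matmul(a_norm, feats), weights)
    # ReLU non-linearity applied element-wise.
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy usage: a three-node path graph with one-hot node features (assumed data).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layer_weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
node_vectors = gcn_layer(adjacency, features, layer_weights)
```

Each row of `node_vectors` is a feature vector for one node that mixes the node's own attributes with those of its immediate neighbors, which is the sense in which the layer output reflects graph structure.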

Regarding claim 2
Hamilton in view of Kipf teaches the method of claim 1.
Hamilton further teaches wherein the embedding of each node is learned with respect to a plurality of contexts and each context corresponds to a neighborhood of nodes connected to the node in the knowledge graph. (Figure 7 “Overview of the neighborhood aggregation methods. To generate an embedding for a node, these methods first collect the node’s k-hop neighborhood (occasionally sub-sampling the full neighborhood for efficiency). In the next step, these methods aggregate the attributes of node’s neighbors, using neural network aggregators.”)
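The two steps in the quoted Figure 7 caption (collecting a k-hop neighborhood, then aggregating neighbor attributes) can be illustrated with the sketch below. It is an assumption-laden simplification: a plain mean stands in for the "neural network aggregators" of the quotation, and the names `k_hop_neighborhood` and `aggregate` are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(graph, node, k):
    """Return the set of nodes within k hops of `node`, excluding `node` itself."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in graph[current]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(node)
    return seen


def aggregate(graph, attrs, node, k):
    """Concatenate a node's attributes with the mean attributes of its k-hop neighbors."""
    nbrs = sorted(k_hop_neighborhood(graph, node, k))
    dim = len(attrs[node])
    if nbrs:
        mean = [sum(attrs[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
    else:
        mean = [0.0] * dim  # isolated node: nothing to aggregate
    return attrs[node] + mean


# Toy usage on a three-node path graph a - b - c (assumed data).
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_attrs = {"a": [1.0], "b": [2.0], "c": [4.0]}
embedding_a = aggregate(toy_graph, toy_attrs, "a", 2)
```

A mean was chosen only because it is the simplest permutation-invariant aggregator; a learned aggregator would replace the averaging line.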

Regarding claim 4
Hamilton in view of Kipf teaches the method of claim 1.
Hamilton further teaches wherein the plurality of neighbor node feature vectors are derived for each subgraph by: generating a neighbor feature matrix comprising a plurality of neighbor paths in the subgraph; (Figure 3 “First the encoder maps the node, vi, to a low-dimensional vector embedding, zi, based on the node’s position in the graph, its local neighborhood structure, and/or its attributes.”)
and applying a 2D convolution operation with a trainable matrix (pg. 11 section 2.3.2 “Unlike the previously discussed methods, these neighborhood aggregation algorithms rely on node features or attributes (denoted x_i ∈ R^m) to generate embeddings. For example, a social network might have text data (e.g., profile information), or a protein-protein interaction network might have molecular markers associated with each node. The neighborhood aggregation methods leverage this attribute information to inform their embeddings. In cases where attribute data is not given, these methods can use simple graph statistics as attributes (e.g., node degrees) [28], or assign each node a one-hot indicator vector as an attribute [35, 52]. These methods are often called convolutional because they represent a node as a function of its surrounding neighborhood, in a manner similar to the receptive field of a center-surround convolutional kernel in computer vision”)
and a bias variable to extract the neighbor node feature vectors from the neighbor feature matrix. (Figure 5 “A, Illustration of how node2vec biases the random walk using the p and q parameters. Assuming that the walk just transitioned from vs to v∗, the edge labels, α, are proportional to the probability of the walk taking that edge at next time-step. B, Difference between random-walks that are based on breadth-first search (BFS) and depth-first search (DFS).”)

Regarding claim 6
Hamilton in view of Kipf teaches the method of claim 1.
Hamilton further teaches wherein the subgraph convolution layer determines the plurality of feature vectors based on the aggregated subgraphs using a process comprising: generating an adjacency matrix defining connections between the plurality of nodes; (section 1.1 “We will assume that the primary input to our representation learning algorithm is an undirected graph G = (V, E) with associated binary adjacency matrix, A. We also assume that the methods can make use of a real-valued matrix of node attributes X ∈ R^{m×|V|} (e.g., representing text or metadata associated with nodes). The goal is to use the information contained in A and X to map each node, or a subgraph, to a vector z ∈ R^d, where d << |V|.” also see pg. 4 second paragraph “The Graph Factorization algorithm defines proximity directly based on the adjacency matrix (i.e., s_G(v_i, v_j) ≜ A_{i,j}) [1]; GraRep considers various powers of the adjacency matrix (e.g., s_G(v_i, v_j) ≜ A^2_{i,j}) in order to capture higher order graph proximity”)
generating a feature matrix based on the aggregated subgraphs; (pg. 12 “First, the node embeddings are initialized to be equal to the input node attributes. Then at each iteration of the encoder algorithm, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors. After this aggregation, every node is assigned a new embedding, equal to its aggregated neighborhood vector combined with its previous embedding from the last iteration.”)
generating an attribute matrix as a composition of the adjacency matrix and the feature matrix; (section 1.1 “We will assume that the primary input to our representation learning algorithm is an undirected graph G = (V, E) with associated binary adjacency matrix, A. We also assume that the methods can make use of a real-valued matrix of node attributes X ∈ R^{m×|V|} (e.g., representing text or metadata associated with nodes). The goal is to use the information contained in A and X to map each node, or a subgraph, to a vector z ∈ R^d, where d << |V|.”)
determining the plurality of feature vectors by applying a convolution between the attribute matrix and a graph convolution kernel. (pg. 11 section 2.3.2 “In cases where attribute data is not given, these methods can use simple graph statistics as attributes (e.g., node degrees) [28], or assign each node a one-hot indicator vector as an attribute [35, 52]. These methods are often called convolutional because they represent a node as a function of its surrounding neighborhood, in a manner similar to the receptive field of a center-surround convolutional kernel in computer vision” also see pg. 17 section 3 “Representation learning on subgraphs is closely related to the design of graph kernels, which define a distance measure between subgraphs [57]. That said, we omit a detailed discussion of graph kernels, which is a large and rich research area of its own, and refer the reader to [57] for a detailed discussion. The methods we review differ from the traditional graph kernel literature primarily in that we seek to learn useful representations from data, rather than pre-specifying feature representations through a kernel function.”) 
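As a concrete illustration of the adjacency-matrix quotations relied on for claim 6, the sketch below builds a binary, symmetric adjacency matrix A from an edge list and squares it, so that entry (i, j) of the square counts two-step paths between nodes i and j (the "powers of the adjacency matrix" mentioned in the quotation). The helper names `adjacency_matrix` and `square` and the edge list are hypothetical.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a binary, symmetric adjacency matrix for an undirected graph."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def square(matrix):
    """Return matrix x matrix; for an adjacency matrix this counts two-step paths."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*matrix)]
            for row in matrix]


# Toy usage: path graph 0 - 1 - 2 (assumed data).
a = adjacency_matrix(3, [(0, 1), (1, 2)])
a_squared = square(a)
```

In this toy graph nodes 0 and 2 are not directly connected, so the corresponding entry of A is 0, while the same entry of the squared matrix is 1 because the two nodes share the neighbor 1.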

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Kipf et al. (“Semi-Supervised Classification with Graph Convolutional Networks”, hereinafter: Kipf) and further in view of Yu et al. (“Classifying Large Data Sets Using SVMs with Hierarchical Clusters”).
Regarding claim 3
Hamilton in view of Kipf teaches the method of claim 2.
Hamilton in view of Kipf does not teach wherein the plurality of contexts are generated for each node by: identifying a set of nodes with a predetermined radius from the node in the knowledge graph; and generating the plurality of contexts around each node by randomly sampling the set of nodes a predetermined number of times.
Yu teaches wherein the plurality of contexts are generated for each node by: identifying a set of nodes with a predetermined radius from the node in the knowledge graph (pg. 308 section 3.1.1 “A leaf entry, the entry in a leaf node, only has a % a without a child pointer. So, a leaf or a nonleaf node represents a cluster made up of all the subclusters represented by its entries. The threshold is a constraint for the leaf entries to satisfy such that the radius of an entry in a leaf node has to be less than t”… also see pg. 309 “We first train an SVM boundary function from the centroids of the root entries... Note that each entry (or cluster) Ei contains the CF information from which we can efficiently compute the center point % Ci and the radius Ri of the cluster.”)
generating the plurality of contexts around each node by randomly sampling the set of nodes a predetermined number of times. (Pg. 306 “Figure 1 shows the training time of an SVM on different numbers of training data randomly sampled from the original data set. From the graphs, we can infer that it would take years for an SVM to train a million data”)
Hamilton, Kipf and Yu are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for learning graphs with a graph convolutional network of Hamilton in view of Kipf to include the classification of large data sets using the SVM of Yu in order to effectively cluster large data sets quickly as disclosed by Yu (abstract “This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples that carry the statistical summaries of the data such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy”).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Kipf et al. (“Semi-Supervised Classification with Graph Convolutional Networks”, hereinafter: Kipf) and further in view of Adhikari et al. (“Distributed Representation of Subgraphs”).
Regarding claim 5
Hamilton in view of Kipf teaches the method of claim 4.
Hamilton in view of Kipf does not teach wherein the neighbor feature matrix is generated for each subgraph by: determining all neighboring paths for the subgraph by identifying, for each vertex in the subgraph, a plurality of paths in the knowledge graph that include the vertex but no other vertices in the subgraph; randomly selecting a subset of all neighboring paths; and using each path in the subset as a row in the neighbor feature matrix.
Adhikari teaches wherein the neighbor feature matrix is generated for each subgraph by: determining all neighboring paths for the subgraph by identifying, for each vertex in the subgraph, (pg. 1 right col “We are given a graph G(V, E) where V is the vertex set, and E is the associated edge-set (we assume undirected graphs here, but our framework can be easily extended to directed graphs as well). We define gi(vi , ei) as a subgraph of G, where vi ⊆ V and ei ⊆ E. For simplicity, we write … As input we require a set of subgraphs S”)
a plurality of paths in the knowledge graph that include the vertex but no other vertices in the subgraph; (pg. 1 right col “We are given a graph G(V, E) where V is the vertex set, and E is the associated edge-set (we assume undirected graphs here, but our framework can be easily extended to directed graphs as well). We define gi(vi , ei) as a subgraph of G, where vi ⊆ V and ei ⊆ E. For simplicity, we write … As input we require a set of subgraphs S={g1, g2,…. gn}”)
randomly selecting a subset of all neighboring paths; (pg. 3 right col section 3.3 “Given a set of subgraphs S={g1, g2,…. gn}, we generate neighborhood in each gi ∈ S by fixed length subgraph-truncated random walks. Specifically, for a subgraph gi, we choose a node v1 from nodes in gi uniformly at random. Next we generate a sequence of nodes v1,v2,v3 . . .vk to get a random walk of length k, where vj is a node chosen from the neighbors of node vj−1 uniformly at random. We repeat the process for each subgraph in S. Overlaps in the random walks of gi and gj serve as a metric for local proximity.”)
and using each path in the subset as a row in the neighbor feature matrix. (Pg. 3 section 3.1 “We define M as a d × |V′| node vector matrix, where each column is m(n) (the vector representation of nodes n ∈ V′). Similarly, we define function f(gi) as the embedding function for subgraph gi, where f(gi) is a d-dimensional vector. We denote S as the subgraph matrix, where each column is f(gi) for all subgraphs in S.”)
Hamilton, Kipf and Adhikari are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for learning graphs with a graph convolutional network of Hamilton in view of Kipf to include the random selection of a subset of neighboring paths of Adhikari in order to choose nodes uniformly at random to generate a sequence of nodes as disclosed by Adhikari (pg. 3 right col section 3.3 “Given a set of subgraphs S={g1, g2,…. gn}, we generate neighborhood in each gi ∈ S by fixed length subgraph-truncated random walks. Specifically, for a subgraph gi, we choose a node v1 from nodes in gi uniformly at random. Next we generate a sequence of nodes v1,v2,v3 . . .vk to get a random walk of length k, where vj is a node chosen from the neighbors of node vj−1 uniformly at random. We repeat the process for each subgraph in S. Overlaps in the random walks of gi and gj serve as a metric for local proximity.”).
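The random-walk procedure in the quoted section 3.3 of Adhikari can be sketched as follows. Two points are assumptions rather than statements from the reference: the walk is confined to nodes of the subgraph (one reading of "subgraph-truncated"), and the walk stops early if the current node has no neighbor inside the subgraph. The function name `truncated_random_walk` and the toy graph are hypothetical.

```python
import random


def truncated_random_walk(graph, subgraph_nodes, length, rng):
    """Sample a walk of up to `length` nodes that stays inside `subgraph_nodes`."""
    allowed = set(subgraph_nodes)
    # Start node chosen uniformly at random from the subgraph.
    walk = [rng.choice(sorted(allowed))]
    while len(walk) < length:
        # Next node chosen uniformly from the neighbors of the previous node
        # (assumption: only neighbors inside the subgraph are eligible).
        nbrs = [n for n in graph[walk[-1]] if n in allowed]
        if not nbrs:
            break  # dead end inside the subgraph
        walk.append(rng.choice(nbrs))
    return walk


# Toy usage: subgraph {a, b, c} of a four-node graph (assumed data).
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
sample_walk = truncated_random_walk(toy_graph, ["a", "b", "c"], 5, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when comparing overlaps between walks of different subgraphs.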

Claim(s) 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Kipf et al. (“Semi-Supervised Classification with Graph Convolutional Networks”, hereinafter: Kipf) and further in view of Badr et al. (US 2018/0324117 A1).
Regarding claim 7
Hamilton in view of Kipf teaches the method of claim 1.
Kipf further teaches wherein a plurality of labelled components are used in training the GCNN (pg. 6 “The datasets contain sparse bag-of-words feature vectors for each document and a list of citation links between documents. We treat the citation links as (undirected) edges and construct a binary, symmetric adjacency matrix A. Each document has a class label. For training, we only use 20 labels per class, but all feature vectors.”);
Hamilton in view of Kipf does not teach that the plurality of feature vectors each provide a functional score indicating similarity to one of the labelled components.
Badr teaches and the plurality of feature vectors each provide a functional score indicating similarity to one of the labelled components. (Para [0045] “Module 108 further includes scoring/ranking logic 208. Logic 208 is used to analyze multiple labels generated using logic 206 and, based on the analysis, generate respective confidence scores for each label of the multiple labels. Each label can be assigned confidence score that indicates a relevance of a particular word or text phrase (e.g., a label ) with regard to attributes, or extracted image features, of a received item of digital content” also see para [0053] “Module 110 can generate one or more conversational replies based on a similarity between image pixel data received from module 108 and at least one content item of the extracted chat content stored in database 212. In alternative implementations, module 110 can generate one or more conversational replies based on a similarity between at least one received label received from module 108 and at least one content item of the extracted chat content.”)
Hamilton, Kipf and Badr are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for learning graphs with a graph convolutional network of Hamilton in view of Kipf to include the scoring of labels based on similarity of Badr in order to accurately generate respective confidence scores based on relevancy as disclosed by Badr (para [0045] “Module 108 further includes scoring/ranking logic 208. Logic 208 is used to analyze multiple labels generated using logic 206 and, based on the analysis, generate respective confidence scores for each label of the multiple labels. Each label can be assigned confidence score that indicates a relevance of a particular word or text phrase (e.g., a label ) with regard to attributes, or extracted image features, of a received item of digital content”).

Regarding claim 8
Hamilton in view of Kipf and Badr teaches the method of claim 7. 
Badr further teaches wherein the functional groups of components included in the system are identified: identifying a subset of the feature vectors having a functional score above a threshold value; (para [0049] “Module 108 can generate multiple labels and can use logic 208 to rank each label based on a respective confidence score that is assigned to each label to form a subset of ranked labels. In some implementations, a subset of ranked labels can include at least two labels that have the highest confidence scores from among the respective confidence scores assigned to each of the multiple labels. In other implementations, a subset of ranked labels can include one or more labels having confidence scores that exceed a threshold confidence score.”)
and designating the labelled components corresponding to the subset of the feature vectors as the functional groups of components included in the system. (para [0049] “Module 108 can generate multiple labels and can use logic 208 to rank each label based on a respective confidence score that is assigned to each label to form a subset of ranked labels. In some implementations, a subset of ranked labels can include at least two labels that have the highest confidence scores from among the respective confidence scores assigned to each of the multiple labels. In other implementations, a subset of ranked labels can include one or more labels having confidence scores that exceed a threshold confidence score.”)  
Hamilton, Kipf and Badr are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Kipf to include scoring labels based on similarity of Badr in order to accurately generate respective confidence score based on relevancy as disclosed by Badr (para [0045] “Module 108 further includes scoring/ranking logic 208. Logic 208 is used to analyze multiple labels generated using logic 206 and, based on the analysis, generate respective confidence scores for each label of the multiple labels. Each label can be assigned confidence score that indicates a relevance of a particular word or text phrase (e.g., a label ) with regard to attributes, or extracted image features, of a received item of digital content”)
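For illustration only, the two selection strategies quoted from Badr para [0049] (keeping the highest-ranked labels, or keeping labels whose confidence score exceeds a threshold) can be sketched as follows. This is a minimal sketch and not Badr's implementation: the function names, the example labels, and the example scores are hypothetical.

```python
def rank_labels(scores):
    """Return (label, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def select_top_k(scores, k=2):
    """Keep the k labels with the highest confidence scores."""
    return [label for label, _ in rank_labels(scores)[:k]]


def select_above_threshold(scores, threshold):
    """Keep every label whose confidence score exceeds the threshold."""
    return [label for label, score in rank_labels(scores) if score > threshold]


# Hypothetical confidence scores, chosen only to exercise the functions.
example_scores = {"pump": 0.9, "valve": 0.2, "sensor": 0.5}
top_two = select_top_k(example_scores, k=2)
above = select_above_threshold(example_scores, threshold=0.3)
```

Either function yields the "subset of ranked labels" described in the quoted passage; which strategy applies is a design choice the passage leaves open.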

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Kipf et al. (“Semi-Supervised Classification with Graph Convolutional Networks”, hereinafter: Kipf) and further in view of Banerjee et al. (“Generating Digital Twin models using Knowledge Graphs for Industrial Production Lines”).
Regarding claim 9
Hamilton in view of Kipf teaches the method of claim 1. 
Hamilton further teaches wherein the knowledge graph comprises a plurality of subgraphs (pg. 3 section 1.1 “However, we will also discuss some approaches for supervised representation learning, where the models make use of classification or regression labels in order to optimize the embeddings. These classification labels may be associated with individual nodes or entire subgraphs and are the prediction targets for downstream machine learning tasks (e.g., they might label protein roles, or the therapeutic properties of a molecule, based on its graph representation).”); 
Hamilton in view of Kipf does not teach …and each subgraph represents a digital twin of a subsystem of the system.  
Banerjee teaches …and each subgraph represents a digital twin of a subsystem of the system. (Abstract “In this paper we introduce a simple way of formalizing knowledge as digital twin models coming from sensors in industrial production lines. We present a way on to extract and infer knowledge from large scale production line data, and enhance manufacturing process management with reasoning capabilities, by introducing a semantic query mechanism.”)
Hamilton, Kipf and Banerjee are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Kipf to include a method and system for generating digital twin models using knowledge graphs of Banerjee in order to allow manufacturers to reason about complex systems in real time and reduce computational costs as disclosed by Banerjee (pg. 1 right col second paragraph “In this paper, on the virtual side, we have added numerous reasoning characteristics so that an entire production line can be tested for performance capabilities. Our lightweight model allows manufacturers to reason about complex system, including their physical behaviors, in real-time and with acceptable computational costs. With this, our model shifts the need of having domain experts who help create rules for production line management, to an artificially intelligent reasoning system which anybody with minimal technical know-how can use.”).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Wang et al. (US Pat No. 10157333 B1). 
Regarding claim 10
Hamilton teaches a computer-implemented method for learning structural relationships between nodes of a graph, (Figure 8 “This structure can be exploited by learning embeddings at various levels of the hierarchy, and only applying the regularization between layers that are in a parent-child relationship. C-E, Example application of multi-layer graph embedding to protein-protein interaction graphs derived from different brain tissues;”)
the method comprising: receiving a knowledge graph comprising a plurality of subgraphs labeled with functional labels describing one or more systems, (Examiner notes that Algorithm 1 shows receiving graphs at the input stage see pg. 12 first paragraph “In the encoding phase, the neighborhood aggregation methods build up the representation for a node in an iterative, or recursive, fashion (see Algorithm 1 for pseudocode). First, the node embeddings are initialized to be equal to the input node attributes. Then at each iteration of the encoder algorithm, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors.”)
wherein each subgraph corresponds to a subsystem of at least one of the systems; (pg. 17 section 3 “We now turn to the task of representation learning on (sub)graphs, where the goal is to encode a set of nodes and edges into a low-dimensional vector embedding. More formally, the goal is to learn a continuous vector representation, zS ∈ R d , of an induced subgraph G[S] of the full graph G, where S ⊆ V. (Note that these methods can embed both subgraphs (S ⊂ V) as well as entire graphs (S = V).) The embedding, zS, can then be used to make predictions about the entire subgraph”)
training a GCNN to classify structural relationships between nodes of the subgraphs into the functional labels; (pg. 4 section 2.1 “In this framework, we organize the various methods around two key mapping functions: an encoder, which maps each node to a low-dimensional vector, or embedding, and a decoder, which decodes structural information about the graph from the learned embeddings (Figure 3)… that maps nodes to vector embeddings, zi ∈ Rd (where zi corresponds to the embedding for node vi ∈ V). The decoder is a function that accepts a set of node embeddings and decodes user-specified graph statistics from these embeddings. For example, the decoder might predict the existence of edges between nodes, given their embeddings [1, 35], or it might predict the community that a node belongs to in the graph [28, 34] (Figure 3).” Examiner notes that the classification labels are associated with each individual node, as evidenced by pg. 3 section 1.1)
receiving a new knowledge graph corresponding to a new system; (Examiner notes that, because the process repeats, each time the process starts over corresponds to receiving a new knowledge graph, as evidenced by pg. 12 first paragraph “First, the node embeddings are initialized to be equal to the input node attributes. Then at each iteration of the encoder algorithm, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors. After this aggregation, every node is assigned a new embedding, equal to its aggregated neighborhood vector combined with its previous embedding from the last iteration. Finally, this combined embedding is fed through a dense neural network layer and the process repeats. As the process iterates, the node embeddings contain information aggregated from further and further reaches of the graph. However, the dimensionality of the embeddings remains constrained as the process iterates, so the encoder is forced to compress all the neighborhood information into a low dimensional vector.”)
Hamilton does not teach and using the GCNN to determine a plurality of functional scores for the new system, each function score corresponding to one of the functional labels.  
Wang teaches and using the …to determine a plurality of functional scores for the new system, (col 17 lines 32-42 “During training time (e.g. generation of values on a system with significant resources), the last layer serves as a fully-connected layer to produce classification scores. Following training of the DCNN, this fully connected layer is converted to as a convolutional layer. In this way, each convolutional kernel produce a prediction score map for an image category. In such embodiments, the generated framework is capable of obtaining dense sub-window recognition scores by applying convolutional kernels layer by layer to the whole image (instead of cropped sub-windows).”)
each function score corresponding to one of the functional labels. (Col 14 lines 42-51 “A set of visual tags 708 are then assigned to the image data 702 based on output values from the DCNN 704. Visual tags 708 include output values for particular items that are part of the DCNN prior training, including tree with a value of 0.518, grass with a value of 0.434, table with a value of 0.309, and basketball court with a value of 0.309. These values are presented for illustrative purposes, and it is to be understood that different values and items may be used in different embodiments. Visual tags 708 includes items for which the output scores exceed a threshold (e.g. 0.3).”)
Hamilton and Wang are analogous art because they are both directed to knowledge graphs. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton to include determining a score for each functional label, as taught by Wang, in order to accurately classify visual tags with output scores as disclosed by Wang (col 14 lines 42-51 “A set of visual tags 708 are then assigned to the image data 702 based on output values from the DCNN 704. Visual tags 708 include output values for particular items that are part of the DCNN prior training, including tree with a value of 0.518, grass with a value of 0.434, table with a value of 0.309, and basketball court with a value of 0.309. These values are presented for illustrative purposes, and it is to be understood that different values and items may be used in different embodiments. Visual tags 708 includes items for which the output scores exceed a threshold (e.g. 0.3).”).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Wang et al. (US Pat No. 10157333 B1) and further in view of Banerjee et al. (“Generating Digital Twin models using Knowledge Graphs for Industrial Production Lines”).
Regarding claim 11
Hamilton in view of Wang teaches the method of claim 10. 
Hamilton further teaches wherein the knowledge graph comprises a plurality of subgraphs (pg. 3 section 1.1 “However, we will also discuss some approaches for supervised representation learning, where the models make use of classification or regression labels in order to optimize the embeddings. These classification labels may be associated with individual nodes or entire subgraphs and are the prediction targets for downstream machine learning tasks (e.g., they might label protein roles, or the therapeutic properties of a molecule, based on its graph representation).”); 
Hamilton in view of Wang does not teach …and each subgraph represents a digital twin of a subsystem of the system.  
Banerjee teaches …and each subgraph represents a digital twin of a subsystem of the system. (Abstract “In this paper we introduce a simple way of formalizing knowledge as digital twin models coming from sensors in industrial production lines. We present a way on to extract and infer knowledge from large scale production line data, and enhance manufacturing process management with reasoning capabilities, by introducing a semantic query mechanism.”)
Hamilton, Wang and Banerjee are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Wang to include a method and system for generating digital twin models using knowledge graphs of Banerjee in order to allow manufacturers to reason about complex systems in real time and reduce computational costs as disclosed by Banerjee (pg. 1 right col second paragraph “In this paper, on the virtual side, we have added numerous reasoning characteristics so that an entire production line can be tested for performance capabilities. Our lightweight model allows manufacturers to reason about complex system, including their physical behaviors, in real-time and with acceptable computational costs. With this, our model shifts the need of having domain experts who help create rules for production line management, to an artificially intelligent reasoning system which anybody with minimal technical know-how can use.”).

Claims 12-13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Wang et al. (US Pat No. 10157333 B1) and further in view of Badr et al. (US 2018/0324117 A1).
Regarding claim 12
Hamilton in view of Wang teaches the method of claim 10. 
Hamilton in view of Wang does not teach wherein the functional labels are generated by: extracting the functional labels from one or more domain-specific models corresponding to the system.  
Badr teaches wherein the functional labels are generated by: extracting the functional labels from one or more domain-specific models corresponding to the system. (Para [0050] “As noted above, each respective confidence score indicates a relevance of a particular label to an attribute or extracted image feature of the item of digital content. Module 108 can select at least one label based on a confidence score of the at least one label exceeding a threshold confidence score. Module 108 can provide the selected at least one label to one or more of modules 110, 112, and 114. Alternatively, module 108 can select at least one label, of the subset of ranked labels, and provide the selected at least one label to one or more of modules 110, 112, and 114.”)
Hamilton, Wang and Badr are analogous art because they are all directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Wang to include the label extraction feature of Badr in order to accurately select labels based on their confidence scores as disclosed by Badr (para [0050] “As noted above, each respective confidence score indicates a relevance of a particular label to an attribute or extracted image feature of the item of digital content. Module 108 can select at least one label based on a confidence score of the at least one label exceeding a threshold confidence score. Module 108 can provide the selected at least one label to one or more of modules 110, 112, and 114. Alternatively, module 108 can select at least one label, of the subset of ranked labels, and provide the selected at least one label to one or more of modules 110, 112, and 114.”).

Regarding claim 13
Hamilton in view of Wang and Badr teaches the method of claim 12. 
Hamilton further teaches wherein the GCNN comprises a graph feature compression layer configured to apply an embedding to the knowledge graph using structural relationships between nodes in the knowledge graph as context to yield an embedded graph in a vector space. (Pg. 3 section 1.1 “We also assume that the methods can make use of a real-valued matrix of node attributes X ∈ R m×|V| (e.g., representing text or metadata associated with nodes). The goal is to use the information contained in A and X to map each node, or a subgraph, to a vector z ∈ R d , where d << |V|. Most of the methods we review will optimize this mapping in an unsupervised manner, making use of only information in A and X, without knowledge of the downstream machine learning task. However, we will also discuss some approaches for supervised representation learning, where the models make use of classification or regression labels in order to optimize the embeddings.”)

Regarding claim 15 
Hamilton in view of Wang and Badr teaches the method of claim 12. 
Hamilton further teaches wherein the GCNN comprises a neighbor nodes aggregation layer configured to (i) derive a plurality of neighbor node feature vectors for subgraphs of the embedded graph (pg. 4 section 2 “We begin with a discussion of methods for node embedding, where the goal is to encode nodes as low-dimensional vectors that summarize their graph position and the structure of their local graph neighborhood. These lowdimensional embeddings can be viewed as encoding, or projecting, nodes into a latent space, where geometric relations in this latent space correspond to interactions (e.g., edges) in the original graph [32]. Figure 2 visualizes an example embedding of the famous Zachary Karate Club social network [46], where two dimensional node embeddings capture the community structure implicit in the social network.”)
and (ii) aggregate the neighbor node feature vectors with their corresponding subgraphs of the embedded graph to yield a plurality of aggregated subgraphs. (Figure 7 “Overview of the neighborhood aggregation methods. To generate an embedding for a node, these methods first collect the node’s k-hop neighborhood (occasionally sub-sampling the full neighborhood for efficiency). In the next step, these methods aggregate the attributes of node’s neighbors, using neural network aggregators. This aggregated neighborhood information is used to generate an embedding, which is then fed to the decoder.”)
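For illustration only, the neighborhood aggregation loop that the cited Hamilton passages describe in prose (initialize embeddings to the node attributes, aggregate neighbor embeddings, combine with the previous embedding, pass through a dense layer, repeat) can be sketched as follows. This is a minimal sketch under stated assumptions and not Hamilton's Algorithm 1 as published: the mean aggregator, concatenation as the combine step, the ReLU activation, and all names and weights are assumptions chosen for brevity.

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def dense_relu(weights, vector):
    """Apply one dense layer (weights as a list of rows) followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


def aggregate_neighborhoods(adjacency, attributes, weights_per_layer):
    """Run one aggregation round per weight matrix and return node embeddings."""
    # Embeddings start out equal to the input node attributes.
    embeddings = {node: list(vec) for node, vec in attributes.items()}
    for weights in weights_per_layer:
        updated = {}
        for node, previous in embeddings.items():
            neighbors = adjacency.get(node, [])
            if neighbors:
                aggregated = mean_vectors([embeddings[n] for n in neighbors])
            else:
                aggregated = [0.0] * len(previous)
            # Combine by concatenating the previous embedding with the
            # aggregated neighborhood vector, then apply the dense layer.
            updated[node] = dense_relu(weights, previous + aggregated)
        embeddings = updated
    return embeddings


# Hypothetical three-node path graph a - b - c with 2-dimensional attributes.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
attributes = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
# One 2x4 weight matrix per round; these values simply average self and neighbors.
layer = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
result = aggregate_neighborhoods(adjacency, attributes, [layer, layer])
```

Each additional weight matrix adds one round, so a node's embedding draws on nodes one hop further away per round while its dimensionality stays fixed by the weight matrix.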

Claims 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Wang et al. (US Pat No. 10157333 B1) and Badr et al. (US 2018/0324117 A1), and further in view of Yu et al. (“Classifying Large Data Sets Using SVMs with Hierarchical Clusters”).
Regarding claim 14
Hamilton in view of Wang and Badr teaches the method of claim 12. 
Hamilton in view of Wang and Badr does not teach wherein the structural relationships are generated for each node by: identifying a set of nodes with a predetermined radius from the node in the knowledge graph; and generating the structural relationships for each node by randomly sampling the set of nodes a predetermined number of times. 
Yu teaches wherein the structural relationships are generated for each node by: identifying a set of nodes with a predetermined radius from the node in the knowledge graph (pg. 308 section 3.1.1 “A leaf entry, the entry in a leaf node, only has a % a without a child pointer. So, a leaf or a nonleaf node represents a cluster made up of all the subclusters represented by its entries. The threshold is a constraint for the leaf entries to satisfy such that the radius of an entry in a leaf node has to be less than t”… also see pg. 309 “We first train an SVM boundary function from the centroids of the root entries... Note that each entry (or cluster) Ei contains the CF information from which we can efficiently compute the center point % Ci and the radius Ri of the cluster.”)
generating the plurality of contexts around each node by randomly sampling the set of nodes a predetermined number of times. (Pg. 306 “Figure 1 shows the training time of an SVM on different numbers of training data randomly sampled from the original data set. From the graphs, we can infer that it would take years for an SVM to train a million data”)
Hamilton, Wang, Badr and Yu are analogous art because they are all directed to knowledge graphs. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Wang and Badr to include random sampling of Yu in order to effectively cluster large data sets quickly as disclosed by Yu (abstract “This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples that carry the statistical summaries of the data such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy”).
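For illustration only, the two steps recited in claim 14 (identifying the nodes within a predetermined radius of a node, then randomly sampling that set a predetermined number of times) can be sketched as follows. This is a minimal sketch and not the method of any cited reference: reading "radius" as a hop count, the breadth-first search, the fixed sample size, and all names are assumptions.

```python
import random
from collections import deque


def nodes_within_radius(adjacency, start, radius):
    """Return the nodes reachable from start in at most `radius` hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the radius
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(n for n in distance if n != start)


def sample_contexts(adjacency, start, radius, num_samples, context_size, seed=0):
    """Draw `num_samples` random subsets of the nodes within the radius."""
    rng = random.Random(seed)  # seeded so that repeated runs match
    candidates = nodes_within_radius(adjacency, start, radius)
    size = min(context_size, len(candidates))
    return [rng.sample(candidates, size) for _ in range(num_samples)]


# Hypothetical path graph a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
contexts = sample_contexts(adjacency, "a", radius=2, num_samples=3, context_size=2)
```

Sampling is done without replacement within each draw, which is one possible reading of the claim language.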

Regarding claim 16
Hamilton in view of Wang, Badr and Yu teaches the method of claim 14. 
Hamilton further teaches wherein the GCNN further comprises a subgraph convolution layer configured to determine the plurality of …based on the aggregated subgraphs. (Pg. 12 first paragraph “In the encoding phase, the neighborhood aggregation methods build up the representation for a node in an iterative, or recursive, fashion (see Algorithm 1 for pseudocode). First, the node embeddings are initialized to be equal to the input node attributes. Then at each iteration of the encoder algorithm, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors.”)
Wang further teaches …determine the plurality of functional scores. (Col 17 lines 32-42 “During training time (e.g. generation of values on a system with significant resources), the last layer serves as a fully-connected layer to produce classification scores. Following training of the DCNN, this fully connected layer is converted to as a convolutional layer. In this way, each convolutional kernel produce a prediction score map for an image category. In such embodiments, the generated framework is capable of obtaining dense sub-window recognition scores by applying convolutional kernels layer by layer to the whole image (instead of cropped sub-windows).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton in view of Badr and Yu to include determining a score for each functional label, as taught by Wang, in order to accurately classify visual tags with output scores as disclosed by Wang (col 14 lines 42-51 “A set of visual tags 708 are then assigned to the image data 702 based on output values from the DCNN 704. Visual tags 708 include output values for particular items that are part of the DCNN prior training, including tree with a value of 0.518, grass with a value of 0.434, table with a value of 0.309, and basketball court with a value of 0.309. These values are presented for illustrative purposes, and it is to be understood that different values and items may be used in different embodiments. Visual tags 708 includes items for which the output scores exceed a threshold (e.g. 0.3).”).

Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hamilton et al. (“Representation Learning on Graphs: Methods and Applications”) in view of Badr et al. (US 2018/0324117 A1). 
Regarding claim 17
Hamilton teaches a computer-implemented method for inferring functionality of a system: generating a knowledge graph representative of the system, (Figure 2 “Graph structure of the Zachary Karate Club social network, where nodes are connected if the corresponding individuals are friends. The nodes are colored according to the different communities that exist in the network. B, Twodimensional visualization of node embeddings generated from this graph using the DeepWalk method (Section 2.2.2) [46]. The distances between nodes in the embedding space reflect proximity in the original graph, and the node embeddings are spatially clustered according to the different color-coded communities.”)
where the knowledge graph comprises a plurality of subgraphs corresponding to subsystems of the system; (pg. 3 section 1.1 “We also assume that the methods can make use of a real-valued matrix of node attributes X ∈ R m×|V| (e.g., representing text or metadata associated with nodes). The goal is to use the information contained in A and X to map each node, or a subgraph, to a vector z ∈ R d , where d << |V|. Most of the methods we review will optimize this mapping in an unsupervised manner, making use of only information in A and X, without knowledge of the downstream machine learning task.” Also see section 3 “We now turn to the task of representation learning on (sub)graphs, where the goal is to encode a set of nodes and edges into a low-dimensional vector embedding. More formally, the goal is to learn a continuous vector representation, zS ∈ R d , of an induced subgraph G[S] of the full graph G, where S ⊆ V.”)
Hamilton does not teach using a …to determine a plurality of functional scores for the system, each functional score corresponding to one of a plurality of functional labels describing the subsystems; and based on the functional scores, identifying a functionally equivalent alternative subsystem for at least one of the subsystems of the system. 
Badr teaches using a …to determine a plurality of functional scores for the system, (para [0045] “Module 108 further includes scoring / ranking logic 208. Logic 208 is used to analyze multiple labels generated using logic 206 and, based on the analysis, generate respective confidence scores for each label of the multiple labels. Each label can be assigned confidence score that indicates a relevance of a particular word or text phrase (e.g., a label) with regard to attributes, or extracted image features, of a received item of digital content”) 
each functional score corresponding to one of a plurality of functional labels describing the subsystems; (para [0046] “In some implementations, labels that are more definitive or descriptive of particular attributes or extracted image features of an item of digital content may be assigned a higher confidence score relative to labels that more generic. For example, referencing the above extracted features for the Eiffel tower and the dog, descriptive labels such as “Eiffel” or “Eiffel tower” may receive higher confidence scores when compared to more generic labels such as "tower” or “Paris.””)
and based on the functional scores, identifying a functionally equivalent alternative subsystem for at least one of the subsystems of the system. (Para [0057] “Module 110 can use scoring/ranking logic 214 to determine at least one similarity score that indicates a similarity between : i) image pixel data of an item of digital content; and ii) at least one content item of chat content extracted from an electronic conversation.”)
Hamilton and Badr are analogous art because they are both directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton to include determining a plurality of functional scores of Badr in order to accurately select labels based on their confidence scores as disclosed by Badr (para [0050] “As noted above, each respective confidence score indicates a relevance of a particular label to an attribute or extracted image feature of the item of digital content. Module 108 can select at least one label based on a confidence score of the at least one label exceeding a threshold confidence score. Module 108 can provide the selected at least one label to one or more of modules 110, 112, and 114. Alternatively, module 108 can select at least one label, of the subset of ranked labels, and provide the selected at least one label to one or more of modules 110, 112, and 114.”).

Regarding claim 18
Hamilton in view of Badr teaches the method of claim 17. 
Badr further teaches the method presenting the functionally equivalent alternative subsystem to a user along with a measurement of how similar the functionally equivalent alternative subsystem is to the system. (Para [0057] “Module 110 can use scoring/ranking logic 214 to determine at least one similarity score that indicates a similarity between : i) image pixel data of an item of digital content; and ii) at least one content item of chat content extracted from an electronic conversation.”)
Hamilton and Badr are analogous art because they are both directed to knowledge graphs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified a method for learning graphs with graph convolutional network of Hamilton to include determining a plurality of functional scores of Badr in order to accurately select labels based on their confidence scores as disclosed by Badr (para [0050] “As noted above, each respective confidence score indicates a relevance of a particular label to an attribute or extracted image feature of the item of digital content. Module 108 can select at least one label based on a confidence score of the at least one label exceeding a threshold confidence score. Module 108 can provide the selected at least one label to one or more of modules 110, 112, and 114. Alternatively, module 108 can select at least one label, of the subset of ranked labels, and provide the selected at least one label to one or more of modules 110, 112, and 114.”).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/V.M./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126