Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

1.	The Examiner acknowledges the applicant’s amendment filed July 5, 2022.  At this point claims 1-20 are pending in the instant application and ready for examination by the Examiner.

Response to Arguments
2.	Applicant’s arguments filed on 7/5/2022 for claims 1-20 have been fully considered but are not persuasive.

3.	Applicant’s argument:
Neither HE nor Duan disclose, teach or suggest candidate generation, involving creating graphs relating to specified information with information fields from the plurality of datasets as nodes, which is discussed below. In addition, there are several details that are also distinguishable over the cited prior art.

Examiner’s answer:
The He reference recites this limitation almost word for word. The pre-processing step (or candidate graph generation): in candidate graph generation, a smaller graph containing the source and sink vertices is created based on heuristics (e.g. node degree). (He, p121)

4.	Applicant’s argument:
HE discloses a diversity subgraph extraction algorithm to provide a meso view of interdisciplinary collaborative relationships to focus on meaningful collaboration patterns without getting lost in huge graphs. HE’s method of diversity path mining and electrical network computation based on Ohm’s Law has a significantly different foundation than that of the presently claimed invention. HE’s graph is a homogeneous graph where each node represents an author. It is a collaboration network, and therefore an undirected graph. The diversity path mining in HE’s approach is based on the topic difference of nodes in a path. In addition, Ohm’s law in HE’s publication is based on the frequency of collaboration or the similarity of the topic of the nodes.

In distinction, in the presently claimed invention, the graph is a heterogeneous and directed graph by virtue of the diverse subgraph generation step and module, wherein selected paths that carry the larger amount of current have more new nodes generated in an iterative process, wherein each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diverse subgraph.

Examiner’s answer:
There is no mention of a heterogeneous graph being claimed. There is no mention of He having a homogeneous foundation. Applicant’s argument is merely opinion.

5.	Applicant’s argument:
Additionally and specifically relating to Claims 10 and 20, the diversity path mining in the claimed heterogeneous and directed graph is based on node types (e.g., whether a node is a drug or a disease) and edge semantics (e.g., whether the edge is a binding relationship or a protein-protein interaction relationship)

Furthermore and specifically relating to Claims 6 and 16, such a graph is directed, wherein the diversity has to be calculated based on the direction of the path and is maximized.

Examiner’s answer:
Applicant makes no argument. 

6.	Applicant’s argument:
Duan fails to address the deficiencies of HE. Duan discloses keyword searches over RDF graphs and uses graph partition to speed up the process to deal with large scale RDF triples. Duan’s disclosure is based on graph theory and graph homomorphisms to partition a large RDF triple set into a semantically similar and edge disjoint subgraphs. The presently claimed candidate generation is different and patently distinguishable from the RDF triples disclosed in Duan. First, the subgraphs generated based on Duan’s method use graph homomorphisms to partition a large graph into subgraphs to guarantee graph homomorphism among different subgraphs, wherein the presently claimed invention uses adjusted shortest path algorithms by considering semantic to generate subgraphs. Second, Duan teaches dividing the large graph into smaller but similar subgraphs to enable parallel search of all the small subgraphs at the same time, in contrast to the claimed generation of subgraphs by capturing the semantic importance of two nodes and considering the node types of connected nodes and edge semantics in the subgraph. 

Examiner’s answer:
There is no mention of triples within claims 7 and 17. There is no mention of triples within the cited art for claims 7 and 17. There is no mention of homomorphisms within Duan. There is no mention of semantic importance within the claimed invention. There is no mention of node type within the claimed invention. There is no mention of connected nodes or edge semantics within the claimed invention.

7.	Applicant’s argument:
Third, Duan’s RDF triples are not a graph because 1) RDF triples have dangling nodes which have no connections at all (such as object nodes with the data type properties, or object nodes which are a number or a literal), while the claimed diverse subgraph generation inherently does not have the dangling nodes; 2) RDF triples use owl:sameAs or rdfs:seeAlso to connect the subjects sharing the same URLs, but do not merge these subjects into one node; while the claimed subgraph generation does not use owl:sameAs or rdfs:seeAlso to create new edges among similar subjects rather merges similar subjects into one single node; and 3) graph-based machine learning methods (such as Page Rank) cannot run on RDF triples, but can run on graphs constructed according to the Claims of the present application. Therefore, the presently claimed invention has dramatic differences in terms of both methods and datasets distinguishing over HE and Duan, and HE and Duan in any combination with the cited prior art.

Examiner’s answer:
There is no mention of dangling nodes within Duan. This appears to be the applicant’s opinion. ‘Owl’ or ‘rdf’ is not mentioned within Duan, the claims, or the Office Action. ‘Pagerank’ is not mentioned within the Office Action. The examiner disagrees with the statement, ‘(such as Page Rank) cannot run on RDF triples.’ Applicant’s argument is based on basic statements which are not within the specification nor in the Office Action. Thus the conclusion has no support.

8.	Applicant’s argument:
With regard to Claims 1 and 11, HE’s method creates scientific collaboration graphs with the topic vector as the attribute of a given node which is completely different from the semantic analysis using a heterogeneous graph as described in the present specification. 

Examiner’s answer:
This is not a valid argument. 

9.	Applicant’s argument:
 HE’s graph is a homogeneous graph directly generated from one publication database and does not deal with the integration of several different datasets. 

Examiner’s answer:
There is no mention of homogeneous graph information within Duan. There is no mention of ‘several different databases’ within the claimed invention.

10.	Applicant’s argument:
The candidate graph mentioned in HE’s paper is the subgraph extracted from the one big co-author graph, while the disclosed method and system integrate several large databases that describe compound (e.g., CREMBL, PubChem), gene (e.g., Gene Ontology), disease (e.g., OMIM), drug (e.g., drugbank), and pathway (e.g. uniprot) relationships into one big graph. The disclosed candidate graphs or diversity graphs are generated based on the semantic approach proposed in the present disclosure to capture the semantics of nodes and heterogenous path patterns, which might look similar to the Ohm’s law and electronic network methods mentioned in HE’s publication, but which is fundamentally different. The disclosed and claimed subgraphs are not based on a topic vector of nodes to calculate the similarity (or KL divergence), instead are calculated on the importance of a subgraph by capturing the semantics of nodes and edges (e.g., node types and edge types), 

Examiner’s answer:
‘The importance of a subgraph by capturing the semantics of nodes and edges’ is not within the specification. This argument of the applicant has no support.

11.	Applicant’s argument:
Heterogeneous path patterns which have biological meanings (such as, if protein A and protein B are on the same pathway, protein A binds compound C, then protein B might bind compound C). The disclosed semantic structures and heterogeneous path patterns are the basis upon which construct the candidate graph and calculate the importance of the candidate graph, which is completely different from HE’s method. 

Examiner’s answer:
‘Heterogeneous path patterns’ is not within the specification. 

12.	Applicant’s argument:
Furthermore, the disclosed system, method, and software do not require the specific type of storage system as that disclosed in Duan, definitely not non-transient memory, as our proposed approach does not deal with graph partitions and may be computed on one single machine without partitioning our graph. 

Examiner’s answer:
Claim 1 recites non-transitory memory and Duan has this limitation. 

13.	Applicant’s argument:
While Duan’s disclosure deals with the multiple machines or storage systems and requires them to partition a large RDF triple set into several subgraphs which share graph monomorphism, this is definitely not the case in the disclosed invention. Machine learning methods may be applied to the integrated graphs of the presently disclosed invention directly on one single machine without the need to partition the graph into several subgraphs sharing similar graph monomorphism. Therefore, one of ordinary skill in the art (e.g., semantic web technologies in terms of RDF triples, or Ohm’s law and electronic networks) would not find the presently disclosed and claimed invention obvious over the disclosures of HE and Duan. The disclosed and claimed systems approach of including an integrated dataset (e.g., a graph), a representation (e.g., semantic representation) to capture semantics of nodes and edges and heterogeneous patterns, and a machine learning method to predict potential new connections in the disclosed and claimed integrated graph. This systems approach would not be obvious over HE and Duan even considering one of ordinary skill with related domain knowledge in semantic web technology, drug discovery, and machine learning. 

Examiner’s answer:
The argument above is based on the applicant’s opinion and therefore the argument has no support. 

14.	Applicant’s argument:
With regard to Claims 2 and 12, RDF (Resource Description Framework) is a common data representation model which is widely used in academia and industry. The data model for RDF is a graph which RDF triple (subject, predicate, object) forms a graph in which subject and object are two nodes connected by a predicate. RDF is only one kind of graph data representations. In HE’s publication, RDF was not the data representation model for HE’s co- author graph. HE just uses the normal graph data model (e.g., nodes and edges) to represent the co-author graph. Duan’s invention is entirely based upon RDF triples and RDF is the data representation model. The disclosed invention uses RDF as the data representation model to integrate diverse datasets which are stored in relational databases, Excel sheets, XML datasets, or even text. It uses RDF as the common data model to integrate these data with different storage formats to generate the integrated graph, which is different and patently distinguishable from HE’s publication, where RDF was not used to represent one big graph (no effort was made to integrate different datasets as He’s publication does not deal with data integration) and Duan’s data is in RDF already and Duan’s invention does not deal with data integration using RDF, rather than on how to enable keyword search on a large scale RDF triple set using graph partitions. A person with ordinary skill with this art armed with HE’s publication and Duan’s disclosure would not find the presently disclosed and claimed invention obvious because neither HE nor Duan deal with complicated data integration issues using RDF as a data representation method in.

Examiner’s answer:
He uses triples and Duan uses RDF which is a type of triple. These are within the same domain and the examiner sees no problem for a reason to combine. There is no mention of ‘complicated data integration’ within the specification. Applicant’s arguments have no support. 

15.	Applicant’s argument:
With regard to Claims 3 and 13, RDF is a common data model which employs a triple comprising a subject, a predicate, and an object. RDF is a W3C standard (https://www.w3.org/RDF/) and widely used in many applications in academia and industry, and which do not need Duan’s invention to in order to utilize RDF. RDF was published as a standard back to 2004 and anyone can use RDF to represent data. In the disclosed invention, RDF is not used to represent datasets, rather RDF is used as a common data representation model to integrate datasets stored in different formats, such as relational databases, XML databases, Excel sheets, and text files. Converting these datasets with different storage formats into RDF format is not discussed in either HE’s publication or Duan’s disclosure. Furthermore, unlike Duan’s RDF triples, which are Linked Open Data (LOD), the disclosed integrated graph goes beyond LOD to create a real graph. LOD is not real graph because the same subject can have several different nodes in the LOD dataset which can generate severe problems when applying machine learning methods (such as Page Rank). The disclosure of the present invention discloses merging all related subject nodes into one node, such as one compound with different IDs from different databases (e.g., CheBML, PubChem) into one node to represent this one compound, while in Duan RDF triples one compound can have several subject nodes with different URLs and connected owl:sameAs or rdfs:seeAlso edges.

Examiner’s answer:
Applicant’s arguments do not pertain to the claimed invention. They make statements such as ‘…RDF is not used to represent datasets, rather RDF is used as a common data representation model to integrate datasets stored in different formats….’ This does not address the claimed invention nor the cited art. RDF is not mentioned by either the claims or the cited art for these claims. 

16.	Applicant’s argument:
With regard to Claim 4, HE’s approach of creating subgraphs using source and sink vertices is fundamentally different from the subgraph generation of the disclosed invention. HE’s graph is also a homogenous graph where both source and sink nodes are authors (such as maps to John and Mary) with the topic vectors as their node properties. In contrast, the subgraph of the disclosed invention is a heterogeneous subgraph in which nodes have their types (such as drugs, genes, or diseases) and edges have their types (such as protein-protein interaction, or binding relationships). Such disclosed subgraphs are generated based on the disclosed semantic approach by capturing node and edge semantics and heterogenous path patterns between the source and sink vertices.

Examiner’s answer:
The reference He makes no statement about being a homogenous graph. Source and sink nodes are not mentioned within the invention. There is no mention of sink types within claim 4. ‘Node’, ‘edge’, ‘semantics’, ‘source’ and ‘sink’ are not even within the same paragraph of the specification. 

17.	Applicant’s argument:
With regard to Claims 4 and 14, Figure 4 of the present application shows the examples of semantic definitions which are different from the Table 1 in HE’s publication. The nodes in paths in Table 1 have the same type: author. All the nodes are authors, while in Figure 4 of the present disclosure, the nodes in the top ranked patterns have different types. Some of them may be drugs, some be genes, and so on. Second, the way these path patterns are ranked in Figure 4 of the present disclosure, is different from that shown in Table 1 of HE’s publication. The calculated semantic similarity of adjacent nodes in the path is used, and also the edge semantics based on the predefined path patterns are considered and which may capture the semantic and topological features of the paths.

Examiner’s answer:
Claim 4 recites…
…wherein the candidate generation step further comprises creating smaller graphs containing source and sink vertices based on heuristic values.
There is no mention of types of graphs. There is no outline or concept within the specification of what constitutes a ‘same type’ as opposed to a ‘different type.’ This subject matter is not even disclosed within the claim. ‘Semantic similarity’ is not even mentioned within the specification.

18.	Applicant’s argument:
In regards to Claims 5 and 15, Figure 4 of the present disclosure shows the examples of semantic associations which are different from the Table 1 in HE’s publication. The nodes in paths in HE’s Table 1 have the same type: author. All the nodes are authors, while in Figure 4 of the present disclosure, the nodes in the top ranked patterns have different types. Some of them may be drugs, some genes, and so on. Second, the path patterns in Figure 4 of the present disclosure is different from that shown in Table 1 of HE’s publication. The semantic properties are used to calculate the semantic similarity of adjacent nodes in the path, and are also used to consider the edge semantics based on the predefined path patterns which may capture the semantic and topological features of the paths.

Examiner’s answer:
There is no outline or concept within the specification of what constitutes a ‘same type’ as opposed to a ‘different type.’ There is no mention that ‘authors’ have to be different types. ‘Semantic properties’ is not even mentioned within the specification.

19.	Applicant’s argument:
With regard to Claims 6 and 16, the diversity function may be different in different situations. HE’s disclosure assumes to understand the collaboration between two authors based on a homogeneous network. Contrasted against the disclosed invention which is designed to predict the new binding relationship between a drug node and a gene/protein node based on a heterogeneous network. Figure 7 of the present disclosure clearly shows the difference between the diversity graph of HE’s disclosure and the diversity graph of the present disclosure. The diversity of the subgraph in the present disclosure has to deal with the biological meaning of a path. For example, in Figure 7 of the present disclosure, the subgraph between Troglitazone and PPARG is used to predict the potential binding edge between Troglitazone and PPARG, along with the semantic meaning of surrounding paths between Troglitazone and PPARG. For example, if Troglitazone and Rosiglitazone are under the same super class of the chemical ontology, and Rosiglitazone binds to PPARG, the disclosure of the invention suggests that one infer that Troglitazone might bind PPARG, because this path will have higher weights to contribute to the link prediction between Troglitazone and PPARG. Another example is that Troglitazone binds ACSL4 and ACSL4 and PPARG are in the same PPAR signaling pathway, and also allowing for the inference that Troglitazone might bind PPARG.

Examiner’s answer:
There is no mention of a ‘diversity function’ in these claims. ‘May be different’ means it also may not be different. ‘Binding relationship’ is not mentioned within the specification. The claims recite, ‘selects a subset of the graphs which maximize a diversity function.’ There is no mention of a requirement of being a subgraph having a ‘biological meaning.’

20.	Applicant’s argument:
With regard to Claims 7 and 17, HE’s diversity subgraph is based on Faloutsos’ 2004 paper about the heuristics of nodes and edges in the graph. But both HE’s graph and Faloutsos’ graph are homogenous graph and they only have one node type, for example, HE’s group only has the author as the node type. The edge semantics are also limited, in HE’s graph, the edge semantics are only co-author. HE uses the topic as the node property and topics can form an ontology. But this kind of ontology is formed by the topics automatically extracted from the text. This is in contrast to the disclosed invention, wherein topics are manually curated by the domain experts, namely using ontologies about compounds and diseases. Given the edge semantics of HE’s graph, the diversity subgraph is based on node similarity calculated based on the topic vector as the node property. This is contrasted to the present disclosure’s diversity subgraph (see Figure 7 of the instant specification), wherein diversity is calculated based on the semantics which captures the similarity of nodes and their types and their relationships with domain ontologies about compound and disease, and the heterogeneous path patterns to capture biological meaning of the path (see Figure 4 of the present disclosure).

Examiner’s answer:
Again there is no mention of a homogenous graph within He. The applicant merely states it is so. ‘Edge semantics’ is not mentioned within He. The claims recite, ‘wherein the diverse subgraph generation step includes determining semantic identifiers for the paths selected and added to the diverse subgraph.’ There is no mention of ‘property’, ‘topics’ or ‘ontology.’ As to the argument, ‘…wherein topics are manually curated by the domain experts, namely using ontologies about compounds and diseases,’ ‘manual’ is not mentioned within the claims. Again, ‘edge semantics’ is not mentioned within He. The applicant’s argument is groundless.

21.	Applicant’s argument:
With regard to Claims 8 and 18, HE’s paper has no stated definition of the term “semantic identifiers”. Their path patterns are the collaboration patterns. In the Figure 1 of HE’s paper, the diversity subgraph contains of the top-k paths. These paths show the topic diversity of author nodes. For example, the first path for the diversity subgraph in Figure 1 (see below) of He’s paper, John and Mary are connected via authors with machine learning, data mining, and parallel computing Note that in this diversity subgraph, you can find that nodes are all authors and there are no edge semantics (no edge labels), while our subgraph in Figure 4 is fundamentally different from He’s diversity subgraph. Our path patterns are nontrivial, and laymen with the only the education of He’s publication cannot re-invent our path patterns.

Examiner’s answer:
Ramakrishnan, Milnor, Perry, and Sheth (2005) adapted Faloutsos’ algorithm (2004) with heuristics for edge weighting that depends indirectly on the semantics of the entity and property types in the ontology and on characteristics of the instance data. (He, p119) Claims 7 and 17
These studies have provided useful techniques for detecting semantic associations, but none of previous studies have addressed the problem of detecting diverse subgraphs in collaboration networks annotated with authors’ expertise. (He, p119) Claims 8 and 18
The claims are not asking for a specific definition. ‘Edge semantics’ is not mentioned within He nor the specification. ‘Nontrivial’ is not mentioned within the specification. 

22.	Applicant’s argument:
With regard to Claims 9 and 19, HE’s method of generating diversity subgraphs is based on the topical vector of author nodes, and the topical vector is automatically extracted and learned from this author’s publications. The nodes of the present disclosure are compounds, genes, and diseases, their topical vectors need to use specialized topic modeling algorithms as the general topic modeling algorithm used in HE’s publication cannot learn the topical vector of these biological entities. Furthermore, in the present disclosure also considers the edge semantics which may be binding relationship or protein-protein interaction, while HE’s graph does not have the edge semantics, therefore HE’s approach does not disclose, teach or suggest the presently claimed invention.

Examiner’s answer:
‘Edge semantics’ is not mentioned within He nor the specification. ‘Learn’ or ‘teach’ is not mentioned within the claims. The applicant’s argument is groundless.

23.	Applicant’s argument:
With regard to Claims 10 and 20, HE’s steps to generate candidate graph are to form a co-authorship graph using 230,000 papers, while the disclosed invention uses RDF as the common data model to integrate different datasets about compounds, genes, diseases, pathways and side effects. The present application’s disclosed method of generating candidate graphs involves integrating data, while HE’s approach is to form a graph from a set of publications. HE’s application does not need to deal with authors with different IDs, such as one author moved from one institution to another institution or changed name due to marriage or divorce, HE’s approach will treat these two authors as two nodes because the same authors have two different names, and they will be treated as two different nodes in the coauthor network. While the disclosure of the present application involves merging the same compound with different IDs into one single node because the same compound may have different IDs, as there are four major compound databases and each database may provide a unique ID for each compound. So, if one merges these found compound databases together, one would need to merge the same compound with different IDs into one node in our integrated graph. Furthermore, our edge semantics are much richer than HE’s graph in which only one edge semantic exists (e.g., the co-author semantic edge type). HE’s application uses an ATC model to generate the topical vector for authors, while the presently disclosed invention uses a different model to generate topical vector for biological entities.

Examiner’s answer:
Applicant makes the statement, ‘HE’s approach will treat these two authors as two nodes because the same authors have two different names, and they will be treated as two different nodes in the coauthor network.’ This is not cited within the Office Action nor even stated within He. ‘Different ID’ is not within the same sentence or paragraph in the specification. The applicant’s argument is groundless. 

Claim Rejections – 35 USC 103
24. 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over He in view of Duan. (‘Mining diversity subgraph in multidisciplinary scientific collaboration networks: A meso perspective’, referred to as He; U.S. Patent Publication 20140143280, referred to as Duan)

Claim 1
He discloses a method for semantic analysis of disparate data, in an environment having a plurality of datasets having distinct information fields relating to a topic, the method comprising the steps of: candidate generation, involving creating graphs relating to specified information with information fields from the plurality of datasets as nodes (He, p121; The pre-processing step (or candidate graph generation): in candidate graph generation, a smaller graph containing the source and sink vertices is created based on heuristics (e.g. node degree).); electrical network computation, involving representing graphs as an electrical circuit to calculate the voltage of each node and the current of each edge by solving a system of linear equations (He, p121; Electrical network computation: the candidate graph is viewed as an electrical circuit. According to Ohm’s law and the conservation of electricity, the voltage of each node and the current of each edge are obtained by solving a system of linear equations. Currents carried by all the source-to-sink paths are calculated (Faloutsos et al., 2004)); and diverse subgraph generation, involving selecting paths that carry the larger amount of current and have more new nodes in an iterative process, wherein each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diverse subgraph (He, p121; Diversity subgraph generation: paths that carry the larger amount of current and have more new nodes are selected in an iterative process. In each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diversity subgraphs.); and association generation between selected nodes scored by relevancy resulting in a ranked list of a plurality of paths in the subgraph. (He, p123; Table 1 presents the top ranked path in fig 4.)
He does not disclose expressly the association generation including creating a data set stored in non-transient memory, the data set including the ranked list.
Duan discloses the association generation including creating a data set stored in non-transient memory, the data set including the ranked list. (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ And ‘...or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’) It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate computer hardware, data format in the form of triples of Duan. Given the advantage of implementing the invention in the real world and using triples with semantics fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.
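For illustration only, the electrical network computation quoted from He above (the candidate graph viewed as a circuit, with node voltages and edge currents obtained by solving a system of linear equations) may be sketched as follows. This sketch is not taken from He, Duan, or the specification. The function names, the unit source voltage, the grounded sink, the edge conductances, and the four-node example graph are all assumptions made for the example.

```python
from fractions import Fraction


def solve_voltages(edges, source, sink):
    """Solve Kirchhoff's current law for a connected, undirected circuit.

    edges maps (u, v) -> conductance. Assumption: the source is held at
    1 volt and the sink at 0 volts; all other node voltages are unknowns.
    """
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, {})[v] = Fraction(c)
        adj.setdefault(v, {})[u] = Fraction(c)
    fixed = {source: Fraction(1), sink: Fraction(0)}
    unknown = sorted(n for n in adj if n not in fixed)
    idx = {n: i for i, n in enumerate(unknown)}
    size = len(unknown)
    # Augmented matrix: one current-balance equation per unknown node.
    rows = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for u in unknown:
        i = idx[u]
        for v, c in adj[u].items():
            rows[i][i] += c
            if v in fixed:
                rows[i][size] += c * fixed[v]
            else:
                rows[i][idx[v]] -= c
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    volt = dict(fixed)
    for u in unknown:
        volt[u] = rows[idx[u]][size]
    return volt


def edge_currents(edges, volt):
    """Ohm's law per edge: current = conductance * voltage difference."""
    return {(u, v): Fraction(c) * (volt[u] - volt[v])
            for (u, v), c in edges.items()}


# Hypothetical candidate graph: two source-to-sink paths through a and b.
edges = {("s", "a"): 2, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 1}
volt = solve_voltages(edges, "s", "t")
currents = edge_currents(edges, volt)
```

In this hypothetical graph the path through node a carries more current than the path through node b, which is the quantity the subsequent path-selection step would rank on.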

Claim 2
He does not disclose expressly wherein the candidate generation step involves creating nodes in the form of triples.
Duan discloses wherein the candidate generation step involves creating nodes in the form of triples. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate computer hardware, data format in the form of triples of Duan. Given the advantage of implementing the invention in the real world and using triples with semantics fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 3
He does not disclose expressly wherein triples have the form of subject, predicate, and object.
Duan discloses wherein triples have the form of subject, predicate, and object. (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate computer hardware, data format in the form of triples of Duan. Given the advantage of implementing the invention in the real world and using triples with semantics fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.
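For illustration only, the subject-predicate-object form quoted from Duan above may be sketched as a small adjacency structure in which each predicate connects a subject node to an object node. This sketch is not taken from Duan or the specification. The function name and the predicate labels are hypothetical, and the entity names are borrowed from the applicant's remarks merely as sample data.

```python
def triples_to_graph(triples):
    """Build a directed, edge-labeled adjacency map from (s, p, o) triples."""
    graph = {}
    for subject, predicate, obj in triples:
        # The predicate labels the edge from subject to object.
        graph.setdefault(subject, []).append((predicate, obj))
        # Make sure object-only nodes also appear in the graph.
        graph.setdefault(obj, [])
    return graph


# Hypothetical triples; predicate names are assumptions for the example.
triples = [
    ("Rosiglitazone", "binds", "PPARG"),
    ("Troglitazone", "binds", "ACSL4"),
    ("ACSL4", "sharesPathwayWith", "PPARG"),
]
graph = triples_to_graph(triples)
```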

Claim 4
He discloses wherein the candidate generation step further comprises creating smaller graphs containing source and sink vertices based on heuristic values. (He, fig 1, p119; The source and sink map to John and Mary, while the heuristic values map to, for example, ‘machine learning’ and ‘text mining’.)

Claim 5
He discloses wherein the electrical network computation step involves assuming the current flows from source to sink. (He, table 1, p123; Electrical values are calculated via different paths.)
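For illustration only, the source-to-sink current assumption can be sketched by fixing the source voltage at 1 and the sink voltage at 0, then applying Ohm's law and conservation of current at every other node. This is a minimal sketch under stated assumptions, not He's implementation: the function name `solve_voltages`, the edge conductances, and the example graph are hypothetical, and the graph is assumed to be connected.

```python
def solve_voltages(nodes, conductance, source, sink):
    """Return node voltages with V[source] = 1 and V[sink] = 0.

    conductance maps an undirected edge (u, v) to a positive conductance.
    """
    inner = [n for n in nodes if n not in (source, sink)]
    idx = {n: i for i, n in enumerate(inner)}
    k = len(inner)
    # Conservation of current at each inner node gives one linear equation.
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for (u, v), c in conductance.items():
        for a, other in ((u, v), (v, u)):
            if a in idx:
                A[idx[a]][idx[a]] += c
                if other in idx:
                    A[idx[a]][idx[other]] -= c
                elif other == source:
                    b[idx[a]] += c  # contribution of the fixed source voltage 1
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * k
    for r in reversed(range(k)):
        s = sum(A[r][j] * x[j] for j in range(r + 1, k))
        x[r] = (b[r] - s) / A[r][r]
    voltages = {source: 1.0, sink: 0.0}
    voltages.update({n: x[idx[n]] for n in inner})
    return voltages


def edge_current(voltages, conductance, u, v):
    """Ohm's law: current from u to v is conductance times voltage drop."""
    c = conductance.get((u, v), conductance.get((v, u)))
    return c * (voltages[u] - voltages[v])


# Hypothetical three-node path s - a - t.
cond = {("s", "a"): 1.0, ("a", "t"): 3.0}
V = solve_voltages(["s", "a", "t"], cond, "s", "t")
```

In this example the intermediate node settles at a voltage between 1 and 0, so the computed current on each edge is positive in the source-to-sink direction and equal on both edges of the path.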

Claim 6
He discloses wherein the candidate generation step selects a subset of the graphs which maximize a diversity function. (He, p120; 3.2.1. Diversity optimization for connected subgraph problem. Given: An edge-weighted undirected graph G with nodes labeled with classes, source s, and sink e from G. Find: A connected subgraph Gs composed of the top-k paths between s and e that maximizes the diversity function D(Gs, k).)

Claim 7
He discloses wherein the diverse subgraph generation step includes determining semantic identifiers for the paths selected and added to the diverse subgraph. (He, p119; Ramakrishnan, Milnor, Perry, and Sheth (2005) adapted Faloutsos’ algorithm (2004) with heuristics for edge weighting that depends indirectly on the semantics of the entity and property types in the ontology and on characteristics of the instance data.)

Claim 8
He discloses wherein determination of semantic identifiers is in part based on at least one of path patterns and semantic connections. (He, p119; These studies have provided useful techniques for detecting semantic associations, but none of previous studies have addressed the problem of detecting diverse subgraphs in collaboration networks annotated with authors’ expertise. )

Claim 9
He discloses wherein the association generation step involves topic analysis of the nodes, wherein each node has a topic value related to textual information related to the node and contextual information about nodes in proximity in the subgraphs. (He, p118; While a micro view takes a single edge (representing coauthorship) or node (representing a coauthor) as the unit of analysis in a coauthorship network, a macro view computes topological metrics for the collaboration network as a whole. EC: The author has a topic associated with it. Proximity maps to the association with another author.)

Claim 10
He discloses wherein the candidate generation step involves creating nodes associated with different types of entities, and creating links between nodes associated with different types of relationships. (He, p123; ‘The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages. In the coauthorship network, nodes represent authors and edges represent coauthorship weighted by the number of co-authored papers.’ and ‘Additionally, the titles of about 230,000 papers associated with those authors are used as input for the ACT model, which generates a probability distribution over topics for each author and a set of representative words for each topic.’)

Claim 11
He discloses a system for semantic analysis of disparate data, the system comprising:.... a plurality of datasets (He, p7; The DSE and CDSE algorithms are implemented in an academic coauthor network extracted from the academic search system ArnetMiner (Tang, Zhang, et al., 2008; Tang et al., 2007) in the computer science field. The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages.) .... candidate generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory to create graphs relating to specified information with information fields from the plurality of datasets as nodes (He, p121; The pre-processing step (or candidate graph generation): in candidate graph generation, a smaller graph containing the source and sink vertices is created based on heuristics (e.g. node degree).); electrical network computation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory to represent graphs as an electrical circuit to calculate the voltage of each node and the current of each edge by solving a system of linear equations (He, p121; Electrical network computation: the candidate graph is viewed as an electrical circuit. According to Ohm’s law and the conservation of electricity, the voltage of each node and the current of each edge are obtained by solving a system of linear equations. Currents carried by all the source-to-sink paths are calculated (Faloutsos et al., 2004)); and diverse subgraph generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory to select paths that carry the larger amount of current and have more new nodes in an iterative process, wherein each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diverse subgraph (He, p121; Diversity subgraph generation: paths that carry the larger amount of current and have more new nodes are selected in an iterative process. In each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diversity subgraphs.); and.... to associate between selected nodes scored by relevancy resulting in a ranked list of a plurality of paths in the subgraph. (He, p123; Table 1 presents the top ranked path in fig 4.)
He does not disclose expressly a processor and related memory;... having distinct information fields relating to a topic, the plurality of datasets being accessible by the processor and memory;....association generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory ...., said association generation module including a data set creation module for creating a data set in non-transient memory including the ranked list.
Duan discloses a processor and related memory (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ and ‘...or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’);....having distinct information fields relating to a topic, the plurality of datasets being accessible by the processor and memory (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object. EC: A triple can be seen as a data structure. Subject and/or object can be seen as a ‘topic.’);....association generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ and ‘...or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’)...., said association generation module including a data set creation module for creating a data set in non-transient memory including the ranked list. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.)
It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate the computer hardware and the triple data format of Duan. Given the advantage of implementing the invention in the real world and using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.
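For illustration only, the iterative path selection quoted from He for claim 11 can be sketched as a greedy loop. The quoted passage does not give the exact scoring formula, so this sketch assumes a hypothetical score: a path's current divided by one plus the number of node types on that path that are already present in the subgraph. The function name, the example paths, their currents, and the type labels are likewise hypothetical.

```python
def greedy_diverse_paths(paths, node_type, k):
    """Pick up to k paths, favoring high current and not-yet-seen node types.

    paths: list of (list_of_nodes, current) pairs.
    node_type: dict mapping each node to a type label.
    """
    chosen, seen_types = [], set()
    remaining = list(paths)
    while remaining and len(chosen) < k:
        def score(path):
            nodes, current = path
            # Assumed scoring rule (not from the cited reference):
            # penalize types the subgraph already contains.
            repeated = len({node_type[n] for n in nodes} & seen_types)
            return current / (1 + repeated)

        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)
        seen_types |= {node_type[n] for n in best[0]}
    return chosen


# Hypothetical candidate paths between source "s" and sink "t".
types = {"s": "db", "t": "db", "a": "ml", "b": "ml", "c": "nlp"}
candidates = [
    (["s", "a", "t"], 0.9),
    (["s", "b", "t"], 0.8),
    (["s", "c", "t"], 0.6),
]
selected = greedy_diverse_paths(candidates, types, k=2)
```

Under this assumed score, the first iteration takes the highest-current path through `a`; the second iteration prefers the path through `c` over the higher-current path through `b`, because `c` contributes a node type not yet in the subgraph.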

Claim 12
He does not disclose expressly wherein the candidate generation module involves creating nodes in the form of triples.
Duan discloses wherein the candidate generation module involves creating nodes in the form of triples. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate the computer hardware and the triple data format of Duan. Given the advantage of implementing the invention in the real world and using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 13
He does not disclose expressly wherein triples have the form of subject, predicate, and object.
Duan discloses wherein triples have the form of subject, predicate, and object. (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art, having the teachings of He and Duan before him before the effective filing date of the claimed invention, to modify He to incorporate the computer hardware and the triple data format of Duan. Given the advantage of implementing the invention in the real world and using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 14
He discloses wherein the candidate generation module further comprises creating smaller graphs containing source and sink vertices based on heuristic values. (He, fig 1, p119; The source and sink map to John and Mary, while the heuristic values map to, for example, ‘machine learning’ and ‘text mining’.)

Claim 15
He discloses wherein the electrical network computation module involves assuming the current flows from source to sink. (He, table 1, p123; Electrical values are calculated via different paths.)

Claim 16
He discloses wherein the candidate generation module selects a subset of the graphs which maximize a diversity function. (He, p120; 3.2.1. Diversity optimization for connected subgraph problem. Given: An edge-weighted undirected graph G with nodes labeled with classes, source s, and sink e from G. Find: A connected subgraph Gs composed of the top-k paths between s and e that maximizes the diversity function D(Gs, k).)

Claim 17
He discloses wherein the diverse subgraph generation module includes determining semantic identifiers for the paths selected and added to the diverse subgraph. (He, p119; Ramakrishnan, Milnor, Perry, and Sheth (2005) adapted Faloutsos’ algorithm (2004) with heuristics for edge weighting that depends indirectly on the semantics of the entity and property types in the ontology and on characteristics of the instance data.)

Claim 18
He discloses wherein determination of semantic identifiers is in part based on at least one of path patterns and semantic connections. (He, p119; These studies have provided useful techniques for detecting semantic associations, but none of previous studies have addressed the problem of detecting diverse subgraphs in collaboration networks annotated with authors’ expertise. )

Claim 19
He discloses wherein the association generation module involves topic analysis of the nodes, wherein each node has a topic value related to textual information related to the node and contextual information about nodes in proximity in the subgraphs. (He, p118; While a micro view takes a single edge (representing coauthorship) or node (representing a coauthor) as the unit of analysis in a coauthorship network, a macro view computes topological metrics for the collaboration network as a whole. EC: The author has a topic associated with it. Proximity maps to the association with another author.)

Claim 20
He discloses wherein the candidate generation module involves creating nodes associated with different types of entities, and creating links between nodes associated with different types of relationships. (He, p123; ‘The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages. In the coauthorship network, nodes represent authors and edges represent coauthorship weighted by the number of co-authored papers.’ and ‘Additionally, the titles of about 230,000 papers associated with those authors are used as input for the ACT model, which generates a probability distribution over topics for each author and a set of representative words for each topic.’)

25.	Claims 1-20 are rejected.
	
Conclusion – Final
26.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Correspondence Information
27.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Michael Huntley can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129