Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
1.	Claims 1-20 are pending in this application.

Claim Rejections - 35 USC § 103
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over He (‘Mining diversity subgraph in multidisciplinary scientific collaboration’, referred to as He) in view of Duan (U.S. Patent Publication 20140143280, referred to as Duan).

Claim 1
He discloses a method for semantic analysis of disparate data, in an environment having a plurality of datasets having distinct information fields relating to a topic, the method comprising the steps of: candidate generation, involving creating graphs relating to specified information with information fields from the plurality of datasets as nodes (He, p121; The pre-processing step (or candidate graph generation): in candidate graph generation, a smaller graph containing the source and sink vertices is created based on heuristics (e.g. node degree).); electrical network computation, involving representing graphs as an electrical circuit to calculate the voltage of each node and the current of each edge by solving a system of linear equations (He, p121; Electrical network computation: the candidate graph is viewed as an electrical circuit. According to Ohm’s law and the conservation of electricity, the voltage of each node and the current of each edge are obtained by solving a system of linear equations. Currents carried by all the source-to-sink paths are calculated (Faloutsos et al., 2004)); diverse subgraph generation, involving selecting paths that carry the larger amount of current and have more new nodes in an iterative process, wherein each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diverse subgraph (He, p121; Diversity subgraph generation: paths that carry the larger amount of current and have more new nodes are selected in an iterative process. In each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diversity subgraphs.); and association generation resulting in a ranked list of a plurality of paths in the subgraph (He, p123; Table 1 presents the top ranked path in fig 4.).
He does not disclose expressly the association generation including creating a data set stored in non-transient memory, the data set including the ranked list.
Duan discloses the association generation including creating a data set stored in non-transient memory, the data set including the ranked list. (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ and ‘…or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.
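For illustration only, the electrical network computation quoted above from He (citing Faloutsos et al., 2004) can be sketched in Python. This is a simplified sketch under stated assumptions, not He's implementation: edge weights are treated as conductances, the source is held at 1 volt and the sink at 0 volts, Kirchhoff's current law is imposed at every other node, the graph is assumed connected, and the names `solve_linear`, `node_voltages`, and `edge_currents` are hypothetical. Refinements of the published algorithm are not reproduced.

```python
from typing import Dict, List, Tuple

# (node_u, node_v, conductance); the edge weight is treated as a conductance.
Edge = Tuple[str, str, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def node_voltages(edges: List[Edge], source: str, sink: str) -> Dict[str, float]:
    """Hold source at 1 V and sink at 0 V; every other node obeys Kirchhoff's
    current law: sum over neighbours y of c(x, y) * (V[x] - V[y]) == 0."""
    fixed = {source: 1.0, sink: 0.0}
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for u, v, c in edges:
        for x, y in ((u, v), (v, u)):  # undirected edge: add KCL terms at both ends
            if x not in idx:
                continue  # fixed-voltage node, no equation needed
            a[idx[x]][idx[x]] += c
            if y in idx:
                a[idx[x]][idx[y]] -= c
            else:
                b[idx[x]] += c * fixed[y]  # known voltage moves to the right-hand side
    volts = dict(fixed)
    volts.update(zip(free, solve_linear(a, b)))
    return volts


def edge_currents(edges: List[Edge], volts: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Ohm's law: I = c * (V[u] - V[v]), keyed in the direction the current flows."""
    out: Dict[Tuple[str, str], float] = {}
    for u, v, c in edges:
        i = c * (volts[u] - volts[v])
        key = (u, v) if i >= 0 else (v, u)
        out[key] = out.get(key, 0.0) + abs(i)
    return out


edges = [("s", "a", 1.0), ("a", "e", 1.0), ("s", "b", 2.0), ("b", "e", 1.0)]
volts = node_voltages(edges, source="s", sink="e")
print(volts)
print(edge_currents(edges, volts))
```

With conductance 2 on s-b and 1 elsewhere, the sketch yields V(a) = 0.5 and V(b) = 2/3, and the total current leaving s equals the total current entering e, consistent with the conservation principle He invokes.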

Claim 2

He does not disclose expressly wherein the candidate generation step involves creating nodes in the form of triples.
Duan discloses wherein the candidate generation step involves creating nodes in the form of triples. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.

Claim 3
He does not disclose expressly wherein triples have the form of subject, predicate, and object.
Duan discloses wherein triples have the form of subject, predicate, and object. (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.
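As an illustration of the triple format quoted from Duan, paragraph 0015, the following Python sketch stores each triple as a (subject, predicate, object) record and indexes the triples as a directed graph in which the predicate labels the edge from subject to object. The class `Triple`, the function `triples_to_graph`, and the predicate names in the usage example are hypothetical and are not taken from Duan.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str  # the RDF "object"; renamed to avoid shadowing the builtin


def triples_to_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples as a labeled adjacency list: subject -> [(predicate, object)]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        # The predicate connects the subject to the object, as in an RDF graph.
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


# Usage example with made-up predicate names.
facts = [
    Triple("John", "coauthorWith", "Mary"),
    Triple("John", "hasTopic", "machine learning"),
]
print(triples_to_graph(facts))
```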

Claim 4
He discloses wherein the candidate generation step further comprises creating smaller graphs containing source and sink vertices based on heuristic values. (He, fig. 1, p119; the source and sink map to John and Mary, while the heuristic values map to, for example, ‘machine learning’ and ‘text mining’.)
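The candidate graph generation quoted from He (a smaller graph containing the source and sink vertices, created based on heuristics such as node degree) can be illustrated with the Python sketch below. The particular heuristic shown, breadth-first growth from both the source and the sink that skips nodes whose degree exceeds a cap, is an assumption chosen for illustration; He's exact heuristic is not reproduced, and `candidate_subgraph`, `max_nodes`, and `max_degree` are hypothetical names.

```python
from collections import deque
from typing import Dict, Set


def candidate_subgraph(
    adj: Dict[str, Set[str]],
    source: str,
    sink: str,
    max_nodes: int = 50,
    max_degree: int = 100,
) -> Dict[str, Set[str]]:
    """Grow a node set breadth-first from both source and sink, skipping
    'hub' nodes whose degree exceeds max_degree, until max_nodes are kept.
    Returns the subgraph induced on the kept nodes."""
    keep = {source, sink}
    queue = deque([source, sink])
    while queue and len(keep) < max_nodes:
        node = queue.popleft()
        for nbr in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nbr in keep or len(adj.get(nbr, ())) > max_degree:
                continue
            keep.add(nbr)
            queue.append(nbr)
            if len(keep) >= max_nodes:
                break
    return {n: {m for m in adj.get(n, ()) if m in keep} for n in keep}


adj = {
    "s": {"a", "h"},
    "a": {"s", "e"},
    "e": {"a", "h"},
    "h": {"s", "e", "x", "y"},  # hub node with degree 4
    "x": {"h"},
    "y": {"h"},
}
sub = candidate_subgraph(adj, "s", "e", max_degree=3)
print(sorted(sub))
```

Capping degree is one simple way to keep generic hub nodes from pulling large, loosely related neighborhoods into the candidate graph; the cap and node budget here are illustrative values only.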

Claim 5
He discloses wherein the electrical network computation step involves assuming the current flows from source to sink. (He, table 1, p123; Electrical values are calculated via different paths.)

Claim 6
He discloses wherein the candidate generation step selects a subset of the graphs which maximize a diversity function. (He, p120; 3.2.1. Diversity optimization for connected subgraph problem Given: An edge-weighted undirected graph G with nodes labeled with classes, source s, and sink e from G. Find: A connected subgraph Gs composed of the top-k paths between s and e that maximizes the diversity function D(Gs, k).)
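The iterative selection quoted from He in the mapping of claim 1 (in each iteration, the path with the highest marginal current per number of existing node types is added) can be sketched as a greedy loop that assembles the top-k path set of the diversity problem quoted above. In this sketch each path's current is used as its marginal current, and 1 is added to the count of already-present node types to avoid division by zero; both are simplifying assumptions, and `greedy_diverse_paths` and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Path = Tuple[str, ...]


def greedy_diverse_paths(
    paths: Sequence[Path],
    current: Dict[Path, float],
    node_type: Dict[str, str],
    k: int,
) -> List[Path]:
    """Select up to k paths; each round picks the path with the highest
    current / (1 + number of its node types already in the subgraph)."""
    chosen: List[Path] = []
    seen_types: Set[str] = set()
    remaining = list(paths)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda p: current[p] / (1 + len({node_type[n] for n in p} & seen_types)),
        )
        chosen.append(best)
        remaining.remove(best)
        seen_types.update(node_type[n] for n in best)  # these types now exist
    return chosen


p1, p2, p3 = ("s", "a", "e"), ("s", "b", "e"), ("s", "c", "e")
node_type = {"s": "author", "e": "author", "a": "ML", "b": "ML", "c": "TM"}
current = {p1: 0.5, p2: 0.45, p3: 0.4}
print(greedy_diverse_paths([p1, p2, p3], current, node_type, k=2))
```

In the example, the second pick is p3 rather than the higher-current p2, because p3 introduces a node type not yet in the subgraph.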

Claim 7
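He discloses wherein the diverse subgraph generation step includes determining semantic identifiers for the paths selected and added to the diverse subgraph. (He, p119; Ramakrishnan, Milnor, Perry, and Sheth (2005) adapted Faloutsos’ algorithm (2004) with heuristics for edge weighting that depends indirectly on the semantics of the entity and property types in the ontology and on characteristics of the instance data.)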

Claim 8
He discloses wherein determination of semantic identifiers is in part based on at least one of path patterns and semantic connections. (He, p119; These studies have provided useful techniques for detecting semantic associations, but none of previous studies have addressed the problem of detecting diverse subgraphs in collaboration networks annotated with authors’ expertise.)

Claim 9
He discloses wherein the association generation step involves topic analysis of the nodes, wherein each node has a topic value related to textual information related to the node and contextual information about nodes in proximity in the subgraphs. (He, p118; While a micro view takes a single edge (representing coauthorship) or node (representing a coauthor) as the unit of analysis in a coauthorship network, a macro view computes topological metrics for the collaboration network as a whole. EC: Each author has a topic associated with it, and proximity maps to the association with other authors.)

Claim 10
He discloses wherein the candidate generation step involves creating nodes associated with different types of entities, and creating links between nodes associated with different types of relationships. (He, p123; ‘The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages. In the coauthorship network, nodes represent authors and edges represent coauthorship weighted by the number of co-authored papers.’ And ‘Additionally, the titles of about 230,000 papers associated with those authors are used as input for the ACT model, which generates a probability distribution over topics for each author and a set of representative words for each topic.’)
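The coauthorship network quoted from He, in which nodes represent authors and edges are weighted by the number of co-authored papers, can be built from per-paper author lists as in the following Python sketch. The input format (one list of author names per paper) and the function name `coauthorship_weights` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def coauthorship_weights(papers: Iterable[List[str]]) -> Counter:
    """Weight of edge (a, b) = number of papers on which a and b are both authors."""
    weights: Counter = Counter()
    for authors in papers:
        # Each unordered pair of distinct authors on a paper is one co-authorship;
        # sorting gives every pair a single canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


papers = [["John", "Mary"], ["Mary", "John", "Ann"], ["Ann"]]
print(coauthorship_weights(papers))
```

A single-author paper contributes no edge, since it has no author pairs.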

Claim 11
He discloses a system for semantic analysis of disparate data, the system comprising:…. a plurality of datasets (He, p7; The DSE and CDSE algorithms are implemented in an academic coauthor network extracted from the academic search system ArnetMiner (Tang, Zhang, et al., 2008; Tang et al., 2007) in the computer science field. The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages.) …. candidate generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory to create graphs relating to specified information with information fields from the plurality of datasets as nodes (He, p121; The pre-processing step (or candidate graph generation): in candidate graph generation, a smaller graph containing the source and sink vertices is created based on heuristics (e.g. node degree).); electrical network computation …. (He, p121; Electrical network computation: the candidate graph is viewed as an electrical circuit. According to Ohm’s law and the conservation of electricity, the voltage of each node and the current of each edge are obtained by solving a system of linear equations. Currents carried by all the source-to-sink paths are calculated (Faloutsos et al., 2004)); and diverse subgraph generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory to select paths that carry the larger amount of current and have more new nodes in an iterative process, wherein each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diverse subgraph (He, p121; Diversity subgraph generation: paths that carry the larger amount of current and have more new nodes are selected in an iterative process. In each iteration, the path that scores the highest marginal current per number of existing types of nodes is selected and added to the diversity subgraphs.); and…. 
to associate between selected nodes scored by relevancy resulting in a ranked list of a plurality of paths in the subgraph. (He, p123; Table 1 presents the top ranked path in fig 4.)
He does not disclose expressly a processor and related memory;….having distinct information fields relating to a topic, the plurality of datasets being accessible by the processor and memory;….association generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory…., said association generation module including a data set creation module for creating a data set in non-transient memory including the ranked list.
Duan discloses a processor and related memory (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ and ‘…or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’);….having distinct information fields relating to a topic, the plurality of datasets being accessible by the processor and memory (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object. EC: A triple can be seen as a data structure. Subject and/or object can be seen as a ‘topic.’);….association generation module accessible by the processor and memory, having software instructions capable of enabling the processor and memory (Duan, 0082, 0073; ‘Suitable data processing systems for storing and/or executing program code include, but are not limited to, at least one processor coupled directly or indirectly to memory elements through a system bus.’ and ‘…or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.’)…., said association generation module including a data set creation module for creating a data set in non-transient memory including the ranked list. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.

Claim 12
He does not disclose expressly wherein the candidate generation module involves creating nodes in the form of triples.
Duan discloses wherein the candidate generation module involves creating nodes in the form of triples. (Duan, 0015; A resource description framework (RDF) dataset is a graph, i.e., an RDF graph, containing a plurality of triples. Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.

Claim 13
He does not disclose expressly wherein triples have the form of subject, predicate, and object.
Duan discloses wherein triples have the form of subject, predicate, and object. (Duan, 0015; Each triple is formed by a subject, a predicate and an object such that the predicate connects the subject to the object.) It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of He and Duan before them, to modify He to incorporate the computer hardware and the triple-based data format of Duan. Given the advantages of implementing the invention on real-world computing hardware and of using triples with semantic fields for text evaluation, one having ordinary skill in the art would have been motivated to make this modification.

Claim 14
He discloses wherein the candidate generation module further comprises creating smaller graphs containing source and sink vertices based on heuristic values. (He, fig. 1, p119; the source and sink map to John and Mary, while the heuristic values map to, for example, ‘machine learning’ and ‘text mining’.)

Claim 15
He discloses wherein the electrical network computation module involves assuming the current flows from source to sink. (He, table 1, p123; Electrical values are calculated via different paths.)

Claim 16
He discloses wherein the candidate generation module selects a subset of the graphs which maximize a diversity function. (He, p120; 3.2.1. Diversity optimization for connected subgraph problem Given: An edge-weighted undirected graph G with nodes labeled with classes, source s, and sink e from G. Find: A connected subgraph Gs composed of the top-k paths between s and e that maximizes the diversity function D(Gs, k).)

Claim 17
He discloses wherein the diverse subgraph generation module includes determining semantic identifiers for the paths selected and added to the diverse subgraph. (He, p119; Ramakrishnan, Milnor, Perry, and Sheth (2005) adapted Faloutsos’ algorithm (2004) with heuristics for edge weighting that depends indirectly on the semantics of the entity and property types in the ontology and on characteristics of the instance data.)

Claim 18
He discloses wherein determination of semantic identifiers is in part based on at least one of path patterns and semantic connections. (He, p119; These studies have provided useful techniques for detecting semantic associations, but none of previous studies have addressed the problem of detecting diverse subgraphs in collaboration networks annotated with authors’ expertise.)

Claim 19
He discloses wherein the association generation module involves topic analysis of the nodes, wherein each node has a topic value related to textual information related to the node and contextual information about nodes in proximity in the subgraphs. (He, p118; While a micro view takes a single edge (representing coauthorship) or node (representing a coauthor) as the unit of analysis in a coauthorship network, a macro view computes topological metrics for the collaboration network as a whole. EC: The author has a topic associated with it. Proximity maps to the association with another author.)

Claim 20
He discloses wherein the candidate generation module involves creating nodes associated with different types of entities, and creating links between nodes associated with different types of relationships. (He, p123; ‘The coauthor data set consists of 640,134 authors and 1,554,643 coauthor linkages. In the coauthorship network, nodes represent authors and edges represent coauthorship weighted by the number of co-authored papers.’ And ‘Additionally, the titles of about 230,000 papers associated with those authors are used as input for the ACT model, which generates a probability distribution over topics for each author and a set of representative words for each topic.’)

3.	Claims 1-20 are rejected.

Conclusion	
4.	The prior art of record and not relied upon is considered pertinent to the applicant’s disclosure.
	-Search terms: claim 1 & ip.com
	-U. S. Patent Publication 20070192306: Papakonstantinou

Correspondence Information
5.	Any inquiry concerning this communication or related to the subject disclosure should be directed to the Examiner, Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (fax (571) 273-5990). The Examiner can be reached Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Li Zhen can be reached at (571) 272-3768.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,

or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121