X-RIME: Hadoop based large
scale social network analysis
Motivation
Today's
Internet-based social network sites
possess huge user communities. They hold large amount of data about
their users and want to generate core competency from the data. A key
enabler for this is a cost efficient solution for social data
management and social network analysis (SNA).
Such a
solution faces a few challenges. The most important one is that the
solution should be able to handle massive and heterogeneous data
sets. Facing this challenge, the traditional data warehouse based
solutions are usually not cost efficient enough. On the other hand,
existing SNA tools are mostly used in single workstation mode, and
not scalable enough. To this end, low cost and highly scalable data
management and processing technologies from cloud computing society
should be brought in to help.
However, most of existing cloud
based data analysis solutions are trying to provide SQL-like general
purpose query languages, and do not directly support social network
analysis. This makes them hard to optimize and hard to use for SNA
users. So, we came up with X-RIME to fix this gap.
So,
briefly speaking, X-RIME wants to provide a few value-added layers on
top of existing cloud infrastructure, to support smart decision loops
based on massive data sets and SNA. To end users, X-RIME is a library
consists of Map-Reduce programs, which are used to do raw data
pre-processing, transformation, SNA metrics and structures calculation,
and graph / network visualization. The library could be integrated with
other Hadoop based data warehouses (e.g., HIVE) to build more
comprehensive solutions.
About the Project
This
project is a join effort of Beijing University of Posts and
Telecommunications (BUPT) and IBM China Research Lab, supported by IBM
Open Collaboration Research program. The code is open source under
Apache License V2.0.
Currently Supported SNA Metrics and
Structures
- vertex degree statistics
- weakly connected components (WCC)
- strongly connected components (SCC)
- bi-connected components (BCC)
- ego-centric network density
- bread first search / single source shortest path (BFS/SSSP)
- K-core
- maximal cliques
- pagerank
- hyperlink-induced topic search (HITS)
- minimal spanning tree (MST)
- grid variant of Fruchterman-Reingold network layout algorithm
References
- Social Network Analysis: A Handbook,John P Scott,Sage Publications Ltd; 2nd edition (March 25, 2000).
- "Finding
Strongly Connected Components in Distributed Graphs",William McLendon
III, Bruce Hendrickson, Steven J. Plimpton, Lawrence Rauchwerger,
http://www.sandia.gov/~sjplimp/papers/jpdc05.pdf
- "An
Experimental Study of Parallel Biconnected Components Algorithms on
Symmetric Multiprocessors (SMPs)", Cong, Guojing, Bader, David A.,
http://smartech.gatech.edu/bitstream/1853/14382/1/GT-CSE-06-03.pdf
- "Graph Algorithms with MapReduce".Jimmy Lin(2008).
- "An
O(m) Algorithm for Cores Decomposition of Networks",V. Batagelj, M.
Zaversnik, http://vlado.fmf.uni-lj.si/pub/networks/doc/cores/cores.pdf
- "Finding
All Cliques of an Undirected Graph",Coen Bron, Joep Kerbosch,
http://www.dfki.de/~neumann/ie-seminar/presentations/finding_cliques.pdf
- "On
Computing All Maximal Cliques Distributedly",Fábio Protti,
Felipe M. G. França, and
Jayme Luiz Szwarcfiter,
http://www.springerlink.com/content/g5km75g7u2l80446/fulltext.pdf
- "The PageRank citation ranking: Bringing order to the Web".
Page, Lawrence; Brin, Sergey;
Motwani, Rajeev and Winograd, Terry (1999).
- "Algorithms for Estimating Relative Importance in Networks".
Scott White and Padhraic Smyth (2003).
- "Authoritative sources in a hyperlinked environment".Kleinberg, Jon (1999).Journal of the ACM 46 (5): 604¨C632
- "Improvement of HITS-based Algorithms on Web Documents".
Li, L.; Shang, Y.; Zhang, W. (2002).Proceedings of the 11th International World Wide Web Conference (WWW 2002).
- "A
Distributed Algorithm for Minimum-Weight Spanning Trees",R. G.
Gallager, P. A. Humblet, P. M. Spira,
http://www.cs.tau.ac.il/~afek/p66-gallager.pdf
- "Graph
Drawing by Force-directed Placement", Thomas M. J. Fruchterman, Edward
M. Reingold,
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol21/issue11/spe060tf.pdf
Call for Proposals
Do you have interesting social
network analysis algorithms or applications, and want to make them big?
Please tell us. Maybe we can help to implement them on Hadoop and find
out their value.
History
Release V1.0.0 2009-09-04
Release v1.0.1 2010-07-09
Maintainers
- Juwei Shi, shijuwei@gmail.com
- Bin Cai, caibinbupt@gmail.com
- Wei Xue, simonxue21@gmail.com
- Bo Yang, bo.yang.9527@gmail.com
Users
Download
X-RIME files
Project
detail and discuss
Get
support