A Distributed and Adaptive Graph Simulation System

Pooja Nilangekar; Mohammad Hammoud

doi:10.5339/qfarc.2016.ICTPP3327

Authors: Pooja Nilangekar¹ and Mohammad Hammoud¹
Affiliations: ¹ Carnegie Mellon University, Qatar, QA
Publisher: Hamad bin Khalifa University Press (HBKU Press)
Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2016, Issue 1, March 2016, ICTPP3327
DOI: https://doi.org/10.5339/qfarc.2016.ICTPP3327

Abstract

Large-scale graph processing is becoming central to modern life. For instance, graph pattern matching (GPM) can be used to search and analyze social graphs, biological data, and road networks, to mention a few. Conceptually, a GPM algorithm is typically defined in terms of subgraph isomorphism, whereby it seeks subgraphs of an input data graph, G, that are structurally identical (isomorphic) to a given query graph, Q. Although subgraph isomorphism forms a uniquely important class of graph queries, it is NP-complete and too restrictive to capture sensible matches for emerging applications such as software plagiarism detection, protein interaction networks, and intelligence analysis. Consequently, GPM has recently been relaxed and defined in terms of graph simulation. Unlike subgraph isomorphism, graph simulation can run in quadratic time, return more intuitive matches, and scale well with modern big graphs (i.e., graphs with billions of vertices and edges). Nonetheless, current state-of-the-art distributed graph simulation systems still rely on graph partitioning (which is also NP-complete), incur significant communication overhead between worker machines to resolve local matches, and fail to adapt to the varying complexity of query graphs.
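To make the notion of graph simulation concrete, the following is a minimal sketch of the textbook definition: a data vertex v matches a query vertex u if their labels agree and, for every query edge (u, u'), v has at least one successor that matches u'. The function name, the dictionary-based graph encoding, and the toy graphs are illustrative assumptions, not part of GraphSim. The loop is a naive fixpoint refinement chosen for readability; it is not the quadratic-time algorithm referred to above, nor the authors' implementation.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation as {query vertex: set of data vertices}.

    q_labels / g_labels map each vertex to its label.
    q_edges / g_edges are iterables of directed (source, target) pairs.
    """
    # Successor sets for the data graph and the query graph.
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)
    q_succ = {u: set() for u in q_labels}
    for u, x in q_edges:
        q_succ[u].add(x)

    # Start from all label-compatible candidates.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]}
           for u in q_labels}

    # Repeatedly drop candidates that lack a matching successor
    # for some outgoing query edge, until nothing changes.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u_child in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u_child]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    # Usual convention: if any query vertex has no match, G does not match Q.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_labels}
    return sim


# Toy query: a vertex labeled A pointing to a vertex labeled B.
q_labels = {"a": "A", "b": "B"}
q_edges = [("a", "b")]

# Toy data graph: vertex 3 has label A but no outgoing edge to a B vertex.
g_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
g_edges = [(1, 2)]

result = graph_simulation(q_labels, q_edges, g_labels, g_edges)
```

In this toy case, vertex 3 is discarded for query vertex a because it has no B-labeled successor, while both B-labeled vertices remain candidates for b, since b imposes no outgoing-edge requirement.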

In this work, we observe that big graphs are not big data. That is, the largest big graph that we know of can still fit on a single physical or virtual disk (e.g., 6 TB physical disks are cheaply available nowadays, and AWS EC2 instances can offer up to 24 × 2048 GB virtual disks). However, since graph simulation requires exploring the entire input graph, G, and naturally lacks data locality, G's size can significantly dwarf existing memory capacities. As such, we propose GraphSim, a novel distributed and adaptive system for efficient and scalable graph simulation. GraphSim avoids graph partitioning altogether, yet still exploits parallel processing across cluster machines. In particular, GraphSim stores G in full at each machine, but each machine matches only one interval of G's vertices. All machines run in parallel, and each machine simulates its interval locally. If necessary, a machine can inspect the remaining dependent vertices in G to fully resolve its local matches without communicating with any other machine. Hence, GraphSim does not shuffle any intermediate data. In addition, it attempts not to overwhelm the memory of any machine by employing a mathematical model that predicts the best number of machines for any given query graph, Q, based on Q's complexity, G's size, and the memory capacity of each machine. This makes GraphSim adaptive as well. We experimentally verified the efficiency and scalability of GraphSim over private and public clouds using real-life and synthetic big graphs. Results show that GraphSim can outperform the current fastest distributed graph simulation system by several orders of magnitude.
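The idea of giving each machine one interval of G's vertices can be illustrated with a short sketch. The abstract does not say how intervals are chosen, so the near-equal contiguous split below, the function name `vertex_intervals`, and the assumption that vertices are numbered 0 to n-1 are all hypothetical. The model that predicts the number of machines is not described in the abstract and is not reproduced here; the machine count is simply taken as an input.

```python
def vertex_intervals(num_vertices, num_machines):
    """Split vertex IDs 0..num_vertices-1 into contiguous half-open intervals.

    Assumption: intervals are as equal in size as possible; the first
    (num_vertices % num_machines) machines receive one extra vertex.
    """
    if num_machines <= 0:
        raise ValueError("num_machines must be positive")
    base, extra = divmod(num_vertices, num_machines)
    intervals = []
    start = 0
    for machine in range(num_machines):
        size = base + (1 if machine < extra else 0)
        intervals.append((start, start + size))  # [start, end)
        start += size
    return intervals


# Every machine holds the whole graph but only matches its own interval.
intervals = vertex_intervals(num_vertices=10, num_machines=3)
for machine, (lo, hi) in enumerate(intervals):
    print(f"machine {machine}: match vertices {lo}..{hi - 1}")
```

Because every machine holds a full copy of G, an assignment like this needs no edge-cut minimization: a machine that must follow an edge out of its interval reads the target vertex from its own copy instead of requesting it from a peer.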

