Big Data Evaluator Homepage


BDEv

NEW: BDEv 2.2 is released! Check out the News section.

Big Data models have gained great popularity in recent years. One example is MapReduce, a programming model originally developed by Google for generating and processing large data sets. Apache Hadoop, an open-source, Java-based project, has emerged as its de facto standard implementation.

Hadoop is oriented to commodity hardware clusters, which prevents it from fully exploiting the high-performance resources typically available in High Performance Computing (HPC) systems, such as Solid State Drives (SSDs) or InfiniBand networks. This situation has led to the emergence of many frameworks oriented to HPC systems. The suitability of each framework for a particular cluster depends on its design and implementation, the underlying system resources and the type of application to be run. Therefore, selecting the appropriate solution generally involves running multiple experiments to assess the performance, scalability and resource efficiency of each candidate.

Furthermore, new frameworks have been developed to overcome the limitations of Hadoop. With a redesigned architecture and implementation, they are able to improve the performance of the workloads. These new solutions, such as Spark or Flink, widen the range of operations that can be applied to the data being processed, while still supporting traditional MapReduce algorithms.

The Big Data Evaluator (BDEv) tool makes it possible to evaluate several Big Data frameworks using different workloads, in order to extract valuable information about how well each one adapts to the underlying system. BDEv has evolved from MREv [1], which was originally aimed at evaluating HPC-oriented MapReduce frameworks. BDEv runs the workloads while unifying the configuration of the frameworks, and extracts statistical values about their performance and resource utilization, easing the task of collecting results.

The predecessor of BDEv, MREv, has been used for research purposes in [2], which analyses the behaviour of HPC-oriented MapReduce frameworks on an HPC cluster. It has also been used in the evaluation of Flame-MR [3], an efficient MapReduce framework that improves the performance of Hadoop.

References