News
11/04/2024 BDEv 3.9 is released!
BDEv 3.9 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Hadoop 3.3.6
- Spark 3.2.4
- Spark 3.3.4
- Flink 1.14.6
- Flink 1.15.4
- Support for the Hadoop 3.4.x line
- Support for the Spark 3.4.x and 3.5.x lines
- Support for the Flink 1.16.x, 1.17.x, 1.18.x and 1.19.x lines
- Add HDFS parameter to configure DataNode heartbeat (hdfs-default.sh)
- Add YARN parameter to configure the scheduler class (yarn-default.sh)
- Add several options to customize the YARN fair scheduler (yarn-default.sh)
- Add YARN parameter to configure NodeManager heartbeat (yarn-default.sh)
- Configuration of each experiment is now copied into the output report directory
- HDFS formatting and data deletion can be configured (hdfs-default.sh)
- HDFS replication factor is automatically set when there are not enough DataNodes
- HDFS input data is reused if it already exists
- Update RAPL-based power monitoring tool to PAPI v7.1.0
- Update RGen data generator to include minor improvements and bug fixes
- Update resource monitoring tool to dool v1.3.1
- SSH optional parameters can now be set (bdev-default.sh)
- Fix bug when running Spark and Flink on YARN
- Remove support for Flink versions 1.12.x and 1.13.x
- Minor changes and bug fixes
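The automatic replication factor mentioned in the 3.9 notes can be illustrated with a minimal sketch. It assumes the requested factor is simply capped at the number of available DataNodes; the function name and arguments are hypothetical and are not taken from the BDEv sources.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BDEv): cap the requested HDFS replication
# factor at the number of DataNodes, assuming that is how "not enough
# DataNodes" is handled.
effective_replication() {
  local requested=$1   # replication factor asked for by the user
  local datanodes=$2   # number of DataNodes in the cluster
  if (( datanodes < requested )); then
    echo "$datanodes"
  else
    echo "$requested"
  fi
}

effective_replication 3 2   # two DataNodes cannot hold three replicas
effective_replication 3 8   # enough DataNodes, requested value is kept
```

Under this assumption a cluster with fewer DataNodes than the requested factor would still store every block once per DataNode, rather than leaving blocks under-replicated.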
29/07/2022 BDEv 3.8 is released!
BDEv 3.8 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Hadoop 2.10.2
- Hadoop 3.2.4
- Hadoop 3.3.3
- Spark 3.2.2
- Flink 1.14.5
- Support for the Spark 3.3.x line
- Support for the Flink 1.15.x line
- Add new sorting benchmark for Hadoop, Spark and Flink: TPCx-HS
- Update RGen data generator to support TPCx-HS (HSGen) and fix TeraGen bug with Hadoop 3.x
- Hadoop classpath is now automatically configured for Spark and Flink
- Add option to use hostnames or IPs for cluster nodes (bdev-default.sh)
- Add Hadoop parameter to set the timeout when waiting for a server response (core-default.sh)
- Add HDFS parameter to configure write packet size (hdfs-default.sh)
- Add HDFS parameter to configure retries when writing blocks to DataNodes (hdfs-default.sh)
- Add HDFS parameter to configure handler count for DataNodes (hdfs-default.sh)
- Add option to set an IP for loopback interface (system-default.sh)
- Improve frameworks cleanup
- Add sanity checks for networking setup
- Remove support for Flink versions 1.11.x
- Minor changes and bug fixes
18/03/2022 BDEv 3.7 is released!
BDEv 3.7 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Hadoop 3.3.2
- Spark 3.1.3
- Flink 1.11.6
- Flink 1.12.7
- Flink 1.13.6
- Support for the Spark 3.2.x line
- Support for the Flink 1.14.x line
- Add Flink configuration parameters for several shuffle settings
- Remove support for Flink versions 1.10.x
- Add sanity check for Flink configuration file
- Update resource monitoring tool to dool v1.0.0
- Minor changes and bug fixes
16/07/2021 BDEv 3.6 is released!
BDEv 3.6 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Hadoop 2.10.1
- Hadoop 3.1.4
- Hadoop 3.2.2
- Hadoop 3.3.1
- Spark 2.4.8
- Spark 3.0.3
- Flink 1.10.3
- Flink 1.11.3
- Support for the Spark 3.1.x line
- Support for the Flink 1.12.x and 1.13.x lines
- Add Spark configuration parameters for
- spark.memory.fraction
- spark.memory.storageFraction
- spark.kryoserializer.buffer.max
- spark.executor.heartbeatInterval
- Adaptive Query Execution (AQE) can be enabled and configured for Spark 3.x
- Add Flink configuration parameters for several network memory settings and timeouts
- Add HDFS configuration parameters for setting socket-related timeouts
- Command batch mode now supports running multiple scripts stored in a directory
- Remove support for Flink versions <= 1.9
- All input datasets are now created using an internal data generator tool (RGen)
- Set Hadoop classpath properly
- Add sanity checks for some dependencies
- Fix missing dool files
- Minor changes and bug fixes
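The Spark properties named in the 3.6 notes are standard Spark settings. As a rough sketch of what they look like outside BDEv, the snippet below writes them to a plain Spark properties file. The function name is hypothetical, the values are only illustrative (the first four are Spark's documented defaults), and BDEv itself exposes these settings through its own configuration files, whose variable names are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BDEv): write a Spark properties file with
# the settings listed in the BDEv 3.6 notes. Values are illustrative only.
write_spark_conf() {
  local out=$1   # path of the properties file to create
  cat > "$out" <<'EOF'
spark.memory.fraction            0.6
spark.memory.storageFraction     0.5
spark.kryoserializer.buffer.max  64m
spark.executor.heartbeatInterval 10s
spark.sql.adaptive.enabled       true
EOF
}

conf_file=$(mktemp)
write_spark_conf "$conf_file"
cat "$conf_file"
```

Such a file can be passed to Spark with `spark-submit --properties-file "$conf_file"`; `spark.sql.adaptive.enabled` is the switch that turns AQE on in Spark 3.x.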
07/09/2020 BDEv 3.5 is released!
BDEv 3.5 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Spark 2.4.6
- Flink 1.9.3
- Flink 1.10.2
- Support for the Hadoop 3.3.x line
- Support for the Spark 3.0.x line
- Support for the Flink 1.11.x line
- Update resource monitoring tool to dool v0.9.9, a Python 3 compatible clone of dstat
- Update RAPL-based power monitoring tool to use PAPI v6.0.0
- Update iLO scripts to version 5.30
- RAPL-based monitoring tool can now report readings for uncore devices if PP0 is available
- Fix bug when setting iLO credentials
- Minor bug fixes
25/03/2020 BDEv 3.4 is released!
BDEv 3.4 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Spark 2.4.5
- Flink 1.8.3
- Flink 1.9.2
- Support for the Flink 1.10.x line
- Allow starting the job history server for Flink
- Simplified and improved cluster deployments for Spark & Flink
- Add support to configure the YARN Timeline server (yarn-default.sh)
- Add support to configure the disk health checker service for YARN (yarn-default.sh)
- Add support to configure last access time precision for HDFS (hdfs-default.sh)
- Add support to enable short-circuit local reads for HDFS (hdfs-default.sh)
- Memory settings for some daemons (e.g., DataNode) can now be set individually
- Fix SQL and ConnComp workloads for Spark
- Spark & Flink benchmarking suites are now downloaded on demand (removed from git repo)
- Scala version for Spark & Flink can be set in solutions-default.sh (2.11 by default)
- Set LD_LIBRARY_PATH in Spark & Flink to include the path to Hadoop native libraries
- Hadoop existence is checked when deployed together with other frameworks
- Minor bug fixes
17/12/2019 BDEv 3.3 is released!
BDEv 3.3 is a point release for the BDEv 3.x line. From this version onwards, BDEv is hosted on GitHub
- Updated versions of the following frameworks
- Hadoop 2.10.0
- Hadoop 3.1.3
- Hadoop 3.2.1
- Spark 2.3.4
- Spark 2.4.4
- Flink 1.8.2
- Support for the Flink 1.9.x line (tested with Flink 1.9.1)
- Add support for BDWatchdog tool, which allows per-process resource monitoring and JVM profiling
- BDEv now runs by default in command mode when no framework is selected in solutions.lst
- Update PAPI library to version 5.7.0
- Update RAPL power monitoring tool to use PAPI 5.7.0
- Add support to configure the binary names for Python2 and Python3 (system-default.sh)
- Fix bug when running Flink in standalone mode
- Other minor bug fixes
06/06/2019 BDEv 3.2 is released!
BDEv 3.2 is a point release for the BDEv 3.x line
- Updated versions of the following frameworks
- Hadoop 2.7.7
- Hadoop 2.8.5
- Hadoop 2.9.2
- Hadoop 3.1.2
- Hadoop 3.2.0
- RDMA-Hadoop-3 0.9.1
- Flame-MR 1.2
- Spark 2.3.3
- Spark 2.4.3
- Flink 1.7.2
- Flink 1.8.0
- Add support to configure handler count for Hadoop NameNode (hdfs-default.sh)
- Add support to use unsafe based Kryo serializer for Spark (solutions-default.sh)
- Fix bug when running multiple Executors per worker/node in Spark configuration
- Minor bug fixes
16/05/2018 BDEv 3.1 is released!
BDEv 3.1 is a point release for the BDEv 3.x line
- Updated versions of some frameworks
- Hadoop 2.7.6
- Hadoop 2.8.3
- Hadoop 2.9.0
- Flame-MR 1.1
- RDMA-Hadoop-2 1.3.5
- Spark 2.3.0
- RDMA-Spark 0.9.5
- Flink 1.4.2
- Support for the Hadoop 3.1.x line (tested with Hadoop 3.1.0)
- Improved customization of experiments with setup and clean up commands
- Extended configuration parameters
07/11/2017 BDEv 3.0 is released!
This is a major release which contains a number of significant enhancements, along with new, previously unreleased features.
- New monitoring tools
- Power monitoring via RAPL
- Microarchitecture-level event counting via Oprofile
- Enhanced framework configuration parameters
- Support for the Flink 1.3.x line (tested with Flink 1.3.2)
- Support for the Spark 2.2.x line (tested with Spark 2.2.0)
- Support for the Flame-MR 1.x line (tested with Flame-MR 1.0)
- Updated versions of some frameworks
- Hadoop 2.7.4
- Hadoop 2.8.1
- RDMA-Hadoop 1.2.0
- RDMA-Spark 0.9.4
- Updated resource monitoring tool to dstat 0.7.3
- Improved graph generation portability
25/11/2016 BDEv 2.3 is released!
BDEv 2.3 is a point release for the BDEv 2.x line
- Added support for Flame-MR version 0.10.0
- Added support for the Flink 1.1.x line (tested with version 1.1.3)
- Added support for the SLURM job scheduler
- Updated versions of some frameworks
- Hadoop 2.6.5
- Hadoop 2.7.3
- RDMA-Hadoop-2 1.1.0
- Spark 1.6.3
- Flink 1.0.3
- Memory settings for mappers and reducers can now be set separately
- Added configuration variable in etc/core-default.sh to control the Hadoop parameter "io.file.buffer.size"
- Virtual and physical memory limits in YARN containers as well as the ratio between them can now be configured in etc/yarn-default.sh
- Fix critical bug when running Flink under YARN related to JobManager memory
- Minor bug fixes
04/07/2016 BDEv 2.2 is released!
BDEv 2.2 is a point release for the BDEv 2.x line
- Add support for multiple Workers/TaskManagers per node in Spark and Flink
- Enhancements in several Flink workloads
- Grep
- PageRank (with delta iterations)
- Connected Components (with Gelly)
- K-Means
- Enhancements in several Spark workloads
- TeraSort
- K-Means (with MLlib)
- Bug fixes
05/05/2016 BDEv 2.1 is released!
BDEv 2.1 is a point release for the BDEv 2.x line
- Allow starting the job history server for Spark-based frameworks
- Added support for some new frameworks
- Hadoop 2.6.4
- RDMA-Hadoop-2 0.9.9
- Spark 1.6.1
- Flink 1.0.2
- Added support for Mahout 0.11.2 and 0.12.0
- Added new benchmarks for Spark
- PageRank
- Connected Components
- K-Means
- Bayes
- Hive SQL queries:
- Aggregation
- Join
- Scan
- Added new benchmarks for Flink
- PageRank
- Connected Components
- K-Means
- Multiple bug fixes
02/03/2016 BDEv 2.0 is released!
BDEv 2.0 is a major release which contains a number of significant enhancements. Starting with this release, the MapReduce Evaluator (MREv) tool has been renamed to Big Data Evaluator (BDEv).
- Added support for Flink version 0.10.2 in YARN and standalone modes
- Added support for RDMA-Spark version 0.9.1 in YARN and standalone modes
- Added support for Spark in standalone mode
- Updated versions of some frameworks
- Hadoop 2.7.2
- RDMA-Hadoop-2 0.9.8
- Spark 1.5.2
- Spark 1.6.0
- New data set generation using the HiBench DataGen tool
- Added new benchmarks for Hadoop-based frameworks
- Grep
- K-Means
- Connected Components
- Hive SQL queries:
- Aggregation
- Join
- Scan
- Added new benchmarks for Spark and Flink
- WordCount
- Sort
- Grep
- TeraSort
- Added new benchmark for DataMPI
- Grep
- Allow starting the job history server for Hadoop-based frameworks
- Input data size can now differ between WordCount, Sort and TeraSort
- Enhanced configuration of parameters
- Separated configuration files for HDFS, MapReduce and YARN
- Solution-specific parameters in a separated file
- Many new configuration parameters
- Support for multiple disks
- Network interfaces are now optional parameters. If not specified, IPs are determined using the hostfile
- Updated iLO scripts to version 4.70
- Allow configuring IP, user name and password for iLO interfaces
- Automatically download Apache Mahout and Apache Hive on demand
- Code refactoring and simplified internal configuration
- Multiple bug fixes
10/08/2015 MREv 1.1 is released!
MREv 1.1 has been released and can be obtained from the Downloads section. The main new features of MREv 1.1 are:
- Updated versions of the frameworks
- Hadoop-2.7.1-GbE
- Hadoop-2.7.1-IPoIB
- Hadoop-2.7.1-UDA
- RDMA-Hadoop-2-0.9.7-GbE
- RDMA-Hadoop-2-0.9.7-IPoIB
- Spark-1.4.1-YARN-GbE
- Spark-1.4.1-YARN-IPoIB
- Configurable timeout for the workloads
- User-defined batch command
- Separate configuration and log directories for each evaluation
- Enhanced resource configuration
09/12/2014 MREv 1.0 is released!
Version 1.0 of MREv has been released and can be obtained from the Downloads section.