BDTK in 10 minutes

Introduction

Big Data Analytic Toolkit (BDTK) is a set of acceleration libraries aimed at optimizing big data analytic frameworks.

Using this library can significantly improve the query performance of front-end SQL engines such as Prestodb and Spark.

The following diagram shows the design architecture.

[Figure: BDTK design architecture (_images/BDTK-arch.PNG)]

Major components of the project include:

  • Cider:

    A modularized, general-purpose Just-In-Time (JIT) compiler for data analytic query engines. It uses Substrait as its protocol, which allows it to support multiple front-end engines. It currently provides an LLVM-based implementation derived from HeavyDB.

  • Velox Plugin:

    A bridge that enables Big Data Analytic Toolkit on Velox. It introduces a hybrid execution mode that combines JIT compilation with the vectorized execution Velox already provides. It plugs into Velox seamlessly, without any changes to Velox code.

  • Intel Codec Library:

    Intel Codec Library for BigData provides compression and decompression libraries that let Apache Hadoop and Spark use acceleration hardware for compression and decompression.
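The division of labor between Cider and the Velox plugin can be illustrated with a toy model. The sketch below is a minimal Python analogy, not BDTK code. Nested tuples stand in for Substrait expressions, and Python's built-in compile() stands in for Cider's LLVM backend. Every name in it (jit_supported, jit_compile, vec_eval, run_pipeline) is hypothetical. It assumes a pipeline is JIT-compiled only when every operator in it is supported, and otherwise falls back to column-at-a-time evaluation. This is one plausible reading of the hybrid mode described above, not the plugin's actual policy.

```python
"""Toy sketch of hybrid execution: JIT-compile a pipeline when possible,
otherwise fall back to column-at-a-time (vectorized) evaluation.

Assumptions: nested tuples stand in for Substrait expressions, and Python's
compile() stands in for Cider's LLVM backend. No name here is a BDTK API.
"""
import operator

# Operators the toy "JIT compiler" can generate code for.
JIT_OPS = {"add": "+", "mul": "*", "gt": ">", "lt": "<"}
# Operators the toy "vectorized engine" supports (a superset of JIT_OPS).
VEC_OPS = {"add": operator.add, "mul": operator.mul, "gt": operator.gt,
           "lt": operator.lt, "mod": operator.mod}


def jit_supported(expr):
    """True if every operator in the expression tree can be compiled."""
    if expr[0] in ("col", "lit"):
        return True
    return expr[0] in JIT_OPS and jit_supported(expr[1]) and jit_supported(expr[2])


def to_source(expr):
    """Translate an expression tree into a Python source fragment for row i."""
    if expr[0] == "col":
        return f"cols[{expr[1]!r}][i]"
    if expr[0] == "lit":
        return repr(expr[1])
    return f"({to_source(expr[1])} {JIT_OPS[expr[0]]} {to_source(expr[2])})"


def jit_compile(filter_expr, project_expr):
    """Fuse filter and projection into one generated loop, compiled once."""
    src = (
        "def kernel(cols, n):\n"
        "    out = []\n"
        "    for i in range(n):\n"
        f"        if {to_source(filter_expr)}:\n"
        f"            out.append({to_source(project_expr)})\n"
        "    return out\n"
    )
    namespace = {}
    exec(compile(src, "<jit-kernel>", "exec"), namespace)
    return namespace["kernel"]


def vec_eval(expr, cols, n):
    """Evaluate an expression one whole column at a time."""
    if expr[0] == "col":
        return cols[expr[1]]
    if expr[0] == "lit":
        return [expr[1]] * n
    fn = VEC_OPS[expr[0]]
    lhs, rhs = vec_eval(expr[1], cols, n), vec_eval(expr[2], cols, n)
    return [fn(a, b) for a, b in zip(lhs, rhs)]


def run_pipeline(filter_expr, project_expr, cols):
    """Choose the execution mode per pipeline, as a hybrid engine might."""
    n = len(next(iter(cols.values())))
    if jit_supported(filter_expr) and jit_supported(project_expr):
        return "jit", jit_compile(filter_expr, project_expr)(cols, n)
    mask = vec_eval(filter_expr, cols, n)
    values = vec_eval(project_expr, cols, n)
    return "vectorized", [v for v, keep in zip(values, mask) if keep]


if __name__ == "__main__":
    table = {"price": [10, 25, 40], "qty": [3, 1, 2]}
    # WHERE price > 20 SELECT price * qty: every operator is compilable.
    mode, result = run_pipeline(("gt", ("col", "price"), ("lit", 20)),
                                ("mul", ("col", "price"), ("col", "qty")), table)
    print(mode, result)  # jit [25, 80]
```

Fusing the filter and projection into a single generated loop avoids materializing intermediate columns, which is the usual motivation for JIT compilation in query engines. The vectorized fallback, by contrast, materializes one full column per operator but handles operators the compiler does not support, such as the "mod" operator in this toy.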

APIs

The following table describes the API attributes.

Attribute            Description                    Required
-------------------  -----------------------------  --------
CiderRuntimeModule   The runtime module of Cider    Yes