Implementation Specification¶

Exception Specification¶

cider exception¶

For exceptions in the cider directory, please import cider/include/cider/CiderException.h

Exception types such as CiderException, CiderRuntimeException, and CiderCompileException are defined,

please use the macro CIDER_THROW to throw exceptions, such as CIDER_THROW(CiderCompileException, “Exception message”);

cider-velox exception¶

For exceptions in the cider-velox directory, please use thirdparty/velox/velox/common/base/Exceptions.h,

Exception types such as VeloxException, VeloxUserError, VeloxRuntimeError are defined,

please use the macros VELOX_UNSUPPORTED, VELOX_FAIL, VELOX_USER_FAIL, VELOX_NYI to throw exceptions,

for example VELOX_UNSUPPORTED(“Conversion is not supported yet, type is {}”, type);

CiderAllocator use guide¶

introduction to CiderAllocator¶

We provide the allocation interface in CiderAllocator.h and two default implementations

CiderDefaultAllocator provide the default allocate and deallocate function via system library

AlignAllocator provide the function to perform memory alignment

how to use¶

user can construct an allocator from upstream memory pool, and then memory allocated will be in the pool. or use the CiderDefaultAllocator, AlignAllocator if they don’t want to do it, but this allocator must be present and passed throughout the calling logic because Only in this way can you know the memory usage during the entire operation for memory managerment

A typical usage in CiderRunTimeModule

1. add the member variables in .h file¶

std::shared_ptr<CiderAllocator> allocator

and add it in the constructor:

CiderRuntimeModule(
  std::shared_ptr<CiderCompilationResult> ciderCompilationResult,
  CiderCompilationOption ciderCompilationOption = CiderCompilationOption::defaults(),
  CiderExecutionOption ciderExecutionOption = CiderExecutionOption::defaults(),
  std::shared_ptr<CiderAllocator> allocator = std::make_shared<CiderDefaultAllocator>());

2. in .cpp implementation class¶

CiderRuntimeModule::CiderRuntimeModule(
    std::shared_ptr<CiderCompilationResult> ciderCompilationResult,
    CiderCompilationOption ciderCompilationOption,
    CiderExecutionOption ciderExecutionOption,
    std::shared_ptr<CiderAllocator> allocator)
    : ciderCompilationResult_(ciderCompilationResult)
    , ciderCompilationOption_(ciderCompilationOption)
    , ciderExecutionOption_(ciderExecutionOption)
    , allocator_(allocator)

3. Then we can use it in the implementation class for memory allocation and release¶

const int8_t** col_buffers = reinterpret_cast<const int8_t**>(allocator_->allocate(sizeof(int8_t**) * (total_col_num)));
allocator_->deallocate(reinterpret_cast<int8_t*>(col_buffers),sizeof(int8_t**) * (total_col_num));

4. Finally we can pass a custom allocator when creating CiderRunTimeModule or use the default¶

auto customAllocator = std::make_shared<CustomAllocator>();
cider_runtime_module_ = std::make_shared<CiderRuntimeModule>(compile_res_, compile_option, exe_option, customAllocator);

Logger¶

#include "util/Logger.h"

The BDTK Logger is based on Boost.Log with a design goal of being largely, though not completely, backward compatible with glog in usage, but with additional control over the logging format and other features.

For the developer, log entries are made in a syntax that is similar to std::ostream. Example:

LOG(INFO) << "x = " << x;

where INFO is one common example of the log “severity” level. Other severities are WARNING, FATAL, DEBUG1, etc. See Severity.

Initialization and Global Instances¶

To initialize the logging system, the function:

namespace logger {

void init(LogOptions const&);

}

LogOptions has the following public member variables and member functions:

boost::filesystem::path log_dir_{"bdtk_log"}; // Logging directory. May be relative to data directory, or absolute.

std::string file_name_pattern_{".{SEVERITY}.%Y%m%d-%H%M%S.log"}; // file_name_pattern is prepended with base_name.

std::string symlink_{".{SEVERITY}"}; // symlink is prepended with base_name.

Severity severity_{Severity::INFO}; // log level.

Severity severity_clog_{Severity::ERROR}; //Log to console severity level.

bool auto_flush_{true}; // Flush logging buffer to file after each message.

size_t max_files_{100}; // The maximum number of log files, if it is 0, no log will be recorded

size_t min_free_space_{20 << 20}; // Minimum number of bytes left on device before oldest log files are deleted.

size_t rotation_size_{10 << 20}; // Maximum file size in bytes before new log files are started.

LogOptions(char const* argv0);

boost::filesystem::path full_log_dir() const; // get log path

void set_base_path(std::string const& base_path); // set log base path

void parse_command_line(int, char const* const*); // Pass parameters using the command line

--log-directory : Logging directory. May be relative to data directory, or absolute.

--log-file-name : Log file name relative to log-directory.

--log-symlink : Symlink to active log.

--log-severity : Log to file severity level.

--log-severity-clog : Log to console severity level.

--log-auto-flush : Flush logging buffer to file after each message.

--log-max-files : Maximum number of log files to keep.

--log-min-free-space : Minimum number of bytes left on device before oldest log files are deleted.

--log-rotation-size : Maximum file size in bytes before new log files are started.

must be invoked with the logger::LogOptions object to be applied. It is recommended to run this from main() as early as possible. use the LOG/CHECK macros for all normal logging. Example:

logger::LogOptions log_options(argv[0]);
log_options.parse_command_line(argc, argv);
log_options.max_files_ = 0;
log_options.set_base_path("/root/work");
log_options.log_dir_ = "bdtk_log";
logger::init(log_options);

Usage¶

Severity¶

There are currently 8 severity levels that can be used with the LOG() macro, in decreasing order of severity from most severe to least:

Severity	When to Use
`FATAL`	An unrecoverable error has occurred, and must be fixed in the software.
`ERROR`	A recoverable error has occurred, and must be fixed in the software.
`WARNING`	Something that “should not have” happened happened, but is not as demanding of an immediate fix as an `ERROR`. Example: A deprecated feature is still being used, even though the user was informed of its deprecation.
`INFO`	Significant and informative milestones in the execution of the program. One must balance logging useful and informative information, against logging too much useless and redundant information that drowns the signal out with noise.
`DEBUG1`	More detailed information about the execution of the program than `INFO` that would be useful for debugging, but less detailed than the below debug levels. E.g. don’t log every row of a million-row table.
`DEBUG2`	More detailed information than `DEBUG1`.
`DEBUG3`	More detailed information than `DEBUG2`.
`DEBUG4`	More detailed information than `DEBUG3`.

All LOG() calls with a lesser severity are ignored and not logged. For example if LOG(DEBUG1) then log calls LOG(DEBUG1), LOG(INFO), LOG(WARNING), LOG(ERROR), and LOG(FATAL) are active and will produce log entries when executed, and log calls LOG(DEBUG4), LOG(DEBUG3), and LOG(DEBUG2) are ignored. If a function is called in the input stream, e.g. LOG(DEBUG2) << f(x), then f(x) will be invoked if any only if DEBUG2 is an active log severity.

Errors that are logged with ERROR and FATAL should be considered software errors, and not user errors. For example, if the user inputs a malformed SQL query, e.g. SELEKT instead of SELECT, then this should NOT be logged as an ERROR, but instead logged as an INFO with an appropriate response to the user. This would be considered correct behavior of the software. In contrast, anytime an ERROR or FATAL is logged, then it means there is a bug in the software that must be fixed.

Log Files¶

A separate log file is produced for INFO, WARNING, ERROR, and FATAL if the log severity is active. If any of the DEBUG severities are active, they are included into the INFO log file. Each log file redundantly includes all entries that are more severe than itself. For example if LOG(DEBUG1) then the INFO log file will include all log calls LOG(DEBUG1), LOG(INFO), LOG(WARNING), LOG(ERROR), and LOG(FATAL); the WARNING log file will include all log calls LOG(WARNING), LOG(ERROR), and LOG(FATAL).

The name of the log file by default includes the program name, severity, and timestamp of when its first entry was made. Example:

cider.INFO.20220928-162525.log

New log files are started on each (re)start. A symbolic link without the timestamp and file extension suffix points to the latest version. Example:

cider.INFO -> cider.INFO.20220928-162525.log

Format¶

The general format of a log entry is:

(timestamp) (severity) (process_id) (query_id) (thread_id) (filename:line_number) (message)

Example:

2019-09-18T16:25:25.659248 I 26481 0 0 measure.h:80 Timer start

Field descriptions:

Timestamp in local timezone with microsecond resolution.
Single-character severity level. In same order as above severity levels:
F E W I 1 2 3 4
For instance the I implies that the above log entry is of INFO severity.
The process_id assigned by the operating system.
The query_id is a unique 64-bit positive integer incrementally assigned to each new SQL query. A value of 0 indicates that the log line is outside of the context of any particular query, or that the query_id is not available.
The thread_id is a unique 64-bit positive integer incrementally assigned to each new thread. thread_id=1 is assigned to the first thread each time the program starts.
Source filename:Line number.
Custom message sent to LOG() via the insertion << operator.

Note that log entries can contain line breaks, thus not all log lines will begin with these fields if the message itself contains multiple lines.

Macros¶

LOG¶

In addition to the LOG() macro, there are:

LOG_IF(severity, condition) - Same as LOG(severity) but first checks a boolean condition and logs only if evaluated to true.

VLOG(n) - Same as LOG(DEBUGn) for n = 1, 2, 3, 4.

CHECK¶

In release mode, the CHECK macro does nothing. In debug mode, the CHECK(condition) macro evaluates condition as a boolean value. If true, then execution continues with nothing logged. Otherwise both the condition source code string is logged at FATAL severity, along with any optional << message.

Similarly there are 6 binary CHECK comparison macros:
CHECK_EQ, CHECK_NE, CHECK_LT, CHECK_LE, CHECK_GT, CHECK_GE
which accept two parameters, and apply the comparison operators ==, !=, <, <=, >, >=, respectively. For example, CHECK_LT(1u, list.size()) will evaluate 1u < list.size(), and log and abort() if not true. The advantage of calling CHECK_LT(1u, list.size()) over CHECK(1u < list.size()) is that the value of both operands will be logged if the test fails, which is not reported with CHECK().

DEBUG_TIMER¶

DebugTimer objects can be instantiated in the code that measure and log the duration of their own lifetimes, and include the following features:

Globally accessible via a macro. E.g. auto timer = DEBUG_TIMER(__func__).
Single multi-line log entry is reported for nested timers.
Enabled with the --enable-debug-timer program option. Without it, the timer objects have no effect.
Include timers from spawned threads. Requires a call on the child thread informing the parent thread id: DEBUG_TIMER_NEW_THREAD(parent_thread_id);

Example:

void foo() {
  auto timer = DEBUG_TIMER(__func__);
  ...
  bar();
  ...
}

void bar() {
  auto timer = DEBUG_TIMER(__func__);
  ...
  bar2();
  ...
  timer.stop();  // Manually stop timer for bar().
  ...
}

void bar2() {
  auto timer = DEBUG_TIMER(__func__);
  ...
}

Upon the destruction of the timer object within foo(), a log entry similar to the following will be made:

2019-10-17T15:22:53.981002 I 8980 foobar.cpp:70 DEBUG_TIMER thread_id(140719710320384)
19ms total duration for foo
  17ms start(10ms) bar foobar.cpp:100
    13ms start(10ms) bar2 foobar.cpp:130

Fields for the Duration lines (last two line above) are:

Lifetime of timer object.
Time after start of current thread. (This can be used to find gaps in timing coverage.)
String parameter to DEBUG_TIMER (__func__ in above examples.)
File:Line where DEBUG_TIMER was called from.

The first root DEBUG_TIMER instance is in foo(), and the two others in bar() and bar2() are initiated upon subsequent calls into the call stack, represented by the indentations. Once the first root timer object destructs, the entire DurationTree of recorded times are logged together into a single multi-line log entry, one line per timer instance.

There is a DebugTimer::stop() method that manually stops the timer, serving the same function as the destructor. The destructor then will have no further effect.

To embed timers in a spawned child thread, call DEBUG_TIMER_NEW_THREAD(parent_thread_id); from the child thread. The parent_thread_id must get its value from logger::thread_id() before the new thread is spawned. This will not start a timer, but will record the child-parent relationship so that subsequent DEBUG_TIMER calls are stored in the correct node of the parent tree.

Note

Any timer that is created in a thread when no other timers are active in the same or parent thread is called a root timer. The timer stack is logged when the root timer destructs, or stop() is called, after which memory used for tracking the timer trees are freed. The performance cost of this should be kept in mind when placing timers within the code.

Warning

Non-root timers that end after their root timer ends will result in a segmentation fault . This is easily avoided by not interleaving timer lifetimes with one another in the same block of code, and making sure that all child threads end prior to the ending of any corresponding root timer.