Implementation Specification¶
Exception Specification¶
cider exception¶
For exceptions in the cider directory, please import cider/include/cider/CiderException.h
Exception types such as CiderException, CiderRuntimeException, and CiderCompileException are defined,
please use the macro CIDER_THROW to throw exceptions, such as CIDER_THROW(CiderCompileException, “Exception message”);
cider-velox exception¶
For exceptions in the cider-velox directory, please use thirdparty/velox/velox/common/base/Exceptions.h,
Exception types such as VeloxException, VeloxUserError, VeloxRuntimeError are defined,
please use the macros VELOX_UNSUPPORTED, VELOX_FAIL, VELOX_USER_FAIL, VELOX_NYI to throw exceptions,
for example VELOX_UNSUPPORTED(“Conversion is not supported yet, type is {}”, type);
CiderAllocator use guide¶
introduction to CiderAllocator¶
We provide the allocation interface in CiderAllocator.h and two default implementations
CiderDefaultAllocator provide the default allocate and deallocate function via system library
AlignAllocator provide the function to perform memory alignment
how to use¶
user can construct an allocator from upstream memory pool, and then memory allocated will be in the pool. or use the CiderDefaultAllocator, AlignAllocator if they don’t want to do it, but this allocator must be present and passed throughout the calling logic because Only in this way can you know the memory usage during the entire operation for memory managerment
A typical usage in CiderRunTimeModule
1. add the member variables in .h file¶
std::shared_ptr<CiderAllocator> allocator
and add it in the constructor:
CiderRuntimeModule( std::shared_ptr<CiderCompilationResult> ciderCompilationResult, CiderCompilationOption ciderCompilationOption = CiderCompilationOption::defaults(), CiderExecutionOption ciderExecutionOption = CiderExecutionOption::defaults(), std::shared_ptr<CiderAllocator> allocator = std::make_shared<CiderDefaultAllocator>());
2. in .cpp implementation class¶
CiderRuntimeModule::CiderRuntimeModule(
std::shared_ptr<CiderCompilationResult> ciderCompilationResult,
CiderCompilationOption ciderCompilationOption,
CiderExecutionOption ciderExecutionOption,
std::shared_ptr<CiderAllocator> allocator)
: ciderCompilationResult_(ciderCompilationResult)
, ciderCompilationOption_(ciderCompilationOption)
, ciderExecutionOption_(ciderExecutionOption)
, allocator_(allocator)
3. Then we can use it in the implementation class for memory allocation and release¶
const int8_t** col_buffers = reinterpret_cast<const int8_t**>(allocator_->allocate(sizeof(int8_t**) * (total_col_num)));
allocator_->deallocate(reinterpret_cast<int8_t*>(col_buffers),sizeof(int8_t**) * (total_col_num));
4. Finally we can pass a custom allocator when creating CiderRunTimeModule or use the default¶
auto customAllocator = std::make_shared<CustomAllocator>();
cider_runtime_module_ = std::make_shared<CiderRuntimeModule>(compile_res_, compile_option, exe_option, customAllocator);
Logger¶
#include "util/Logger.h"
The BDTK Logger is based on Boost.Log with a design goal of being largely, though not completely, backward compatible with glog in usage, but with additional control over the logging format and other features.
For the developer, log entries are made in a syntax that is similar to std::ostream
. Example:
LOG(INFO) << "x = " << x;
where INFO
is one common example of the log “severity” level. Other severities are WARNING
, FATAL
,
DEBUG1
, etc. See Severity.
Initialization and Global Instances¶
To initialize the logging system, the function:
namespace logger {
void init(LogOptions const&);
}
LogOptions has the following public member variables and member functions:
boost::filesystem::path log_dir_{"bdtk_log"};
// Logging directory. May be relative to data directory, or absolute.std::string file_name_pattern_{".{SEVERITY}.%Y%m%d-%H%M%S.log"};
// file_name_pattern is prepended with base_name.std::string symlink_{".{SEVERITY}"};
// symlink is prepended with base_name.Severity severity_{Severity::INFO};
// log level.Severity severity_clog_{Severity::ERROR};
//Log to console severity level.bool auto_flush_{true};
// Flush logging buffer to file after each message.size_t max_files_{100};
// The maximum number of log files, if it is 0, no log will be recordedsize_t min_free_space_{20 << 20};
// Minimum number of bytes left on device before oldest log files are deleted.size_t rotation_size_{10 << 20};
// Maximum file size in bytes before new log files are started.LogOptions(char const* argv0);
boost::filesystem::path full_log_dir() const;
// get log pathvoid set_base_path(std::string const& base_path);
// set log base pathvoid parse_command_line(int, char const* const*);
// Pass parameters using the command line
--log-directory
: Logging directory. May be relative to data directory, or absolute.--log-file-name
: Log file name relative to log-directory.--log-symlink
: Symlink to active log.--log-severity
: Log to file severity level.--log-severity-clog
: Log to console severity level.--log-auto-flush
: Flush logging buffer to file after each message.--log-max-files
: Maximum number of log files to keep.--log-min-free-space
: Minimum number of bytes left on device before oldest log files are deleted.--log-rotation-size
: Maximum file size in bytes before new log files are started.
must be invoked with the logger::LogOptions
object to be applied. It is recommended
to run this from main()
as early as possible. use the LOG
/CHECK
macros for all normal logging. Example:
logger::LogOptions log_options(argv[0]);
log_options.parse_command_line(argc, argv);
log_options.max_files_ = 0;
log_options.set_base_path("/root/work");
log_options.log_dir_ = "bdtk_log";
logger::init(log_options);
Usage¶
Severity¶
There are currently 8 severity levels that can be used with the LOG()
macro, in decreasing order of
severity from most severe to least:
Severity | When to Use |
FATAL |
An unrecoverable error has occurred, and must be fixed in the software. |
ERROR |
A recoverable error has occurred, and must be fixed in the software. |
WARNING |
Something that “should not have” happened happened, but is not as demanding
of an immediate fix as an ERROR . Example: A deprecated feature is still
being used, even though the user was informed of its deprecation. |
INFO |
Significant and informative milestones in the execution of the program. One must balance logging useful and informative information, against logging too much useless and redundant information that drowns the signal out with noise. |
DEBUG1 |
More detailed information about the execution of the program than INFO
that would be useful for debugging, but less detailed than the below debug
levels. E.g. don’t log every row of a million-row table. |
DEBUG2 |
More detailed information than DEBUG1 . |
DEBUG3 |
More detailed information than DEBUG2 . |
DEBUG4 |
More detailed information than DEBUG3 . |
All LOG()
calls with a lesser severity are ignored and not logged. For example if LOG(DEBUG1)
then log
calls LOG(DEBUG1)
, LOG(INFO)
, LOG(WARNING)
, LOG(ERROR)
, and LOG(FATAL)
are active and
will produce log entries when executed, and log calls LOG(DEBUG4)
, LOG(DEBUG3)
, and LOG(DEBUG2)
are ignored. If a function is called in the input stream, e.g. LOG(DEBUG2) << f(x)
, then f(x)
will
be invoked if any only if DEBUG2
is an active log severity.
Errors that are logged with ERROR
and FATAL
should be considered software errors, and not user errors.
For example, if the user inputs a malformed SQL query, e.g. SELEKT
instead of SELECT
, then this
should NOT be logged as an ERROR
, but instead logged as an INFO
with an appropriate response to
the user. This would be considered correct behavior of the software. In contrast, anytime an ERROR
or
FATAL
is logged, then it means there is a bug in the software that must be fixed.
Log Files¶
A separate log file is produced for INFO
, WARNING
, ERROR
, and FATAL
if the log severity is
active. If any of the DEBUG
severities are active, they are included into the INFO
log file. Each log
file redundantly includes all entries that are more severe than itself. For example if LOG(DEBUG1)
then the INFO
log file will include all log calls LOG(DEBUG1)
, LOG(INFO)
, LOG(WARNING)
,
LOG(ERROR)
, and LOG(FATAL)
; the WARNING
log file will include all log calls LOG(WARNING)
,
LOG(ERROR)
, and LOG(FATAL)
.
The name of the log file by default includes the program name, severity, and timestamp of when its first entry was made. Example:
cider.INFO.20220928-162525.log
New log files are started on each (re)start. A symbolic link without the timestamp and file extension suffix points to the latest version. Example:
cider.INFO -> cider.INFO.20220928-162525.log
Format¶
The general format of a log entry is:
(timestamp) (severity) (process_id) (query_id) (thread_id) (filename:line_number) (message)
Example:
2019-09-18T16:25:25.659248 I 26481 0 0 measure.h:80 Timer start
Field descriptions:
F
E
W
I
1
2
3
4
I
implies that the above log entry is of INFO
severity.LOG()
via the insertion <<
operator.Note that log entries can contain line breaks, thus not all log lines will begin with these fields if the message itself contains multiple lines.
Macros¶
LOG¶
In addition to the LOG()
macro, there are:
LOG_IF(severity, condition)
- Same asLOG(severity)
but first checks a booleancondition
and logs only if evaluated totrue
.VLOG(n)
- Same asLOG(DEBUGn)
forn = 1, 2, 3, 4
.
CHECK¶
In release mode, the CHECK
macro does nothing. In debug mode,
the CHECK(condition)
macro evaluates condition
as a boolean value. If true, then execution continues
with nothing logged. Otherwise both the condition
source code string is logged at FATAL
severity,
along with any optional << message
.
CHECK
comparison macros:CHECK_EQ
, CHECK_NE
, CHECK_LT
, CHECK_LE
, CHECK_GT
, CHECK_GE
==
, !=
, <
, <=
, >
, >=
, respectively. For example, CHECK_LT(1u, list.size())
will evaluate 1u < list.size()
, and log and abort()
if not true. The advantage of calling CHECK_LT(1u, list.size())
over CHECK(1u < list.size())
is that the value of both operands will be logged if the test fails, which is not reported with CHECK()
.DEBUG_TIMER¶
DebugTimer
objects can be instantiated in the code that measure and log the duration of their own lifetimes,
and include the following features:
- Globally accessible via a macro. E.g.
auto timer = DEBUG_TIMER(__func__)
. - Single multi-line log entry is reported for nested timers.
- Enabled with the
--enable-debug-timer
program option. Without it, thetimer
objects have no effect. - Include timers from spawned threads. Requires a call on the child thread informing the parent thread id:
DEBUG_TIMER_NEW_THREAD(parent_thread_id);
Example:
void foo() {
auto timer = DEBUG_TIMER(__func__);
...
bar();
...
}
void bar() {
auto timer = DEBUG_TIMER(__func__);
...
bar2();
...
timer.stop(); // Manually stop timer for bar().
...
}
void bar2() {
auto timer = DEBUG_TIMER(__func__);
...
}
Upon the destruction of the timer
object within foo()
, a log entry similar to the following will be made:
2019-10-17T15:22:53.981002 I 8980 foobar.cpp:70 DEBUG_TIMER thread_id(140719710320384)
19ms total duration for foo
17ms start(10ms) bar foobar.cpp:100
13ms start(10ms) bar2 foobar.cpp:130
Fields for the Duration
lines (last two line above) are:
- Lifetime of
timer
object. - Time after start of current thread. (This can be used to find gaps in timing coverage.)
- String parameter to
DEBUG_TIMER
(__func__
in above examples.) - File:Line where
DEBUG_TIMER
was called from.
The first root DEBUG_TIMER
instance is in foo()
, and the two others in bar()
and bar2()
are initiated
upon subsequent calls into the call stack, represented by the indentations. Once the first root timer
object
destructs, the entire DurationTree
of recorded times are logged together into a single multi-line log entry,
one line per timer
instance.
There is a DebugTimer::stop()
method that manually stops the timer, serving the same function
as the destructor. The destructor then will have no further effect.
To embed timers in a spawned child thread, call DEBUG_TIMER_NEW_THREAD(parent_thread_id);
from the child
thread. The parent_thread_id
must get its value from logger::thread_id()
before the new thread is spawned.
This will not start a timer, but will record the child-parent relationship so that subsequent DEBUG_TIMER
calls are stored in the correct node of the parent tree.
Note
Any timer that is created in a thread when no other timers are active in the same or parent thread is
called a root timer. The timer stack is logged when the root timer destructs, or stop()
is called,
after which memory used for tracking the timer trees are freed. The performance cost of this should be
kept in mind when placing timers within the code.
Warning
Non-root timers that end after their root timer ends will result in a segmentation fault . This is easily avoided by not interleaving timer lifetimes with one another in the same block of code, and making sure that all child threads end prior to the ending of any corresponding root timer.