Performance Considerations¶
Hyperscan supports a wide range of patterns in all three scanning modes. It is capable of extremely high levels of performance, but certain patterns can reduce performance markedly.
The following guidelines will help construct patterns and pattern sets that will perform better:
Regular expression constructs¶
Tip
Do not hand-optimize regular expression constructs.
Quite a large number of regular expressions can be written in multiple ways.
For example, caseless matching of /abc/
can be written as:
/[Aa][Bb][Cc]/
/(A|a)(B|b)(C|c)/
/(?i)abc(?-i)/
/abc/i
Hyperscan is capable of handling all these constructs. Unless there is a specific reason otherwise, do not rewrite patterns from one form to another.
As another example, matching of /foo(bar|baz)(frotz)?/
can be
equivalently written as:
/foobarfrotz|foobazfrotz|foobar|foobaz/
This change will not improve performance or reduce overheads.
Library usage¶
Tip
Do not hand-optimize library usage.
The Hyperscan library is capable of dealing with small writes, unusually large and small pattern sets, etc. Unless there is a specific performance problem with some usage of the library, it is best to use Hyperscan in a simple and direct fashion. For example, it is unlikely for there to be much benefit in buffering input to the library into larger blocks unless streaming writes are tiny (say, 1-2 bytes at a time).
Unlike many other pattern matching products, Hyperscan will run faster with small numbers of patterns and slower with large numbers of patterns in a smooth fashion (as opposed to, typically, running at a moderate speed up to some fixed limit then either breaking or running half as fast).
Hyperscan also provides high-throughput matching with a single thread of control per core; if a database runs at 3.0 Gbps in Hyperscan it means that a 3000-bit block of data will be scanned in 1 microsecond in a single thread of control, not that it is required to scan 22 3000-bit blocks of data in 22 microseconds. Thus, it is not usually necessary to buffer data to supply Hyperscan with available parallelism.
Block-based matching¶
Tip
Prefer block-based matching to streaming matching where possible.
Whenever input data appears in discrete records, or already requires some sort of transformation (e.g. URI normalization) that requires all the data to be accumulated before processing, it should be scanned in block rather than in streaming mode.
Unnecessary use of streaming mode reduces the number of optimizations that can be applied in Hyperscan and may make some patterns run slower.
If there is a mixture of ‘block’ and ‘streaming’ mode patterns, these should be scanned in separate databases except in the case that the streaming patterns vastly outnumber the block mode patterns.
Unnecessary databases¶
Tip
Avoid unnecessary ‘union’ databases.
If there are 5 different types of network traffic T1 through T5 that must be scanned against 5 different signature sets, it will be far more efficient to construct 5 separate databases and scan traffic against the appropriate one than it will be to merge all 5 signature sets and remove inappropriate matches after the fact.
This will be true even in the case where there is substantial overlap among the signatures. Only if the common subset of the signatures is overwhelmingly large (say, 90% of the signatures appear in all 5 traffic types) should a database that merges all 5 signature sets be considered, and only then if there are no performance issues with specific patterns that appear outside the common subset.
Allocate scratch ahead of time¶
Tip
Do not allocate scratch space for your pattern database just before calling a scan function. Instead, do it just after the pattern database is compiled or deserialized.
Scratch allocation is not necessarily a cheap operation. Since it is the first
time (after compilation or deserialization) that a pattern database is used,
Hyperscan performs some validation checks inside hs_alloc_scratch()
and
must also allocate memory.
Therefore, it is important to ensure that hs_alloc_scratch()
is not
called in the application’s scanning path just before hs_scan()
(for
example).
Instead, scratch should be allocated immediately after a pattern database is compiled or deserialized, then retained for later scanning operations.
Allocate one scratch space per scanning context¶
Tip
A scratch space can be allocated so that it can be used with any one of a number of databases. Each concurrent scan operation (such as a thread) needs its own scratch space.
The hs_alloc_scratch()
function can accept an existing scratch space and
“grow” it to support scanning with another pattern database. This means that
instead of allocating one scratch space for every database used by an
application, one can call hs_alloc_scratch()
with a pointer to the same
hs_scratch_t
and it will be sized appropriately for use with any of
the given databases. For example:
hs_database_t *db1 = buildDatabaseOne();
hs_database_t *db2 = buildDatabaseTwo();
hs_database_t *db3 = buildDatabaseThree();
hs_error_t err;
hs_scratch_t *scratch = NULL;
err = hs_alloc_scratch(db1, &scratch);
if (err != HS_SUCCESS) {
printf("hs_alloc_scratch failed!");
exit(1);
}
err = hs_alloc_scratch(db2, &scratch);
if (err != HS_SUCCESS) {
printf("hs_alloc_scratch failed!");
exit(1);
}
err = hs_alloc_scratch(db3, &scratch);
if (err != HS_SUCCESS) {
printf("hs_alloc_scratch failed!");
exit(1);
}
/* scratch may now be used to scan against any of
the databases db1, db2, db3. */
Anchored patterns¶
Tip
If a pattern is meant to appear at the start of data, be sure to anchor it.
Anchored patterns (/^.../
) are far simpler to match than other
patterns, especially patterns anchored to the start of the buffer (or stream, in
streaming mode). Anchoring patterns to the end of the buffer results in less of
a performance gain, especially in streaming mode.
There are a variety of ways to anchor a pattern to a particular offset:
The
^
and\A
constructs anchor the pattern to the start of the buffer. For example,/^foo/
can only match at offset 3.The
$
,\z
and\Z
constructs anchor the pattern to the end of the buffer. For example,/foo\z/
can only match when the data buffer being scanned ends infoo
. (It should be noted that$
and\Z
will also match before a newline at the end of the buffer, so/foo\z/
would match against eitherabc foo
orabc foo\n
.)The
min_offset
andmax_offset
extended parameters may also be used to constrain where a pattern could match. For example, the pattern/foo/
with amax_offset
of 10 will only match at offsets less than or equal to 10 in the buffer. (This pattern could also be written as/^.{0,7}foo/
, compiled with theHS_FLAG_DOTALL
flag).
Matching everywhere¶
Tip
Avoid patterns that match everywhere, and remember that our semantics are ‘match everywhere, end of match only’.
Pattern that match everywhere will run slowly due to the sheer number of matches that they return.
Patterns like /.*/
in an automata-based matcher will match before and
after every single character position, so a buffer with 100 characters will
return 101 matches. Greedy pattern matchers such as libpcre will return a
single match in this case, but our semantics is to return all matches. This is
likely to be very expensive for our code and for the client code of the
library.
Another result of our semantics (“match everywhere”) is that patterns that have
optional start or ending sections – for example /x?abcd*/
– may not
perform as expected.
Firstly, the x?
portion of the pattern is unnecessary, as it will not
affect the match results.
Secondly, the above pattern will match ‘more’ than /abc/
but
/abc/
will always detect any input data that will be matched by
/x?abcd*/
– it will just produce fewer matches.
For example, input data 0123abcdddd
will match /abc/
once but
/abcd*/
five times (at abc
, abcd
, abcdd
, abcddd
, and
abcdddd
).
Bounded repeats in streaming mode¶
Tip
Bounded repeats are expensive in streaming mode.
A bounded repeat construction such as /X.{1000,1001}abcd/
is extremely
expensive in streaming mode, of necessity. It requires us to take action on
each X
character (itself expensive, relative to searching for longer strings)
and potentially record a history of hundreds of offsets where X
occurred in
case the X
and abcd
characters are separated by a stream boundary.
Heavy and unnecessary use of bounded repeats should be avoided, especially where other parts of a signature are quite specific. For example, a virus signature that matches a virus payload may be sufficient without including a prefix that includes, for example, a 2-character Windows executable prefix and a bounded repeat beforehand.
Prefer literals¶
Tip
Where possible, prefer patterns which ‘require’ literals, especially longer literals, and in streaming mode, prefer signatures that ‘require’ literals earlier in the pattern.
Patterns which must match on a literal will run faster than patterns that do not. For example:
/\wab\d*\w\w\w/
will run faster than/\w\w\d*\w\w/
, or, for that matter/\w(abc)?\d*\w\w\w/
(this contains a literal but it need not appear in the input).
Even implicit literals are better than none: /[0-2][3-5].*\w\w/
still effectively contains 9 2-character literals. No hand-optimization of this
case is required; this pattern will not run faster if rewritten as:
/(03|04|05|13|14|15|23|24|25).*\w\w/
.
Under all circumstances it is better to use longer literals than shorter ones. A database consisting of 100 14-character literals will scan considerably faster than one consisting of 100 4-character literals and return fewer positives.
Additionally, in streaming mode, a signature that contains a longer literal early in the pattern is preferred to one that does not.
For example: /b\w*foobar/
is not as good a pattern as
/blah\w*foobar/
.
The disparity between these patterns is much smaller in block mode.
Longer literals anywhere in the pattern are still preferred in streaming mode.
For example, both of the above patterns are stronger and will scan faster than
/b\w*fo/
even in streaming mode.
“Dot all” mode¶
Tip
Use “dot all” mode where possible.
Not using the HS_FLAG_DOTALL
pattern flag can be expensive, as
implicitly, it means that patterns of the form /A.*B/
become
/A[^\n]*B/
.
It is likely that scanning tasks without the DOTALL flag are better done ‘line at a time’, with the newline sequences marking the beginning and end of each block.
This will be true in most use-cases (an exception being where the DOTALL flag
is off but the pattern contains either explicit newlines or constructs such as
\s
that implicitly match a newline character).
Single-match flag¶
Tip
Consider using the single-match flag to limit matches to one match per pattern only if possible.
If only one match per pattern is required, use the flag provided to indicate
this (HS_FLAG_SINGLEMATCH
). This flag can allow a number of
optimizations to be applied, allowing both performance improvements and state
space reductions when streaming.
However, there is some overhead associated with tracking whether each pattern in the pattern set has matched, and some applications with infrequent matches may see reduced performance when the single-match flag is used.
Start of Match flag¶
Tip
Do not request Start of Match information if it is not not needed.
Start of Match (SOM) information can be expensive to gather and can require
large amounts of stream state to store in streaming mode. As such, SOM
information should only be requested with the HS_FLAG_SOM_LEFTMOST
flag for patterns that require it.
SOM information is not generally expected to be cheaper (in either performance
terms or in stream state overhead) than the use of bounded repeats.
Consequently, /foo.*bar/L
with a check on start of match values after
the callback is considerably more expensive and general than
/foo.{300}bar/
.
Similarly, the hs_expr_ext::min_length
extended parameter can be
used to specify a lower bound on the length of the matches for a pattern. Using
this facility may be more lightweight in some circumstances than using the SOM
flag and post-confirming match length in the calling application.
Approximate matching¶
Tip
Approximate matching is an experimental feature.
There is generally a performance impact associated with approximate matching due to the reduced specificity of the matches. This impact may vary significantly depending on the pattern and edit distance.