Performance Considerations

Hyperscan supports a wide range of patterns in all three scanning modes. It is capable of extremely high levels of performance, but certain patterns can reduce performance markedly.

The following guidelines will help construct patterns and pattern sets that will perform better:

Regular expression constructs


Do not hand-optimize regular expression constructs.

Quite a large number of regular expressions can be written in multiple ways. For example, caseless matching of /abc/ can be written as:

  • /[Aa][Bb][Cc]/

  • /(A|a)(B|b)(C|c)/

  • /(?i)abc(?-i)/

  • /abc/i

Hyperscan is capable of handling all these constructs. Unless there is a specific reason otherwise, do not rewrite patterns from one form to another.

As another example, matching of /foo(bar|baz)(frotz)?/ can be equivalently written as:

  • /foobarfrotz|foobazfrotz|foobar|foobaz/

This change will not improve performance or reduce overheads.

Library usage


Do not hand-optimize library usage.

The Hyperscan library is capable of dealing with small writes, unusually large and small pattern sets, etc. Unless there is a specific performance problem with some usage of the library, it is best to use Hyperscan in a simple and direct fashion. For example, it is unlikely for there to be much benefit in buffering input to the library into larger blocks unless streaming writes are tiny (say, 1-2 bytes at a time).

Unlike many other pattern matching products, Hyperscan will run faster with small numbers of patterns and slower with large numbers of patterns in a smooth fashion (as opposed to, typically, running at a moderate speed up to some fixed limit then either breaking or running half as fast).

Hyperscan also provides high-throughput matching with a single thread of control per core; if a database runs at 3.0 Gbps in Hyperscan it means that a 3000-bit block of data will be scanned in 1 microsecond in a single thread of control, not that it is required to scan 22 3000-bit blocks of data in 22 microseconds. Thus, it is not usually necessary to buffer data to supply Hyperscan with available parallelism.

Block-based matching


Prefer block-based matching to streaming matching where possible.

Whenever input data appears in discrete records, or already requires some sort of transformation (e.g. URI normalization) that requires all the data to be accumulated before processing, it should be scanned in block rather than in streaming mode.

Unnecessary use of streaming mode reduces the number of optimizations that can be applied in Hyperscan and may make some patterns run slower.

If there is a mixture of ‘block’ and ‘streaming’ mode patterns, these should be scanned in separate databases except in the case that the streaming patterns vastly outnumber the block mode patterns.

Unnecessary databases


Avoid unnecessary ‘union’ databases.

If there are 5 different types of network traffic T1 through T5 that must be scanned against 5 different signature sets, it will be far more efficient to construct 5 separate databases and scan traffic against the appropriate one than it will be to merge all 5 signature sets and remove inappropriate matches after the fact.

This will be true even in the case where there is substantial overlap among the signatures. Only if the common subset of the signatures is overwhelmingly large (say, 90% of the signatures appear in all 5 traffic types) should a database that merges all 5 signature sets be considered, and only then if there are no performance issues with specific patterns that appear outside the common subset.

Allocate scratch ahead of time


Do not allocate scratch space for your pattern database just before calling a scan function. Instead, do it just after the pattern database is compiled or deserialized.

Scratch allocation is not necessarily a cheap operation. Since it is the first time (after compilation or deserialization) that a pattern database is used, Hyperscan performs some validation checks inside hs_alloc_scratch() and must also allocate memory.

Therefore, it is important to ensure that hs_alloc_scratch() is not called in the application’s scanning path just before hs_scan() (for example).

Instead, scratch should be allocated immediately after a pattern database is compiled or deserialized, then retained for later scanning operations.

Allocate one scratch space per scanning context


A scratch space can be allocated so that it can be used with any one of a number of databases. Each concurrent scan operation (such as a thread) needs its own scratch space.

The hs_alloc_scratch() function can accept an existing scratch space and “grow” it to support scanning with another pattern database. This means that instead of allocating one scratch space for every database used by an application, one can call hs_alloc_scratch() with a pointer to the same hs_scratch_t and it will be sized appropriately for use with any of the given databases. For example:

hs_database_t *db1 = buildDatabaseOne();
hs_database_t *db2 = buildDatabaseTwo();
hs_database_t *db3 = buildDatabaseThree();

hs_error_t err;
hs_scratch_t *scratch = NULL;
err = hs_alloc_scratch(db1, &scratch);
if (err != HS_SUCCESS) {
    printf("hs_alloc_scratch failed!");
err = hs_alloc_scratch(db2, &scratch);
if (err != HS_SUCCESS) {
    printf("hs_alloc_scratch failed!");
err = hs_alloc_scratch(db3, &scratch);
if (err != HS_SUCCESS) {
    printf("hs_alloc_scratch failed!");

/* scratch may now be used to scan against any of
   the databases db1, db2, db3. */

Anchored patterns


If a pattern is meant to appear at the start of data, be sure to anchor it.

Anchored patterns (/^.../) are far simpler to match than other patterns, especially patterns anchored to the start of the buffer (or stream, in streaming mode). Anchoring patterns to the end of the buffer results in less of a performance gain, especially in streaming mode.

There are a variety of ways to anchor a pattern to a particular offset:

  • The ^ and \A constructs anchor the pattern to the start of the buffer. For example, /^foo/ can only match at offset 3.

  • The $, \z and \Z constructs anchor the pattern to the end of the buffer. For example, /foo\z/ can only match when the data buffer being scanned ends in foo. (It should be noted that $ and \Z will also match before a newline at the end of the buffer, so /foo\z/ would match against either abc foo or abc foo\n.)

  • The min_offset and max_offset extended parameters may also be used to constrain where a pattern could match. For example, the pattern /foo/ with a max_offset of 10 will only match at offsets less than or equal to 10 in the buffer. (This pattern could also be written as /^.{0,7}foo/, compiled with the HS_FLAG_DOTALL flag).

Matching everywhere


Avoid patterns that match everywhere, and remember that our semantics are ‘match everywhere, end of match only’.

Pattern that match everywhere will run slowly due to the sheer number of matches that they return.

Patterns like /.*/ in an automata-based matcher will match before and after every single character position, so a buffer with 100 characters will return 101 matches. Greedy pattern matchers such as libpcre will return a single match in this case, but our semantics is to return all matches. This is likely to be very expensive for our code and for the client code of the library.

Another result of our semantics (“match everywhere”) is that patterns that have optional start or ending sections – for example /x?abcd*/ – may not perform as expected.

Firstly, the x? portion of the pattern is unnecessary, as it will not affect the match results.

Secondly, the above pattern will match ‘more’ than /abc/ but /abc/ will always detect any input data that will be matched by /x?abcd*/ – it will just produce fewer matches.

For example, input data 0123abcdddd will match /abc/ once but /abcd*/ five times (at abc, abcd, abcdd, abcddd, and abcdddd).

Bounded repeats in streaming mode


Bounded repeats are expensive in streaming mode.

A bounded repeat construction such as /X.{1000,1001}abcd/ is extremely expensive in streaming mode, of necessity. It requires us to take action on each X character (itself expensive, relative to searching for longer strings) and potentially record a history of hundreds of offsets where X occurred in case the X and abcd characters are separated by a stream boundary.

Heavy and unnecessary use of bounded repeats should be avoided, especially where other parts of a signature are quite specific. For example, a virus signature that matches a virus payload may be sufficient without including a prefix that includes, for example, a 2-character Windows executable prefix and a bounded repeat beforehand.

Prefer literals


Where possible, prefer patterns which ‘require’ literals, especially longer literals, and in streaming mode, prefer signatures that ‘require’ literals earlier in the pattern.

Patterns which must match on a literal will run faster than patterns that do not. For example:

  • /\wab\d*\w\w\w/ will run faster than

  • /\w\w\d*\w\w/, or, for that matter

  • /\w(abc)?\d*\w\w\w/ (this contains a literal but it need not appear in the input).

Even implicit literals are better than none: /[0-2][3-5].*\w\w/ still effectively contains 9 2-character literals. No hand-optimization of this case is required; this pattern will not run faster if rewritten as: /(03|04|05|13|14|15|23|24|25).*\w\w/.

Under all circumstances it is better to use longer literals than shorter ones. A database consisting of 100 14-character literals will scan considerably faster than one consisting of 100 4-character literals and return fewer positives.

Additionally, in streaming mode, a signature that contains a longer literal early in the pattern is preferred to one that does not.

For example: /b\w*foobar/ is not as good a pattern as /blah\w*foobar/.

The disparity between these patterns is much smaller in block mode.

Longer literals anywhere in the pattern are still preferred in streaming mode. For example, both of the above patterns are stronger and will scan faster than /b\w*fo/ even in streaming mode.

“Dot all” mode


Use “dot all” mode where possible.

Not using the HS_FLAG_DOTALL pattern flag can be expensive, as implicitly, it means that patterns of the form /A.*B/ become /A[^\n]*B/.

It is likely that scanning tasks without the DOTALL flag are better done ‘line at a time’, with the newline sequences marking the beginning and end of each block.

This will be true in most use-cases (an exception being where the DOTALL flag is off but the pattern contains either explicit newlines or constructs such as \s that implicitly match a newline character).

Single-match flag


Consider using the single-match flag to limit matches to one match per pattern only if possible.

If only one match per pattern is required, use the flag provided to indicate this (HS_FLAG_SINGLEMATCH). This flag can allow a number of optimizations to be applied, allowing both performance improvements and state space reductions when streaming.

However, there is some overhead associated with tracking whether each pattern in the pattern set has matched, and some applications with infrequent matches may see reduced performance when the single-match flag is used.

Start of Match flag


Do not request Start of Match information if it is not not needed.

Start of Match (SOM) information can be expensive to gather and can require large amounts of stream state to store in streaming mode. As such, SOM information should only be requested with the HS_FLAG_SOM_LEFTMOST flag for patterns that require it.

SOM information is not generally expected to be cheaper (in either performance terms or in stream state overhead) than the use of bounded repeats. Consequently, /foo.*bar/L with a check on start of match values after the callback is considerably more expensive and general than /foo.{300}bar/.

Similarly, the hs_expr_ext::min_length extended parameter can be used to specify a lower bound on the length of the matches for a pattern. Using this facility may be more lightweight in some circumstances than using the SOM flag and post-confirming match length in the calling application.

Approximate matching


Approximate matching is an experimental feature.

There is generally a performance impact associated with approximate matching due to the reduced specificity of the matches. This impact may vary significantly depending on the pattern and edit distance.