API Reference: Files

File: hs.h

The complete Hyperscan API definition.

Hyperscan is a high speed regular expression engine.

This header includes both the Hyperscan compiler and runtime components. See the individual component headers for documentation.

Defines

HS_MAJOR
HS_MINOR
HS_PATCH

File: hs_common.h

The Hyperscan common API definition.

Hyperscan is a high speed regular expression engine.

This header contains functions available to both the Hyperscan compiler and runtime.

Defines

HS_SUCCESS

The engine completed normally.

HS_INVALID

A parameter passed to this function was invalid.

This error is only returned in cases where the function can detect an invalid parameter; it cannot be relied upon to detect (for example) pointers to freed memory or other invalid data.

HS_NOMEM

A memory allocation failed.

HS_SCAN_TERMINATED

The engine was terminated by callback.

This return value indicates that the target buffer was partially scanned, but that the callback function requested that scanning cease after a match was located.

HS_COMPILER_ERROR

The pattern compiler failed, and the hs_compile_error_t should be inspected for more detail.

HS_DB_VERSION_ERROR

The given database was built for a different version of Hyperscan.

HS_DB_PLATFORM_ERROR

The given database was built for a different platform (i.e., CPU type).

HS_DB_MODE_ERROR

The given database was built for a different mode of operation. This error is returned when streaming calls are used with a block or vectored database and vice versa.

HS_BAD_ALIGN

A parameter passed to this function was not correctly aligned.

HS_BAD_ALLOC

The memory allocator (either malloc() or the allocator set with hs_set_allocator()) did not correctly return memory suitably aligned for the largest representable data type on this platform.

HS_SCRATCH_IN_USE

The scratch region was already in use.

This error is returned when Hyperscan is able to detect that the scratch region given is already in use by another Hyperscan API call.

A separate scratch region, allocated with hs_alloc_scratch() or hs_clone_scratch(), is required for every concurrent caller of the Hyperscan API.

For example, this error might be returned when hs_scan() has been called inside a callback delivered by a currently-executing hs_scan() call using the same scratch region.

Note: Not all concurrent uses of scratch regions may be detected. This error is intended as a best-effort debugging tool, not a guarantee.

HS_ARCH_ERROR

Unsupported CPU architecture.

This error is returned when Hyperscan is able to detect that the current system does not support the required instruction set.

At a minimum, Hyperscan requires Supplemental Streaming SIMD Extensions 3 (SSSE3).

HS_INSUFFICIENT_SPACE

Provided buffer was too small.

This error indicates that there was insufficient space in the buffer. The call should be repeated with a larger provided buffer.

Note: in this situation, the amount of space required is normally returned in the same way that the amount of space used would have been returned had the call succeeded.

HS_UNKNOWN_ERROR

Unexpected internal error.

This error indicates that unexpected matching behaviour occurred. This could be caused by invalid use of stream and scratch space, or by invalid memory operations in user code.

Typedefs

typedef struct hs_database hs_database_t

A Hyperscan pattern database.

Generated by one of the Hyperscan compiler functions: hs_compile(), hs_compile_multi() or hs_compile_ext_multi().

typedef int hs_error_t

A type for errors returned by Hyperscan functions.

typedef void *(*hs_alloc_t)(size_t size)

The type of the callback function that will be used by Hyperscan to allocate more memory at runtime as required, for example in hs_open_stream() to allocate stream state.

If Hyperscan is to be used in a multi-threaded, or similarly concurrent environment, the allocation function will need to be re-entrant, or similarly safe for concurrent use.

Param size

The number of bytes to allocate.

Return

A pointer to the region of memory allocated, or NULL on error.

typedef void (*hs_free_t)(void *ptr)

The type of the callback function that will be used by Hyperscan to free memory regions previously allocated using the hs_alloc_t function.

Param ptr

The region of memory to be freed.

Functions

hs_error_t hs_free_database(hs_database_t *db)

Free a compiled pattern database.

The free callback set by hs_set_database_allocator() (or hs_set_allocator()) will be used by this function.

Parameters
  • db – A compiled pattern database. NULL may also be safely provided, in which case the function does nothing.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_serialize_database(const hs_database_t *db, char **bytes, size_t *length)

Serialize a pattern database to a stream of bytes.

The allocator callback set by hs_set_misc_allocator() (or hs_set_allocator()) will be used by this function.

Parameters
  • db – A compiled pattern database.

  • bytes – On success, a pointer to an array of bytes will be returned here. These bytes can be subsequently relocated or written to disk. The caller is responsible for freeing this block.

  • length – On success, the number of bytes in the generated byte array will be returned here.

Returns

HS_SUCCESS on success, HS_NOMEM if the byte array cannot be allocated; other values may be returned if other errors are detected.

hs_error_t hs_deserialize_database(const char *bytes, const size_t length, hs_database_t **db)

Reconstruct a pattern database from a stream of bytes previously generated by hs_serialize_database().

This function will allocate sufficient space for the database using the allocator set with hs_set_database_allocator() (or hs_set_allocator()); to use a pre-allocated region of memory, use the hs_deserialize_database_at() function.

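Parameters
  • bytes – A byte array generated by hs_serialize_database() representing a compiled pattern database.

  • length – The length of the byte array generated by hs_serialize_database(). This should be the same value as that returned by hs_serialize_database().

  • db – On success, a pointer to a newly allocated hs_database_t will be returned here. This database can then be used for scanning, and eventually freed by the caller using hs_free_database().

Returns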

HS_SUCCESS on success, other values on failure.

hs_error_t hs_deserialize_database_at(const char *bytes, const size_t length, hs_database_t *db)

Reconstruct a pattern database from a stream of bytes previously generated by hs_serialize_database() at a given memory location.

This function (unlike hs_deserialize_database()) will write the reconstructed database to the memory location given in the db parameter. The amount of space required at this location can be determined with the hs_serialized_database_size() function.

Parameters
  • bytes – A byte array generated by hs_serialize_database() representing a compiled pattern database.

  • length – The length of the byte array generated by hs_serialize_database(). This should be the same value as that returned by hs_serialize_database().

  • db – Pointer to an 8-byte aligned block of memory of sufficient size to hold the deserialized database. On success, the reconstructed database will be written to this location. This database can then be used for pattern matching. The user is responsible for freeing this memory; the hs_free_database() call should not be used.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_stream_size(const hs_database_t *database, size_t *stream_size)

Provides the size of the stream state allocated by a single stream opened against the given database.

Parameters
  • database – Pointer to a compiled (streaming mode) pattern database.

  • stream_size – On success, the size in bytes of an individual stream opened against the given database is placed in this parameter.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_database_size(const hs_database_t *database, size_t *database_size)

Provides the size of the given database in bytes.

Parameters
  • database – Pointer to compiled pattern database.

  • database_size – On success, the size of the compiled database in bytes is placed in this parameter.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_serialized_database_size(const char *bytes, const size_t length, size_t *deserialized_size)

Utility function for reporting the size that would be required by a database if it were deserialized.

This can be used to allocate a shared memory region or other “special” allocation prior to deserializing with the hs_deserialize_database_at() function.

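Parameters
  • bytes – Pointer to a byte array generated by hs_serialize_database() representing a compiled pattern database.

  • length – The length of the byte array generated by hs_serialize_database(). This should be the same value as that returned by hs_serialize_database().

  • deserialized_size – On success, the size of the compiled database that would be generated by hs_deserialize_database_at() is returned here.

Returns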

HS_SUCCESS on success, other values on failure.

hs_error_t hs_database_info(const hs_database_t *database, char **info)

Utility function providing information about a database.

Parameters
  • database – Pointer to a compiled database.

  • info – On success, a string containing the version and platform information for the supplied database is placed in the parameter. The string is allocated using the allocator supplied in hs_set_misc_allocator() (or malloc() if no allocator was set) and should be freed by the caller.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_serialized_database_info(const char *bytes, size_t length, char **info)

Utility function providing information about a serialized database.

Parameters
  • bytes – Pointer to a serialized database.

  • length – Length in bytes of the serialized database.

  • info – On success, a string containing the version and platform information for the supplied serialized database is placed in the parameter. The string is allocated using the allocator supplied in hs_set_misc_allocator() (or malloc() if no allocator was set) and should be freed by the caller.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_set_allocator(hs_alloc_t alloc_func, hs_free_t free_func)

Set the allocate and free functions used by Hyperscan for allocating memory at runtime for stream state, scratch space, database bytecode, and various other data structures returned by the Hyperscan API.

The function is equivalent to calling hs_set_stream_allocator(), hs_set_scratch_allocator(), hs_set_database_allocator() and hs_set_misc_allocator() with the provided parameters.

This call will override any previous allocators that have been set.

Note: there is no way to change the allocator used for temporary objects created during the various compile calls (hs_compile(), hs_compile_multi(), hs_compile_ext_multi()).

Parameters
  • alloc_func – A callback function pointer that allocates memory. This function must return memory suitably aligned for the largest representable data type on this platform.

  • free_func – A callback function pointer that frees allocated memory.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_set_database_allocator(hs_alloc_t alloc_func, hs_free_t free_func)

Set the allocate and free functions used by Hyperscan for allocating memory for database bytecode produced by the compile calls (hs_compile(), hs_compile_multi(), hs_compile_ext_multi()) and by database deserialization (hs_deserialize_database()).

If no database allocation functions are set, or if NULL is used in place of both parameters, then memory allocation will default to standard methods (such as the system malloc() and free() calls).

This call will override any previous database allocators that have been set.

Note: the database allocator may also be set by calling hs_set_allocator().

Note: there is no way to change how temporary objects created during the various compile calls (hs_compile(), hs_compile_multi(), hs_compile_ext_multi()) are allocated.

Parameters
  • alloc_func – A callback function pointer that allocates memory. This function must return memory suitably aligned for the largest representable data type on this platform.

  • free_func – A callback function pointer that frees allocated memory.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_set_misc_allocator(hs_alloc_t alloc_func, hs_free_t free_func)

Set the allocate and free functions used by Hyperscan for allocating memory for items returned by the Hyperscan API such as hs_compile_error_t, hs_expr_info_t and serialized databases.

If no misc allocation functions are set, or if NULL is used in place of both parameters, then memory allocation will default to standard methods (such as the system malloc() and free() calls).

This call will override any previous misc allocators that have been set.

Note: the misc allocator may also be set by calling hs_set_allocator().

Parameters
  • alloc_func – A callback function pointer that allocates memory. This function must return memory suitably aligned for the largest representable data type on this platform.

  • free_func – A callback function pointer that frees allocated memory.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_set_scratch_allocator(hs_alloc_t alloc_func, hs_free_t free_func)

Set the allocate and free functions used by Hyperscan for allocating memory for scratch space by hs_alloc_scratch() and hs_clone_scratch().

If no scratch allocation functions are set, or if NULL is used in place of both parameters, then memory allocation will default to standard methods (such as the system malloc() and free() calls).

This call will override any previous scratch allocators that have been set.

Note: the scratch allocator may also be set by calling hs_set_allocator().

Parameters
  • alloc_func – A callback function pointer that allocates memory. This function must return memory suitably aligned for the largest representable data type on this platform.

  • free_func – A callback function pointer that frees allocated memory.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_set_stream_allocator(hs_alloc_t alloc_func, hs_free_t free_func)

Set the allocate and free functions used by Hyperscan for allocating memory for stream state by hs_open_stream().

If no stream allocation functions are set, or if NULL is used in place of both parameters, then memory allocation will default to standard methods (such as the system malloc() and free() calls).

This call will override any previous stream allocators that have been set.

Note: the stream allocator may also be set by calling hs_set_allocator().

Parameters
  • alloc_func – A callback function pointer that allocates memory. This function must return memory suitably aligned for the largest representable data type on this platform.

  • free_func – A callback function pointer that frees allocated memory.

Returns

HS_SUCCESS on success, other values on failure.

const char *hs_version(void)

Utility function for identifying this release version.

Returns

A string containing the version number of this release build and the date of the build. It is allocated statically, so it does not need to be freed by the caller.

hs_error_t hs_valid_platform(void)

Utility function to test the current system architecture.

Hyperscan requires the Supplemental Streaming SIMD Extensions 3 instruction set. This function can be called on any x86 platform to determine if the system provides the required instruction set.

This function does not test for more advanced features if Hyperscan has been built for a more specific architecture, for example the AVX2 instruction set.

Returns

HS_SUCCESS on success, HS_ARCH_ERROR if the system does not support Hyperscan.

File: hs_compile.h

The Hyperscan compiler API definition.

Hyperscan is a high speed regular expression engine.

This header contains functions for compiling regular expressions into Hyperscan databases that can be used by the Hyperscan runtime.

Defines

HS_EXT_FLAG_MIN_OFFSET

Flag indicating that the hs_expr_ext::min_offset field is used.

HS_EXT_FLAG_MAX_OFFSET

Flag indicating that the hs_expr_ext::max_offset field is used.

HS_EXT_FLAG_MIN_LENGTH

Flag indicating that the hs_expr_ext::min_length field is used.

HS_EXT_FLAG_EDIT_DISTANCE

Flag indicating that the hs_expr_ext::edit_distance field is used.

HS_EXT_FLAG_HAMMING_DISTANCE

Flag indicating that the hs_expr_ext::hamming_distance field is used.

HS_FLAG_CASELESS

Compile flag: Set case-insensitive matching.

This flag sets the expression to be matched case-insensitively by default. The expression may still use PCRE tokens (notably (?i) and (?-i)) to switch case-insensitive matching on and off.

HS_FLAG_DOTALL

Compile flag: Matching a . will not exclude newlines.

This flag sets any instances of the . token to match newline characters as well as all other characters. The PCRE specification states that the . token does not match newline characters by default, so without this flag the . token will not cross line boundaries.

HS_FLAG_MULTILINE

Compile flag: Set multi-line anchoring.

This flag instructs the expression to make the ^ and $ tokens match newline characters as well as the start and end of the stream. If this flag is not specified, the ^ token will only ever match at the start of a stream, and the $ token will only ever match at the end of a stream within the guidelines of the PCRE specification.

HS_FLAG_SINGLEMATCH

Compile flag: Set single-match only mode.

This flag sets the expression’s match ID to match at most once. In streaming mode, this means that the expression will return only a single match over the lifetime of the stream, rather than reporting every match as per standard Hyperscan semantics. In block mode or vectored mode, only the first match for each invocation of hs_scan() or hs_scan_vector() will be returned.

If multiple expressions in the database share the same match ID, then either all of them must specify HS_FLAG_SINGLEMATCH or none of them may. If a group of expressions sharing a match ID specifies the flag, then at most one match with that match ID will be generated per stream.

Note: The use of this flag in combination with HS_FLAG_SOM_LEFTMOST is not currently supported.

HS_FLAG_ALLOWEMPTY

Compile flag: Allow expressions that can match against empty buffers.

This flag instructs the compiler to allow expressions that can match against empty buffers, such as .?, .*, (a|). Since Hyperscan can return every possible match for an expression, such expressions generally execute very slowly; the default behaviour is to return an error when an attempt to compile one is made. Using this flag will force the compiler to allow such an expression.

HS_FLAG_UTF8

Compile flag: Enable UTF-8 mode for this expression.

This flag instructs Hyperscan to treat the pattern as a sequence of UTF-8 characters. The results of scanning invalid UTF-8 sequences with a Hyperscan database that has been compiled with one or more patterns using this flag are undefined.

HS_FLAG_UCP

Compile flag: Enable Unicode property support for this expression.

This flag instructs Hyperscan to use Unicode properties, rather than the default ASCII interpretations, for character mnemonics like \w and \s as well as the POSIX character classes. It is only meaningful in conjunction with HS_FLAG_UTF8.

HS_FLAG_PREFILTER

Compile flag: Enable prefiltering mode for this expression.

This flag instructs Hyperscan to compile an “approximate” version of this pattern for use in a prefiltering application, even if Hyperscan does not support the pattern in normal operation.

The set of matches returned when this flag is used is guaranteed to be a superset of the matches specified by the non-prefiltering expression.

If the pattern contains pattern constructs not supported by Hyperscan (such as zero-width assertions, back-references or conditional references) these constructs will be replaced internally with broader constructs that may match more often.

Furthermore, in prefiltering mode Hyperscan may simplify a pattern that would otherwise return a “Pattern too large” error at compile time, or for performance reasons (subject to the matching guarantee above).

It is generally expected that the application will subsequently confirm prefilter matches with another regular expression matcher that can provide exact matches for the pattern.

Note: The use of this flag in combination with HS_FLAG_SOM_LEFTMOST is not currently supported.

HS_FLAG_SOM_LEFTMOST

Compile flag: Enable leftmost start of match reporting.

This flag instructs Hyperscan to report the leftmost possible start of match offset when a match is reported for this expression. (By default, no start of match is returned.)

In all three modes (block, streaming and vectored), enabling this behaviour may reduce performance. In streaming mode in particular, it may also increase stream state requirements.

HS_FLAG_COMBINATION

Compile flag: Logical combination.

This flag instructs Hyperscan to parse this expression as logical combination syntax. Logical constraints consist of operands, operators and parentheses. The operands are expression IDs, and the operators can be ‘!’ (NOT), ‘&’ (AND) or ‘|’ (OR). For example: (101&102&103)|(104&!105) or ((301|302)&303)&(304|305).

HS_FLAG_QUIET

Compile flag: Don’t do any match reporting.

This flag instructs Hyperscan to ignore match reporting for this expression. It is designed to be used on the sub-expressions in logical combinations.

HS_CPU_FEATURES_AVX2

CPU features flag - Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2)

Setting this flag indicates that the target platform supports AVX2 instructions.

HS_CPU_FEATURES_AVX512

CPU features flag - Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX512)

Setting this flag indicates that the target platform supports AVX512 instructions, specifically AVX-512BW. Using AVX512 implies the use of AVX2.

HS_CPU_FEATURES_AVX512VBMI

CPU features flag - Intel(R) Advanced Vector Extensions 512 Vector Byte Manipulation Instructions (Intel(R) AVX512VBMI)

Setting this flag indicates that the target platform supports AVX512VBMI instructions. Using AVX512VBMI implies the use of AVX512.

HS_TUNE_FAMILY_GENERIC

Tuning Parameter - Generic

This indicates that the compiled database should not be tuned for any particular target platform.

HS_TUNE_FAMILY_SNB

Tuning Parameter - Intel(R) microarchitecture code name Sandy Bridge

This indicates that the compiled database should be tuned for the Sandy Bridge microarchitecture.

HS_TUNE_FAMILY_IVB

Tuning Parameter - Intel(R) microarchitecture code name Ivy Bridge

This indicates that the compiled database should be tuned for the Ivy Bridge microarchitecture.

HS_TUNE_FAMILY_HSW

Tuning Parameter - Intel(R) microarchitecture code name Haswell

This indicates that the compiled database should be tuned for the Haswell microarchitecture.

HS_TUNE_FAMILY_SLM

Tuning Parameter - Intel(R) microarchitecture code name Silvermont

This indicates that the compiled database should be tuned for the Silvermont microarchitecture.

HS_TUNE_FAMILY_BDW

Tuning Parameter - Intel(R) microarchitecture code name Broadwell

This indicates that the compiled database should be tuned for the Broadwell microarchitecture.

HS_TUNE_FAMILY_SKL

Tuning Parameter - Intel(R) microarchitecture code name Skylake

This indicates that the compiled database should be tuned for the Skylake microarchitecture.

HS_TUNE_FAMILY_SKX

Tuning Parameter - Intel(R) microarchitecture code name Skylake Server

This indicates that the compiled database should be tuned for the Skylake Server microarchitecture.

HS_TUNE_FAMILY_GLM

Tuning Parameter - Intel(R) microarchitecture code name Goldmont

This indicates that the compiled database should be tuned for the Goldmont microarchitecture.

HS_TUNE_FAMILY_ICL

Tuning Parameter - Intel(R) microarchitecture code name Ice Lake

This indicates that the compiled database should be tuned for the Ice Lake microarchitecture.

HS_TUNE_FAMILY_ICX

Tuning Parameter - Intel(R) microarchitecture code name Ice Lake Server

This indicates that the compiled database should be tuned for the Ice Lake Server microarchitecture.

HS_MODE_BLOCK

Compiler mode flag: Block scan (non-streaming) database.

HS_MODE_NOSTREAM

Compiler mode flag: Alias for HS_MODE_BLOCK.

HS_MODE_STREAM

Compiler mode flag: Streaming database.

HS_MODE_VECTORED

Compiler mode flag: Vectored scanning database.

HS_MODE_SOM_HORIZON_LARGE

Compiler mode flag: use full precision to track start of match offsets in stream state.

This mode will use the most stream state per pattern, but will always return an accurate start of match offset regardless of how far back in the past it was found.

One of the SOM_HORIZON modes must be selected to use the HS_FLAG_SOM_LEFTMOST expression flag.

HS_MODE_SOM_HORIZON_MEDIUM

Compiler mode flag: use medium precision to track start of match offsets in stream state.

This mode will use less stream state than HS_MODE_SOM_HORIZON_LARGE and will limit start of match accuracy to offsets within 2^32 bytes of the end of match offset reported.

One of the SOM_HORIZON modes must be selected to use the HS_FLAG_SOM_LEFTMOST expression flag.

HS_MODE_SOM_HORIZON_SMALL

Compiler mode flag: use limited precision to track start of match offsets in stream state.

This mode will use less stream state than HS_MODE_SOM_HORIZON_LARGE and will limit start of match accuracy to offsets within 2^16 bytes of the end of match offset reported.

One of the SOM_HORIZON modes must be selected to use the HS_FLAG_SOM_LEFTMOST expression flag.

Typedefs

typedef struct hs_compile_error hs_compile_error_t

A type containing error details that is returned by the compile calls (hs_compile(), hs_compile_multi() and hs_compile_ext_multi()) on failure. The caller may inspect the values returned in this type to determine the cause of failure.

Common errors generated during the compile process include:

  • Invalid parameter

    An invalid argument was specified in the compile call.

  • Unrecognised flag

    An unrecognised value was passed in the flags argument.

  • Pattern matches empty buffer

    By default, Hyperscan only supports patterns that will always consume at least one byte of input. Patterns that do not have this property (such as /(abc)?/) will produce this error unless the HS_FLAG_ALLOWEMPTY flag is supplied. Note that such patterns will produce a match for every byte when scanned.

  • Embedded anchors not supported

    Hyperscan only supports the use of anchor meta-characters (such as ^ and $) in patterns where they could only match at the start or end of a buffer. A pattern containing an embedded anchor, such as /abc^def/, can never match, as there is no way for abc to precede the start of the data stream.

  • Bounded repeat is too large

    The pattern contains a repeated construct with very large finite bounds.

  • Unsupported component type

    An unsupported PCRE construct was used in the pattern.

  • Unable to generate bytecode

    This error indicates that Hyperscan was unable to compile a pattern that is syntactically valid. The most common cause is a pattern that is very long and complex or contains a large repeated subpattern.

  • Unable to allocate memory

    The library was unable to allocate temporary storage used during compilation.

  • Allocator returned misaligned memory

    The memory allocator (either malloc() or the allocator set with hs_set_allocator()) did not correctly return memory suitably aligned for the largest representable data type on this platform.

  • Internal error

    An unexpected error occurred: if this error is reported, please contact the Hyperscan team with a description of the situation.

typedef struct hs_platform_info hs_platform_info_t

A type containing information on the target platform which may optionally be provided to the compile calls (hs_compile(), hs_compile_multi(), hs_compile_ext_multi()).

A hs_platform_info structure may be populated for the current platform by using the hs_populate_platform() call.

typedef struct hs_expr_info hs_expr_info_t

A type containing information related to an expression that is returned by hs_expression_info() or hs_expression_ext_info().

typedef struct hs_expr_ext hs_expr_ext_t

A structure containing additional parameters related to an expression, passed in at build time to hs_compile_ext_multi() or hs_expression_ext_info().

These parameters allow the set of matches produced by a pattern to be constrained at compile time, rather than relying on the application to process unwanted matches at runtime.

Functions

hs_error_t hs_compile(const char *expression, unsigned int flags, unsigned int mode, const hs_platform_info_t *platform, hs_database_t **db, hs_compile_error_t **error)

The basic regular expression compiler.

This is the function call with which an expression is compiled into a Hyperscan database which can be passed to the runtime functions (such as hs_scan(), hs_open_stream(), etc.)

Parameters
  • expression – The NULL-terminated expression to parse. Note that this string must represent ONLY the pattern to be matched, with no delimiters or flags; any global flags should be specified with the flags argument. For example, the expression /abc?def/i should be compiled by providing abc?def as the expression, and HS_FLAG_CASELESS as the flags.

  • flags – Flags which modify the behaviour of the expression. Multiple flags may be used by ORing them together. Valid values are:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_DOTALL - Matching a . will not exclude newlines.

    • HS_FLAG_MULTILINE - ^ and $ anchors match any newlines in data.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated for the expression per stream.

    • HS_FLAG_ALLOWEMPTY - Allow expressions which can match against an empty string, such as .*.

    • HS_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters.

    • HS_FLAG_UCP - Use Unicode properties for character classes.

    • HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

    • HS_FLAG_COMBINATION - Parse the expression in logical combination syntax.

    • HS_FLAG_QUIET - Ignore match reporting for this expression. Used for the sub-expressions in logical combinations.

  • mode – Compiler mode flags that affect the database as a whole. One of HS_MODE_STREAM or HS_MODE_BLOCK or HS_MODE_VECTORED must be supplied, to select between the generation of a streaming, block or vectored database. In addition, other flags (beginning with HS_MODE_) may be supplied to enable specific features. See Compile mode flags for more details.

  • platform – If not NULL, the platform structure is used to determine the target platform for the database. If NULL, a database suitable for running on the current host platform is produced.

  • db – On success, a pointer to the generated database will be returned in this parameter, or NULL on failure. The caller is responsible for deallocating the buffer using the hs_free_database() function.

  • error – If the compile fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful compilation; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_compile_multi(const char *const *expressions, const unsigned int *flags, const unsigned int *ids, unsigned int elements, unsigned int mode, const hs_platform_info_t *platform, hs_database_t **db, hs_compile_error_t **error)

The multiple regular expression compiler.

This is the function call with which a set of expressions is compiled into a database which can be passed to the runtime functions (such as hs_scan(), hs_open_stream(), etc.) Each expression can be labelled with a unique integer which is passed into the match callback to identify the pattern that has matched.

Parameters
  • expressions – Array of NULL-terminated expressions to compile. Note that (as for hs_compile()) these strings must contain only the pattern to be matched, with no delimiters or flags. For example, the expression /abc?def/i should be compiled by providing abc?def as the first string in the expressions array, and HS_FLAG_CASELESS as the first value in the flags array.

  • flags – Array of flags which modify the behaviour of each expression. Multiple flags may be used by ORing them together. Specifying the NULL pointer in place of an array will set the flags value for all patterns to zero. Valid values are:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_DOTALL - Matching a . will not exclude newlines.

    • HS_FLAG_MULTILINE - ^ and $ anchors match any newlines in data.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated by patterns with this match id per stream.

    • HS_FLAG_ALLOWEMPTY - Allow expressions which can match against an empty string, such as .*.

    • HS_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters.

    • HS_FLAG_UCP - Use Unicode properties for character classes.

    • HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

    • HS_FLAG_COMBINATION - Parse the expression in logical combination syntax.

    • HS_FLAG_QUIET - Ignore match reporting for this expression. Used for the sub-expressions in logical combinations.

  • ids – An array of integers specifying the ID number to be associated with the corresponding pattern in the expressions array. Specifying the NULL pointer in place of an array will set the ID value for all patterns to zero.

  • elements – The number of elements in the input arrays.

  • mode – Compiler mode flags that affect the database as a whole. One of HS_MODE_STREAM or HS_MODE_BLOCK or HS_MODE_VECTORED must be supplied, to select between the generation of a streaming, block or vectored database. In addition, other flags (beginning with HS_MODE_) may be supplied to enable specific features. See Compile mode flags for more details.

  • platform – If not NULL, the platform structure is used to determine the target platform for the database. If NULL, a database suitable for running on the current host platform is produced.

  • db – On success, a pointer to the generated database will be returned in this parameter, or NULL on failure. The caller is responsible for deallocating the buffer using the hs_free_database() function.

  • error – If the compile fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful compilation; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_compile_ext_multi(const char *const *expressions, const unsigned int *flags, const unsigned int *ids, const hs_expr_ext_t *const *ext, unsigned int elements, unsigned int mode, const hs_platform_info_t *platform, hs_database_t **db, hs_compile_error_t **error)

The multiple regular expression compiler with extended parameter support.

This function call compiles a group of expressions into a database in the same way as hs_compile_multi(), but allows additional parameters to be specified via an hs_expr_ext_t structure per expression.

Parameters
  • expressions – Array of NULL-terminated expressions to compile. Note that (as for hs_compile()) these strings must contain only the pattern to be matched, with no delimiters or flags. For example, the expression /abc?def/i should be compiled by providing abc?def as the first string in the expressions array, and HS_FLAG_CASELESS as the first value in the flags array.

  • flags – Array of flags which modify the behaviour of each expression. Multiple flags may be used by ORing them together. Specifying the NULL pointer in place of an array will set the flags value for all patterns to zero. Valid values are:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_DOTALL - Matching a . will not exclude newlines.

    • HS_FLAG_MULTILINE - ^ and $ anchors match any newlines in data.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated by patterns with this match id per stream.

    • HS_FLAG_ALLOWEMPTY - Allow expressions which can match against an empty string, such as .*.

    • HS_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters.

    • HS_FLAG_UCP - Use Unicode properties for character classes.

    • HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

    • HS_FLAG_COMBINATION - Parse the expression in logical combination syntax.

    • HS_FLAG_QUIET - Ignore match reporting for this expression. Used for the sub-expressions in logical combinations.

  • ids – An array of integers specifying the ID number to be associated with the corresponding pattern in the expressions array. Specifying the NULL pointer in place of an array will set the ID value for all patterns to zero.

  • ext – An array of pointers to filled hs_expr_ext_t structures that define extended behaviour for each pattern. NULL may be specified if no extended behaviour is needed for an individual pattern, or in place of the whole array if it is not needed for any expressions. Memory used by these structures must be both allocated and freed by the caller.

  • elements – The number of elements in the input arrays.

  • mode – Compiler mode flags that affect the database as a whole. One of HS_MODE_STREAM, HS_MODE_BLOCK or HS_MODE_VECTORED must be supplied, to select between the generation of a streaming, block or vectored database. In addition, other flags (beginning with HS_MODE_) may be supplied to enable specific features. See Compile mode flags for more details.

  • platform – If not NULL, the platform structure is used to determine the target platform for the database. If NULL, a database suitable for running on the current host platform is produced.

  • db – On success, a pointer to the generated database will be returned in this parameter, or NULL on failure. The caller is responsible for deallocating the buffer using the hs_free_database() function.

  • error – If the compile fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful compilation; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_compile_lit(const char *expression, unsigned flags, const size_t len, unsigned mode, const hs_platform_info_t *platform, hs_database_t **db, hs_compile_error_t **error)

The basic pure literal expression compiler.

This is the function call with which a pure literal expression (rather than a regular expression) is compiled into a Hyperscan database which can be passed to the runtime functions (such as hs_scan(), hs_open_stream(), etc.).

Parameters
  • expression – The pure literal expression to parse. Its extent is given by the len argument rather than by a terminating \0. Note that this string must represent ONLY the pattern to be matched, with no delimiters or flags; any global flags should be specified with the flags argument. For example, the expression /abc?def/i should be compiled by providing abc?def as the expression, and HS_FLAG_CASELESS as the flags. Unlike hs_compile(), the content is interpreted purely literally, with no regular expression syntax: abc? denotes the four characters a, b, c and ?, and the ? is not the zero-or-one quantifier.

  • flags – Flags which modify the behaviour of the expression. Multiple flags may be used by ORing them together. Compared to hs_compile(), fewer valid values are provided:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated for the expression per stream.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

  • len – The length in bytes of the pure literal expression. Because every byte of expression is treated as a literal character, a \0 byte may appear within the expression and is not treated as a string terminator. The end of the expression is therefore determined by counting len bytes, not by searching for \0.

  • mode – Compiler mode flags that affect the database as a whole. One of HS_MODE_STREAM or HS_MODE_BLOCK or HS_MODE_VECTORED must be supplied, to select between the generation of a streaming, block or vectored database. In addition, other flags (beginning with HS_MODE_) may be supplied to enable specific features. See Compile mode flags for more details.

  • platform – If not NULL, the platform structure is used to determine the target platform for the database. If NULL, a database suitable for running on the current host platform is produced.

  • db – On success, a pointer to the generated database will be returned in this parameter, or NULL on failure. The caller is responsible for deallocating the buffer using the hs_free_database() function.

  • error – If the compile fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful compilation; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_compile_lit_multi(const char *const *expressions, const unsigned *flags, const unsigned *ids, const size_t *lens, unsigned elements, unsigned mode, const hs_platform_info_t *platform, hs_database_t **db, hs_compile_error_t **error)

The multiple pure literal expression compiler.

This is the function call with which a set of pure literal expressions is compiled into a database which can be passed to the runtime functions (such as hs_scan(), hs_open_stream(), etc.). Each expression can be labelled with a unique integer which is passed into the match callback to identify the pattern that has matched.

Parameters
  • expressions – Array of pure literal expressions to compile. The extent of each is given by the corresponding entry in lens rather than by a terminating \0. Note that these strings must contain ONLY the pattern to be matched, with no delimiters or flags; per-expression flags should be specified with the flags argument. For example, the expression /abc?def/i should be compiled by providing abc?def as the first string in the expressions array, and HS_FLAG_CASELESS as the first value in the flags array. The content of each string is interpreted purely literally, with no regular expression syntax: abc? denotes the four characters a, b, c and ?, and the ? is not the zero-or-one quantifier.

  • flags – Array of flags which modify the behaviour of each expression. Multiple flags may be used by ORing them together. Specifying the NULL pointer in place of an array will set the flags value for all patterns to zero. Compared to hs_compile_multi(), fewer valid values are provided:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated for the expression per stream.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

  • ids – An array of integers specifying the ID number to be associated with the corresponding pattern in the expressions array. Specifying the NULL pointer in place of an array will set the ID value for all patterns to zero.

  • lens – Array of lengths, in bytes, of each pure literal expression. Because every byte of an expression is treated as a literal character, a \0 byte may appear within an expression and is not treated as a string terminator. The end of each expression is therefore determined by counting the corresponding length, not by searching for \0.

  • elements – The number of elements in the input arrays.

  • mode – Compiler mode flags that affect the database as a whole. One of HS_MODE_STREAM or HS_MODE_BLOCK or HS_MODE_VECTORED must be supplied, to select between the generation of a streaming, block or vectored database. In addition, other flags (beginning with HS_MODE_) may be supplied to enable specific features. See Compile mode flags for more details.

  • platform – If not NULL, the platform structure is used to determine the target platform for the database. If NULL, a database suitable for running on the current host platform is produced.

  • db – On success, a pointer to the generated database will be returned in this parameter, or NULL on failure. The caller is responsible for deallocating the buffer using the hs_free_database() function.

  • error – If the compile fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful compilation; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_free_compile_error(hs_compile_error_t *error)

Free an error structure generated by hs_compile(), hs_compile_multi(), hs_compile_ext_multi(), hs_compile_lit(), hs_compile_lit_multi(), hs_expression_info() or hs_expression_ext_info().

Parameters
  • error – The hs_compile_error_t to be freed. NULL may also be safely provided.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_expression_info(const char *expression, unsigned int flags, hs_expr_info_t **info, hs_compile_error_t **error)

Utility function providing information about a regular expression. The information provided in hs_expr_info_t includes the minimum and maximum width of a pattern match.

Note: successful analysis of an expression with this function does not imply that compilation of the same expression (via hs_compile(), hs_compile_multi() or hs_compile_ext_multi()) would succeed. This function may return HS_SUCCESS for regular expressions that Hyperscan cannot compile.

Note: some per-pattern flags (such as HS_FLAG_ALLOWEMPTY, HS_FLAG_SOM_LEFTMOST) are accepted by this call, but as they do not affect the properties returned in the hs_expr_info_t structure, they will not affect the outcome of this function.

Parameters
  • expression – The NULL-terminated expression to parse. Note that this string must represent ONLY the pattern to be matched, with no delimiters or flags; any global flags should be specified with the flags argument. For example, the expression /abc?def/i should be compiled by providing abc?def as the expression, and HS_FLAG_CASELESS as the flags.

  • flags – Flags which modify the behaviour of the expression. Multiple flags may be used by ORing them together. Valid values are:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_DOTALL - Matching a . will not exclude newlines.

    • HS_FLAG_MULTILINE - ^ and $ anchors match any newlines in data.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated by the expression per stream.

    • HS_FLAG_ALLOWEMPTY - Allow expressions which can match against an empty string, such as .*.

    • HS_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters.

    • HS_FLAG_UCP - Use Unicode properties for character classes.

    • HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

    • HS_FLAG_QUIET - This flag will be ignored.

  • info – On success, a pointer to the pattern information will be returned in this parameter, or NULL on failure. This structure is allocated using the allocator supplied in hs_set_allocator() (or malloc() if no allocator was set) and should be freed by the caller.

  • error – If the call fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful analysis; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_expression_ext_info(const char *expression, unsigned int flags, const hs_expr_ext_t *ext, hs_expr_info_t **info, hs_compile_error_t **error)

Utility function providing information about a regular expression, with extended parameter support. The information provided in hs_expr_info_t includes the minimum and maximum width of a pattern match.

Note: successful analysis of an expression with this function does not imply that compilation of the same expression (via hs_compile(), hs_compile_multi() or hs_compile_ext_multi()) would succeed. This function may return HS_SUCCESS for regular expressions that Hyperscan cannot compile.

Note: some per-pattern flags (such as HS_FLAG_ALLOWEMPTY, HS_FLAG_SOM_LEFTMOST) are accepted by this call, but as they do not affect the properties returned in the hs_expr_info_t structure, they will not affect the outcome of this function.

Parameters
  • expression – The NULL-terminated expression to parse. Note that this string must represent ONLY the pattern to be matched, with no delimiters or flags; any global flags should be specified with the flags argument. For example, the expression /abc?def/i should be compiled by providing abc?def as the expression, and HS_FLAG_CASELESS as the flags.

  • flags – Flags which modify the behaviour of the expression. Multiple flags may be used by ORing them together. Valid values are:

    • HS_FLAG_CASELESS - Matching will be performed case-insensitively.

    • HS_FLAG_DOTALL - Matching a . will not exclude newlines.

    • HS_FLAG_MULTILINE - ^ and $ anchors match any newlines in data.

    • HS_FLAG_SINGLEMATCH - Only one match will be generated by the expression per stream.

    • HS_FLAG_ALLOWEMPTY - Allow expressions which can match against an empty string, such as .*.

    • HS_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters.

    • HS_FLAG_UCP - Use Unicode properties for character classes.

    • HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.

    • HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset when a match is found.

    • HS_FLAG_QUIET - This flag will be ignored.

  • ext – A pointer to a filled hs_expr_ext_t structure that defines extended behaviour for this pattern. NULL may be specified if no extended parameters are needed.

  • info – On success, a pointer to the pattern information will be returned in this parameter, or NULL on failure. This structure is allocated using the allocator supplied in hs_set_allocator() (or malloc() if no allocator was set) and should be freed by the caller.

  • error – If the call fails, a pointer to a hs_compile_error_t will be returned, providing details of the error condition. The caller is responsible for deallocating the buffer using the hs_free_compile_error() function.

Returns

HS_SUCCESS is returned on successful analysis; HS_COMPILER_ERROR on failure, with details provided in the error parameter.

hs_error_t hs_populate_platform(hs_platform_info_t *platform)

Populates the platform information based on the current host.

Parameters
  • platform – On success, the pointed-to structure is populated based on the current host.

Returns

HS_SUCCESS on success, other values on failure.

struct hs_compile_error
#include <hs_compile.h>

A type containing error details that is returned by the compile calls (hs_compile(), hs_compile_multi() and hs_compile_ext_multi()) on failure. The caller may inspect the values returned in this type to determine the cause of failure.

Common errors generated during the compile process include:

  • Invalid parameter

    An invalid argument was specified in the compile call.

  • Unrecognised flag

    An unrecognised value was passed in the flags argument.

  • Pattern matches empty buffer

    By default, Hyperscan only supports patterns that will always consume at least one byte of input. Patterns that do not have this property (such as /(abc)?/) will produce this error unless the HS_FLAG_ALLOWEMPTY flag is supplied. Note that such patterns will produce a match for every byte when scanned.

  • Embedded anchors not supported

    Hyperscan only supports the use of anchor meta-characters (such as ^ and $) in patterns where they could only match at the start or end of a buffer. A pattern containing an embedded anchor, such as /abc^def/, can never match, as there is no way for abc to precede the start of the data stream.

  • Bounded repeat is too large

    The pattern contains a repeated construct with very large finite bounds.

  • Unsupported component type

    An unsupported PCRE construct was used in the pattern.

  • Unable to generate bytecode

    This error indicates that Hyperscan was unable to compile a pattern that is syntactically valid. The most common cause is a pattern that is very long and complex or contains a large repeated subpattern.

  • Unable to allocate memory

    The library was unable to allocate temporary storage used during compilation time.

  • Allocator returned misaligned memory

    The memory allocator (either malloc() or the allocator set with hs_set_allocator()) did not correctly return memory suitably aligned for the largest representable data type on this platform.

  • Internal error

    An unexpected error occurred: if this error is reported, please contact the Hyperscan team with a description of the situation.

Public Members

char *message

A human-readable error message describing the error.

int expression

The zero-based number of the expression that caused the error (if this can be determined). If the error is not specific to an expression, then this value will be less than zero.

struct hs_platform_info
#include <hs_compile.h>

A type containing information on the target platform which may optionally be provided to the compile calls (hs_compile(), hs_compile_multi(), hs_compile_ext_multi()).

A hs_platform_info structure may be populated for the current platform by using the hs_populate_platform() call.

Public Members

unsigned int tune

Information about the target platform which may be used to guide the optimisation process of the compile.

Use of this field does not limit the processors that the resulting database can run on, but may impact the performance of the resulting database.

unsigned long long cpu_features

Relevant CPU features available on the target platform.

This value may be produced by combining HS_CPU_FEATURES_* flags (such as HS_CPU_FEATURES_AVX2). Multiple CPU features may be ORed together to produce the value.

unsigned long long reserved1

Reserved for future use.

unsigned long long reserved2

Reserved for future use.

struct hs_expr_info
#include <hs_compile.h>

A type containing information related to an expression that is returned by hs_expression_info() or hs_expression_ext_info().

Public Members

unsigned int min_width

The minimum length in bytes of a match for the pattern.

Note: in some cases when using advanced features to suppress matches (such as extended parameters or the HS_FLAG_SINGLEMATCH flag) this may represent a conservative lower bound for the true minimum length of a match.

unsigned int max_width

The maximum length in bytes of a match for the pattern. If the pattern has an unbounded maximum length, this will be set to the maximum value of an unsigned int (UINT_MAX).

Note: in some cases when using advanced features to suppress matches (such as extended parameters or the HS_FLAG_SINGLEMATCH flag) this may represent a conservative upper bound for the true maximum length of a match.

char unordered_matches

Whether this expression can produce matches that are not returned in order, such as those produced by assertions. Zero if false, non-zero if true.

char matches_at_eod

Whether this expression can produce matches at end of data (EOD). In streaming mode, EOD matches are raised during hs_close_stream(), since it is only when hs_close_stream() is called that the EOD location is known. Zero if false, non-zero if true.

Note: trailing \b word boundary assertions may also result in EOD matches as end-of-data can act as a word boundary.

char matches_only_at_eod

Whether this expression can only produce matches at end of data (EOD). In streaming mode, all matches for this expression are raised during hs_close_stream(). Zero if false, non-zero if true.

struct hs_expr_ext
#include <hs_compile.h>

A structure containing additional parameters related to an expression, passed in at build time to hs_compile_ext_multi() or hs_expression_ext_info().

These parameters allow the set of matches produced by a pattern to be constrained at compile time, rather than relying on the application to process unwanted matches at runtime.

Public Members

unsigned long long flags

Flags governing which parts of this structure are to be used by the compiler. See hs_expr_ext_t flags.

unsigned long long min_offset

The minimum end offset in the data stream at which this expression should match successfully. To use this parameter, set the HS_EXT_FLAG_MIN_OFFSET flag in the hs_expr_ext::flags field.

unsigned long long max_offset

The maximum end offset in the data stream at which this expression should match successfully. To use this parameter, set the HS_EXT_FLAG_MAX_OFFSET flag in the hs_expr_ext::flags field.

unsigned long long min_length

The minimum match length (from start to end) required to successfully match this expression. To use this parameter, set the HS_EXT_FLAG_MIN_LENGTH flag in the hs_expr_ext::flags field.

unsigned edit_distance

Allow patterns to approximately match within this edit distance. To use this parameter, set the HS_EXT_FLAG_EDIT_DISTANCE flag in the hs_expr_ext::flags field.

unsigned hamming_distance

Allow patterns to approximately match within this Hamming distance. To use this parameter, set the HS_EXT_FLAG_HAMMING_DISTANCE flag in the hs_expr_ext::flags field.

File: hs_runtime.h

The Hyperscan runtime API definition.

Hyperscan is a high speed regular expression engine.

This header contains functions for using compiled Hyperscan databases for scanning data at runtime.

Defines

HS_OFFSET_PAST_HORIZON

Value passed as the ‘from’ argument of the match callback, indicating that the start of this match was too early to be tracked with the requested SOM_HORIZON precision.

Typedefs

typedef struct hs_stream hs_stream_t

The stream identifier returned by hs_open_stream().

typedef struct hs_scratch hs_scratch_t

A Hyperscan scratch space.

typedef int (*match_event_handler)(unsigned int id, unsigned long long from, unsigned long long to, unsigned int flags, void *context)

Definition of the match event callback function type.

A callback function matching the defined type must be provided by the application calling the hs_scan(), hs_scan_vector() or hs_scan_stream() functions (or other streaming calls which can produce matches).

This callback function will be invoked whenever a match is located in the target data during the execution of a scan. The details of the match are passed in as parameters to the callback function, and the callback function should return a value indicating whether or not matching should continue on the target data. If no callbacks are desired from a scan call, NULL may be provided in order to suppress match production.

This callback function should not attempt to call Hyperscan API functions on the same stream nor should it attempt to reuse the scratch space allocated for the API calls that caused it to be triggered. Making another call to the Hyperscan library with completely independent parameters should work (for example, scanning a different database in a new stream and with new scratch space), but reusing data structures like stream state and/or scratch space will produce undefined behavior.

Param id

The ID number of the expression that matched. If the expression was a single expression compiled with hs_compile(), this value will be zero.

Param from

  • If a start of match flag is enabled for the current pattern, this argument will be set to the start of match for the pattern, provided that this start of match value lies within the current ‘start of match horizon’ chosen by one of the SOM_HORIZON mode flags.

  • If the start of match value lies outside this horizon (possible only when the SOM_HORIZON value is not HS_MODE_SOM_HORIZON_LARGE), the from value will be set to HS_OFFSET_PAST_HORIZON.

  • This argument will be set to zero if the start of match flag is not enabled for the given pattern.

Param to

The offset after the last byte that matches the expression.

Param flags

This is provided for future use and is unused at present.

Param context

The pointer supplied by the user to the hs_scan(), hs_scan_vector() or hs_scan_stream() function.

Return

Non-zero if the matching should cease, else zero. If scanning is performed in streaming mode and a non-zero value is returned, any subsequent calls to hs_scan_stream() for that stream will immediately return with HS_SCAN_TERMINATED.

Functions

hs_error_t hs_open_stream(const hs_database_t *db, unsigned int flags, hs_stream_t **stream)

Open and initialise a stream.

Parameters
  • db – A compiled pattern database.

  • flags – Flags modifying the behaviour of the stream. This parameter is provided for future use and is unused at present.

  • stream – On success, a pointer to the generated hs_stream_t will be returned; NULL on failure.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_scan_stream(hs_stream_t *id, const char *data, unsigned int length, unsigned int flags, hs_scratch_t *scratch, match_event_handler onEvent, void *ctxt)

Write data to be scanned to the opened stream.

This is the function call in which the actual pattern matching takes place as data is written to the stream. Matches will be returned via the match_event_handler callback supplied.

Parameters
  • id – The stream ID (returned by hs_open_stream()) to which the data will be written.

  • data – Pointer to the data to be scanned.

  • length – The number of bytes to scan.

  • flags – Flags modifying the behaviour of the stream. This parameter is provided for future use and is unused at present.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch().

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • ctxt – The user defined pointer which will be passed to the callback function when a match occurs.

Returns

Returns HS_SUCCESS on success; HS_SCAN_TERMINATED if the match callback indicated that scanning should stop; other values on error.

hs_error_t hs_close_stream(hs_stream_t *id, hs_scratch_t *scratch, match_event_handler onEvent, void *ctxt)

Close a stream.

This function completes matching on the given stream and frees the memory associated with the stream state. After this call, the stream pointed to by id is invalid and can no longer be used. To reuse the stream state after completion, rather than closing it, hs_reset_stream() can be used.

This function must be called for any stream created with hs_open_stream(), even if scanning has been terminated by a non-zero return from the match callback function.

Note: This operation may result in matches being returned (via calls to the match event callback) for expressions anchored to the end of the data stream (for example, via the use of the $ meta-character). If these matches are not desired, NULL may be provided as the match_event_handler callback.

If NULL is provided as the match_event_handler callback, it is permissible to provide a NULL scratch.

Parameters
  • id – The stream ID returned by hs_open_stream().

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch(). This is allowed to be NULL only if the onEvent callback is also NULL.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • ctxt – The user defined pointer which will be passed to the callback function when a match occurs.

Returns

Returns HS_SUCCESS on success, other values on failure.

hs_error_t hs_reset_stream(hs_stream_t *id, unsigned int flags, hs_scratch_t *scratch, match_event_handler onEvent, void *context)

Reset a stream to an initial state.

Conceptually, this is equivalent to performing hs_close_stream() on the given stream, followed by hs_open_stream(). The new stream replaces the original stream in memory, avoiding the overhead of freeing the old stream and allocating a new one.

Note: This operation may result in matches being returned (via calls to the match event callback) for expressions anchored to the end of the original data stream (for example, via the use of the $ meta-character). If these matches are not desired, NULL may be provided as the match_event_handler callback.

Note: the stream will also be tied to the same database.

Parameters
  • id – The stream (as created by hs_open_stream()) to be replaced.

  • flags – Flags modifying the behaviour of the stream. This parameter is provided for future use and is unused at present.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch(). This is allowed to be NULL only if the onEvent callback is also NULL.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • context – The user defined pointer which will be passed to the callback function when a match occurs.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_copy_stream(hs_stream_t **to_id, const hs_stream_t *from_id)

Duplicate the given stream. The new stream will have the same state as the original including the current stream offset.

Parameters
  • to_id – On success, a pointer to the new, copied hs_stream_t will be returned; NULL on failure.

  • from_id – The stream (as created by hs_open_stream()) to be copied.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_reset_and_copy_stream(hs_stream_t *to_id, const hs_stream_t *from_id, hs_scratch_t *scratch, match_event_handler onEvent, void *context)

Duplicate the given ‘from’ stream state onto the ‘to’ stream. The ‘to’ stream will first be reset (reporting any EOD matches if a non-NULL onEvent callback handler is provided).

Note: the ‘to’ stream and the ‘from’ stream must be open against the same database.

Parameters
  • to_id – The stream (as created by hs_open_stream()) to be reset and then overwritten with a copy of the ‘from’ stream's state.

  • from_id – The stream (as created by hs_open_stream()) to be copied.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch(). This is allowed to be NULL only if the onEvent callback is also NULL.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • context – The user defined pointer which will be passed to the callback function when a match occurs.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_compress_stream(const hs_stream_t *stream, char *buf, size_t buf_space, size_t *used_space)

Creates a compressed representation of the provided stream in the buffer provided. This compressed representation can be converted back into a stream state by using hs_expand_stream() or hs_reset_and_expand_stream(). The size of the compressed representation will be placed into used_space.

If there is not sufficient space in the buffer to hold the compressed representation, HS_INSUFFICIENT_SPACE will be returned and used_space will be populated with the amount of space required.

Note: this function does not close the provided stream; you may continue to use the stream or free it with hs_close_stream().

Parameters
  • stream – The stream (as created by hs_open_stream()) to be compressed.

  • buf – Buffer to write the compressed representation into. Note: if the call is only being used to determine the amount of space required, NULL may be passed here with buf_space set to 0.

  • buf_space – The number of bytes in buf. If buf_space is too small, the call will fail with HS_INSUFFICIENT_SPACE.

  • used_space – Pointer to the location where the amount of buffer space used will be written. The used buffer space is always less than or equal to buf_space. If the call fails with HS_INSUFFICIENT_SPACE, the amount of buffer space required is written here instead.

Returns

HS_SUCCESS on success, HS_INSUFFICIENT_SPACE if the provided buffer is too small.

hs_error_t hs_expand_stream(const hs_database_t *db, hs_stream_t **stream, const char *buf, size_t buf_size)

Decompresses a compressed representation created by hs_compress_stream() into a new stream.

Note: buf must correspond to a complete compressed representation created by hs_compress_stream() of a stream that was opened against db. It is not always possible to detect misuse of this API and behaviour is undefined if these properties are not satisfied.

Parameters
  • db – The compiled pattern database that the compressed stream was opened against.

  • stream – On success, a pointer to the expanded hs_stream_t will be returned; NULL on failure.

  • buf – A compressed representation of a stream. These compressed forms are created by hs_compress_stream().

  • buf_size – The size in bytes of the compressed representation.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_reset_and_expand_stream(hs_stream_t *to_stream, const char *buf, size_t buf_size, hs_scratch_t *scratch, match_event_handler onEvent, void *context)

Decompresses a compressed representation created by hs_compress_stream() on top of the ‘to’ stream. The ‘to’ stream will first be reset (reporting any EOD matches if a non-NULL onEvent callback handler is provided).

Note: the ‘to’ stream must be opened against the same database as the compressed stream.

Note: buf must correspond to a complete compressed representation created by hs_compress_stream() of a stream that was opened against the same database as the ‘to’ stream. It is not always possible to detect misuse of this API and behaviour is undefined if these properties are not satisfied.

Parameters
  • to_stream – A pointer to a valid stream state. This stream is reset and then overwritten in place with the state expanded from buf.

  • buf – A compressed representation of a stream. These compressed forms are created by hs_compress_stream().

  • buf_size – The size in bytes of the compressed representation.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch(). This is allowed to be NULL only if the onEvent callback is also NULL.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • context – The user defined pointer which will be passed to the callback function when a match occurs.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_scan(const hs_database_t *db, const char *data, unsigned int length, unsigned int flags, hs_scratch_t *scratch, match_event_handler onEvent, void *context)

The block (non-streaming) regular expression scanner.

This is the function call in which the actual pattern matching takes place for block-mode pattern databases.

Parameters
  • db – A compiled pattern database.

  • data – Pointer to the data to be scanned.

  • length – The number of bytes to scan.

  • flags – Flags modifying the behaviour of this function. This parameter is provided for future use and is unused at present.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch() for this database.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • context – The user defined pointer which will be passed to the callback function.

Returns

Returns HS_SUCCESS on success; HS_SCAN_TERMINATED if the match callback indicated that scanning should stop; other values on error.

hs_error_t hs_scan_vector(const hs_database_t *db, const char *const *data, const unsigned int *length, unsigned int count, unsigned int flags, hs_scratch_t *scratch, match_event_handler onEvent, void *context)

The vectored regular expression scanner.

This is the function call in which the actual pattern matching takes place for vectored-mode pattern databases.

Parameters
  • db – A compiled pattern database.

  • data – An array of pointers to the data blocks to be scanned.

  • length – An array of lengths (in bytes) of each data block to scan.

  • count – Number of data blocks to scan. This should correspond to the size of the data and length arrays.

  • flags – Flags modifying the behaviour of this function. This parameter is provided for future use and is unused at present.

  • scratch – A per-thread scratch space allocated by hs_alloc_scratch() for this database.

  • onEvent – Pointer to a match event callback function. If a NULL pointer is given, no matches will be returned.

  • context – The user defined pointer which will be passed to the callback function.

Returns

Returns HS_SUCCESS on success; HS_SCAN_TERMINATED if the match callback indicated that scanning should stop; other values on error.

hs_error_t hs_alloc_scratch(const hs_database_t *db, hs_scratch_t **scratch)

Allocate a “scratch” space for use by Hyperscan.

Scratch space is required for runtime use, and one scratch space is needed per thread or concurrent caller. Any allocator callback set by hs_set_scratch_allocator() or hs_set_allocator() will be used by this function.

Parameters
  • db – The database, as produced by hs_compile().

  • scratch – On first allocation, a pointer to NULL should be provided so that a new scratch can be allocated. If a scratch block has been previously allocated, a pointer to it should be passed back in to check whether it is valid for this database. If a new scratch block is required, the original will be freed and the new one returned; otherwise the previous scratch block will be returned. On success, the scratch block will be suitable for use with the provided database in addition to any databases that the original scratch space was suitable for.

Returns

HS_SUCCESS on successful allocation; HS_NOMEM if the allocation fails. Other errors may be returned if invalid parameters are specified.

hs_error_t hs_clone_scratch(const hs_scratch_t *src, hs_scratch_t **dest)

Allocate a scratch space that is a clone of an existing scratch space.

This is useful when multiple concurrent threads will be using the same set of compiled databases, and another scratch space is required. Any allocator callback set by hs_set_scratch_allocator() or hs_set_allocator() will be used by this function.

Parameters
  • src – The existing hs_scratch_t to be cloned.

  • dest – A pointer to the new scratch space will be returned here.

Returns

HS_SUCCESS on success; HS_NOMEM if the allocation fails. Other errors may be returned if invalid parameters are specified.

hs_error_t hs_scratch_size(const hs_scratch_t *scratch, size_t *scratch_size)

Provides the size of the given scratch space.

Parameters
  • scratch – A per-thread scratch space allocated by hs_alloc_scratch() or hs_clone_scratch().

  • scratch_size – On success, the size of the scratch space in bytes is placed in this parameter.

Returns

HS_SUCCESS on success, other values on failure.

hs_error_t hs_free_scratch(hs_scratch_t *scratch)

Free a scratch block previously allocated by hs_alloc_scratch() or hs_clone_scratch().

The free callback set by hs_set_scratch_allocator() or hs_set_allocator() will be used by this function.

Parameters
  • scratch – The scratch block to be freed. NULL may also be safely provided.

Returns

HS_SUCCESS on success, other values on failure.