Loaders

Uncompressed File Loaders

class svs.VectorDataLoader

Handle representing an uncompressed vector data file.

__init__(self: svs::python.VectorDataLoader, path: str, data_type: Optional[svs::python.DataType] = None, dims: Optional[int] = None) → None

Construct a new svs.VectorDataLoader.

Parameters:

path (str) –
The path to the file to load. This can either be:
- The path to the directory where a previous vector dataset was saved (preferred).
- The direct path to the vector data file itself. In this case, SVS attempts to infer the file type automatically from the extension. Recognized extensions: “.[b/i/f]vecs”, “.bin”, and “.svs”.
data_type (svs.DataType) – The native type of the elements in the dataset.
dims (int) – The expected dimensionality of the dataset. While this argument is generally optional, providing it may yield runtime speedups.
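
As an illustration of the direct-file form of path, the sketch below writes a tiny ".fvecs" file with the standard library and then, only if svs is importable, wraps it in a loader. The fvecs layout used here (a little-endian int32 dimension followed by that many float32 values per vector) is the commonly used convention and is an assumption about what SVS expects; the helper write_fvecs, the file name, and the enum member svs.DataType.float32 are likewise assumptions made for illustration.

```python
import os
import struct
import tempfile


def write_fvecs(path, vectors):
    """Hypothetical helper: write vectors as int32 dimension + float32 components."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


vectors = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
path = os.path.join(tempfile.mkdtemp(), "example.fvecs")
write_fvecs(path, vectors)

try:
    import svs  # optional: only needed to build the loader itself
except ImportError:
    svs = None

if svs is not None:
    # Assumption: svs.DataType exposes a float32 member.
    # Passing dims is optional, but the documentation notes it may speed things up.
    loader = svs.VectorDataLoader(path, svs.DataType.float32, dims=4)
```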

property data_type

Access the assigned data type.

Type: Read/Write (svs.DataType)

property dims

Access the expected dimensionality.

Type: Read/Write (int)

property filepath

Access the underlying file path.

Type: Read/Write (str)

LVQ Loader

The LVQ loader provides lazy compression of uncompressed data and reloading of previously saved LVQ data.

class svs.LVQLoader

Generic LVQ Loader

__init__(*args, **kwargs)

Overloaded function.

__init__(self: svs::python.LVQLoader, datafile: svs::python.VectorDataLoader, primary: int, residual: int = 0, padding: int = 0, strategy: svs::python.LVQStrategy = <LVQStrategy.Auto: 0>) -> None

Construct a loader that will lazily compress the results of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.

Parameters:

datafile (svs.VectorDataLoader) – The uncompressed dataset to compress in-memory.
primary (int) – The number of bits to use for compression in the primary dataset.
residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.
padding (int) – The alignment (in bytes) for the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.
strategy (svs.LVQStrategy) – The packing strategy to use for the compressed codes. See the associated documentation for that enum.
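
To make the padding trade-off concrete, the following back-of-the-envelope sketch estimates the per-vector stride for a given number of primary bits and padding. It is a simplification and not the SVS storage layout: it ignores any per-vector metadata that an LVQ backend stores alongside the codes, and the function name lvq_stride_bytes is hypothetical.

```python
def lvq_stride_bytes(dims, bits, padding=0):
    """Approximate bytes between the starts of consecutive compressed vectors.

    Assumption: codes are packed at `bits` bits per dimension and per-vector
    metadata is ignored. A padding of 0 means no special alignment.
    """
    raw = (dims * bits + 7) // 8  # packed code bytes, rounded up
    if padding == 0:
        return raw
    return -(-raw // padding) * padding  # round up to a multiple of padding


# 100 dimensions at 8 bits: padding to 32 bytes grows each vector's footprint,
# which lowers the compression ratio relative to float32 storage (4 bytes/dim).
unpadded = lvq_stride_bytes(100, 8, padding=0)
padded = lvq_stride_bytes(100, 8, padding=32)
ratio_unpadded = (4 * 100) / unpadded
ratio_padded = (4 * 100) / padded
```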

__init__(self: svs::python.LVQLoader, directory: str, padding: int = 0, strategy: svs::python.LVQStrategy = <LVQStrategy.Auto: 0>) -> None

Reload a previously saved compressed dataset. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.

Parameters:

directory (str) – The directory where the dataset was previously saved.
primary (int) – The number of bits to use for compression in the primary dataset.
residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.
dims (int) – The number of dimensions in the dataset. Providing it may improve performance if a matching specialization has been compiled. Default: Dynamic (any dimension).
padding (int) – The alignment (in bytes) for the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.
strategy (svs.LVQStrategy) – The packing strategy to use for the compressed codes. See the associated documentation for that enum.

__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ8) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4x4) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4x8) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ8x8) -> None

property dims: The number of dimensions.

property primary_bits: The number of bits used for the primary encoding.

reload_from(self: svs::python.LVQLoader, directory: str) → svs::python.LVQLoader: Create a copy of the argument loader configured to reload a previously saved LVQ dataset from the given directory.

property residual_bits: The number of bits used for the residual encoding.

property strategy: The packing strategy to use.

Strategy Selection

The strategy argument of the LVQ loader provides a way of overriding the default selection of the packing strategy used by an LVQ backend.

Note that overriding the default strategy requires the corresponding backend to be compiled in the svs shared library component.

class svs.LVQStrategy

Select the packing mode for LVQ

Members:

Auto : Let SVS decide the best strategy.

Sequential : Use the Sequential packing strategy.

Turbo : Use the best Turbo packing strategy for this architecture.

LeanVecLoader

The LeanVec loader provides a way to use dimensionality reduction to improve performance on high-dimensional datasets.

Internally, a LeanVec dataset consists of the dimensionality-reduced primary dataset (over which the bulk of the index search is conducted) and a full-dimensional secondary dataset used to rerank and refine candidates returned from the initial search.

svs allows selection of the storage format using the svs.LeanVecKind enum, enabling float16 and LVQ compression for either the primary or the secondary dataset.
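
The primary/secondary split described above can be pictured with a toy, pure-Python sketch: candidates are found by brute force in the reduced space, then reranked with the full-dimensional vectors. This is not how SVS is implemented (it searches an index rather than scanning); all names and the projection matrix are made up for illustration.

```python
def project(matrix, vec):
    """Apply a (reduced_dims x dims) matrix to a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_phase_search(data, query, matrix, n_candidates, k):
    # Phase 1: search over the dimensionality-reduced ("primary") vectors.
    reduced = [project(matrix, v) for v in data]
    q_reduced = project(matrix, query)
    order = sorted(range(len(data)), key=lambda i: sq_dist(reduced[i], q_reduced))
    candidates = order[:n_candidates]
    # Phase 2: rerank the candidates with the full ("secondary") vectors.
    reranked = sorted(candidates, key=lambda i: sq_dist(data[i], query))
    return reranked[:k]


data = [[0.0, 0.0, 5.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 3.0, 0.0]]
matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy projection: keep two coordinates
query = [0.0, 0.0, 0.0]
result = two_phase_search(data, query, matrix, n_candidates=3, k=2)
```

In this toy data, vector 0 looks closest in the reduced space because the projection drops its large third coordinate; the reranking step against the full vectors moves it out of the top results.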

class svs.LeanVecLoader

Generic LeanVec Loader

__init__(*args, **kwargs)

Overloaded function.

__init__(self: svs::python.LeanVecLoader, datafile: svs::python.VectorDataLoader, leanvec_dims: int, primary_kind: svs::python.LeanVecKind = <LeanVecKind.lvq8: 2>, secondary_kind: svs::python.LeanVecKind = <LeanVecKind.lvq8: 2>, data_matrix: Optional[numpy.ndarray[numpy.float32]] = None, query_matrix: Optional[numpy.ndarray[numpy.float32]] = None, alignment: int = 32) -> None

Construct a loader that will lazily reduce the dimensionality of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.

Parameters:

datafile (svs.VectorDataLoader) – The uncompressed original dataset.
leanvec_dims (int) – The reduced dimensionality of the resulting dataset.
primary_kind (LeanVecKind) – Type of dataset used for the primary dataset (Default: svs.LeanVecKind.lvq8).
secondary_kind (LeanVecKind) – Type of dataset used for the secondary dataset (Default: svs.LeanVecKind.lvq8).
data_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for data transformation [see note 1] (Default: None).
query_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for query transformation [see note 1] (Default: None).
alignment (int) – Alignment/padding used in LVQ data types (Default: 32).

Note 1: The arguments data_matrix and query_matrix are optional and have the following requirements for valid combinations:

Neither matrix provided: Transform dataset and queries using a default PCA-based transformation.

Only data_matrix provided: The provided matrix is used to transform both the queries and the original dataset.

Both arguments are provided: Use the respective matrices for transformation.
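
The three valid combinations in Note 1 can be summarized as a small decision function. This is only a sketch of the documented rules, not library code: the function name is hypothetical, and treating "only query_matrix provided" as an error is an inference from its absence in the list above.

```python
def resolve_leanvec_matrices(data_matrix=None, query_matrix=None):
    """Return (data_transform, query_transform), or None for the default PCA path."""
    if data_matrix is None and query_matrix is None:
        # Neither given: a default PCA-based transformation is used.
        return None
    if data_matrix is None:
        # Assumption: this combination is not listed as valid, so reject it.
        raise ValueError("query_matrix requires data_matrix to be provided as well")
    if query_matrix is None:
        # Only data_matrix given: it transforms both the data and the queries.
        return data_matrix, data_matrix
    # Both given: each matrix is used for its respective transformation.
    return data_matrix, query_matrix
```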

__init__(self: svs::python.LeanVecLoader, directory: str, alignment: int = 32) -> None

Reload a LeanVec dataset from a previously saved dataset. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.

Parameters:

directory (str) – The directory where the dataset was previously saved.
leanvec_dims (int) – The reduced dimensionality of the resulting dataset. Default: Dynamic (any dimension).
dims (int) – The number of dimensions in the original dataset. Default: Dynamic (any dimension).
primary (LeanVecKind) – Type of dataset used for the primary dataset. Default: svs.LeanVecKind.lvq8.
secondary (LeanVecKind) – Type of dataset used for the secondary dataset. Default: svs.LeanVecKind.lvq8.
alignment (int) – Alignment/padding used in LVQ data types. Default: 32.

property alignment: The alignment to use for LVQ encoded data.

property dims: The full-dimensionality.

property leanvec_dims: The reduced dimensionality.

property primary_kind: The encoding of the reduced dimensional dataset.

reload_from(self: svs::python.LeanVecLoader, directory: str) → svs::python.LeanVecLoader: Create a copy of the argument loader configured to reload a previously saved LeanVec dataset from the given directory.

property secondary_kind: The encoding of the full-dimensional dataset.

class svs.LeanVecKind

LeanVec primary and secondary types

Members:

float32 : Uncompressed float32

float16 : Uncompressed float16

lvq8 : Compressed with LVQ using 8 bits

lvq4 : Compressed with LVQ using 4 bits
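
A minimal sketch of selecting storage kinds when building a LeanVec loader, using the keyword names from the constructor signature above. As with any use of this loader, the chosen primary/secondary combination needs a compiled back-end. The function name, the lvq4/lvq8 pairing, and the svs.DataType.float32 member are assumptions for illustration; the svs module is passed in so the snippet can be read and run without the library installed.

```python
def make_leanvec_loader(svs_module, data_path, dims, leanvec_dims):
    """Hypothetical helper: LeanVec loader with an LVQ4 primary and LVQ8 secondary."""
    # Assumption: svs.DataType exposes a float32 member.
    data = svs_module.VectorDataLoader(
        data_path, svs_module.DataType.float32, dims=dims
    )
    # No matrices are passed, so the default PCA-based transformation applies.
    return svs_module.LeanVecLoader(
        data,
        leanvec_dims,
        primary_kind=svs_module.LeanVecKind.lvq4,
        secondary_kind=svs_module.LeanVecKind.lvq8,
        alignment=32,
    )
```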