Loaders

Uncompressed File Loaders

class svs.VectorDataLoader

Handle representing an uncompressed vector data file.

__init__(self: svs::python.VectorDataLoader, path: str, data_type: Optional[svs::python.DataType] = None, dims: Optional[int] = None) -> None

Construct a new svs.VectorDataLoader.

Parameters:
  • path (str) –

    The path to the file to load. This can either be:

    • The path to the directory where a previous vector dataset was saved (preferred).

    • The direct path to the vector data file itself. In this case, the loader tries to infer the file type automatically. Recognized extensions: “.[b/i/f]vecs”, “.bin”, and “.svs”.

  • data_type (svs.DataType) – The native type of the elements in the dataset.

  • dims (int) – The expected dimensionality of the dataset. While this argument is generally optional, providing it may yield runtime speedups.
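The following is a minimal sketch of the direct-file case, not an official example. It writes a tiny ".fvecs" file using the commonly used layout (each vector stored as a little-endian int32 dimension followed by that many float32 values) and then passes the path to svs.VectorDataLoader with the keyword names from the signature above. The helper names write_fvecs, read_fvecs_dims, and make_loader are hypothetical, and svs.DataType.float32 is assumed to be the enum member for 32-bit floats. The svs import is guarded so the file-writing part runs even where svs is not installed.

```python
import os
import struct
import tempfile

try:
    import svs  # optional: only needed for the final loader call
except ImportError:
    svs = None


def write_fvecs(path, vectors):
    """Write vectors in the .fvecs layout: for each vector, a little-endian
    int32 dimension followed by that many little-endian float32 values."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_fvecs_dims(path):
    """Read the dimension header of the first vector in an .fvecs file."""
    with open(path, "rb") as f:
        (dims,) = struct.unpack("<i", f.read(4))
    return dims


def make_loader(path, dims):
    """Build a loader handle if svs is installed; otherwise return None.

    Assumption: svs.DataType.float32 names the 32-bit float element type.
    """
    if svs is None:
        return None
    return svs.VectorDataLoader(path, data_type=svs.DataType.float32, dims=dims)


# Write two 4-dimensional vectors to a temporary file and build a loader for it.
tmpdir = tempfile.mkdtemp()
example_path = os.path.join(tmpdir, "example.fvecs")
write_fvecs(example_path, [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
example_dims = read_fvecs_dims(example_path)
loader = make_loader(example_path, example_dims)
```

Passing dims explicitly here mirrors the note above that providing the dimensionality may yield runtime speedups; omitting both optional arguments is also permitted by the signature.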

property data_type

Access the assigned data type.

Type:

Read/Write (svs.DataType)

property dims

Access the expected dimensionality.

Type:

Read/Write (int)

property filepath

Access the underlying file path.

Type:

Read/Write (str)

LVQ Loader

The LVQ loader provides lazy compression of uncompressed data and reloading of previously saved LVQ data.
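To give a feel for what compressing a vector involves, the sketch below applies uniform scalar quantization to a single vector using that vector's own minimum and maximum, which is the general idea behind per-vector quantization schemes such as LVQ. It is a simplified illustration under stated assumptions, not the library's implementation: it omits dataset mean-centering, bit packing, and any residual level, and the function names quantize and dequantize are hypothetical.

```python
def quantize(vector, bits=8):
    """Uniformly quantize one vector over its own [min, max] range.

    Returns integer codes in [0, 2**bits - 1] plus the offset and step
    needed to reconstruct approximate values.
    """
    lo, hi = min(vector), max(vector)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where the range is zero.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in vector]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate values from integer codes."""
    return [lo + c * step for c in codes]


original = [-1.0, -0.25, 0.5, 1.0]
codes, lo, step = quantize(original, bits=8)
restored = dequantize(codes, lo, step)
# Each reconstructed value differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

With 8 bits, each element is stored as one byte instead of a 4-byte float, at the cost of a reconstruction error bounded by half the quantization step.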

Strategy Selection

The strategy argument of the LVQ loader overrides the default packing strategy used by an LVQ backend.

Note that overriding the default strategy requires the corresponding backend to be compiled into the svs shared library component.

LeanVecLoader

The LeanVec loader uses dimensionality reduction to improve performance on high-dimensional datasets.

Internally, a LeanVec dataset consists of a dimensionality-reduced primary dataset (over which the bulk of the index search is conducted) and a full-dimensional secondary dataset used to rerank and refine the candidates returned by the initial search.
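The two-dataset arrangement described above can be sketched in plain Python as a search over reduced vectors followed by a rerank over full vectors. This is only an illustration under simplifying assumptions: the reduction simply keeps the leading components rather than applying a learned projection, the first stage is a brute-force scan rather than an index search, and the names reduce_dims and two_stage_search are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reduce_dims(vector, reduced_dims):
    """Stand-in for a learned projection: keep only the leading components."""
    return vector[:reduced_dims]


def two_stage_search(query, dataset, reduced_dims, num_candidates, k):
    """Return the indices of the k nearest vectors using a reduced first pass."""
    # Primary dataset: reduced vectors (a real system would build this once).
    primary = [reduce_dims(v, reduced_dims) for v in dataset]
    q_reduced = reduce_dims(query, reduced_dims)

    # Stage 1: pick candidates using the cheaper reduced-dimension distances.
    candidates = sorted(
        range(len(dataset)),
        key=lambda i: squared_l2(q_reduced, primary[i]),
    )[:num_candidates]

    # Stage 2: rerank the candidates using the full-dimensional vectors.
    reranked = sorted(candidates, key=lambda i: squared_l2(query, dataset[i]))
    return reranked[:k]


dataset = [
    [0.0, 0.0, 5.0],  # looks closest in 2 dimensions, but is far in 3
    [0.1, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [3.0, 3.0, 3.0],
]
query = [0.0, 0.0, 0.0]
result = two_stage_search(query, dataset, reduced_dims=2, num_candidates=3, k=2)
```

In this toy data the first vector ranks best in the reduced space but is pushed out by the rerank, which is the role the secondary dataset plays.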

svs allows the storage format to be selected with the svs.LeanVecKind enum, which enables float16 or LVQ compression for either the primary or the secondary dataset.