Loaders
Uncompressed File Loaders
- class svs.VectorDataLoader
Handle representing an uncompressed vector data file.
- __init__(self: svs::python.VectorDataLoader, path: str, data_type: Optional[svs::python.DataType] = None, dims: Optional[int] = None) None
Construct a new svs.VectorDataLoader.
- Parameters:
path (str) –
The path to the file to load. This can either be:
The path to the directory where a previous vector dataset was saved (preferred).
The direct path to the vector data file itself. In this case, the loader tries to infer the file type automatically. Recognized extensions: “.[b/i/f]vecs”, “.bin”, and “.svs”.
data_type (svs.DataType) – The native type of the elements in the dataset.
dims (int) – The expected dimensionality of the dataset. While this argument is generally optional, providing it may yield runtime speedups.
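As a minimal sketch of the two accepted forms of path, the snippet below checks a file name against the extensions listed above and builds a loader. The helper names is_recognized_vector_file and make_loader are hypothetical and not part of svs; only the svs.VectorDataLoader call follows the signature documented here.

```python
from pathlib import Path

# Extensions listed above as recognized when `path` points directly at a file.
RECOGNIZED_EXTENSIONS = {".bvecs", ".ivecs", ".fvecs", ".bin", ".svs"}


def is_recognized_vector_file(path: str) -> bool:
    """Hypothetical helper: True if `path` ends in a documented extension."""
    return Path(path).suffix in RECOGNIZED_EXTENSIONS


def make_loader(path: str, dims=None):
    """Build a loader for `path` (a saved dataset directory or a data file)."""
    import svs  # imported lazily; requires the svs package to be installed

    # data_type keeps its documented default (None); dims is optional as well.
    return svs.VectorDataLoader(path, dims=dims)
```

Calling make_loader("data/base.fvecs", dims=128) would request a loader with a fixed dimensionality; the path and the value 128 are placeholders.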
- property data_type
Access the assigned data type.
- Type:
Read/Write (svs.DataType)
- property dims
Access the expected dimensionality.
- Type:
Read/Write (int)
- property filepath
Access the underlying file path.
- Type:
Read/Write (str)
LVQ Loader
The LVQ loader provides lazy compression of uncompressed data and reloading of previously saved LVQ data.
- class svs.LVQLoader
Generic LVQ Loader
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: svs::python.LVQLoader, datafile: svs::python.VectorDataLoader, primary: int, residual: int = 0, padding: int = 0, strategy: svs::python.LVQStrategy = <LVQStrategy.Auto: 0>) -> None
Construct a loader that will lazily compress the results of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.
- Parameters:
datafile (svs.VectorDataLoader) – The uncompressed dataset to compress in-memory.
primary (int) – The number of bits to use for compression in the primary dataset.
residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.
padding (int) – The alignment (in bytes) of the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.
strategy (svs.LVQStrategy) – The packing strategy to use for the compressed codes. See the associated documentation for that enum.
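The trade-off between padding and compression ratio can be illustrated with a little arithmetic. The sketch below is an assumption-laden simplification: it counts only the packed code bits and ignores any per-vector metadata or strategy-specific layout that svs may store, and the helper names are hypothetical.

```python
def code_bytes(dims: int, bits: int) -> int:
    """Bytes needed to pack `dims` codes of `bits` bits each (codes only)."""
    return (dims * bits + 7) // 8


def padded_size(raw_bytes: int, padding: int) -> int:
    """Round `raw_bytes` up to a multiple of `padding`; 0 means no alignment."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    if padding == 0:
        return raw_bytes
    return -(-raw_bytes // padding) * padding  # ceiling division, then scale


raw = code_bytes(dims=100, bits=4)    # 50 bytes of packed 4-bit codes
unaligned = padded_size(raw, 0)       # 50 bytes per vector
aligned = padded_size(raw, 32)        # 64 bytes per vector
```

In this example, aligning each vector to 32 bytes costs 14 extra bytes per vector, which is the lower compression ratio the parameter description refers to.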
__init__(self: svs::python.LVQLoader, directory: str, padding: int = 0, strategy: svs::python.LVQStrategy = <LVQStrategy.Auto: 0>) -> None
Reload a compressed dataset from a previously saved dataset. Requires an appropriate back-end to be compiled for all combinations of primary and residual bits.
- Parameters:
directory (str) – The directory where the dataset was previously saved.
primary (int) – The number of bits to use for compression in the primary dataset.
residual (int) – The number of bits to use for compression in the residual dataset. Default: 0.
dims (int) – The number of dimensions in the dataset. Providing it may improve performance if a matching specialization has been compiled. Default: Dynamic (any dimension).
padding (int) – The alignment (in bytes) of the beginning of each compressed vector. Values of 32 or 64 may offer the best performance at the cost of a lower compression ratio. A value of 0 implies no special alignment. Default: 0.
strategy (svs.LVQStrategy) – The packing strategy to use for the compressed codes. See the associated documentation for that enum.
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ8) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4x4) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ4x8) -> None
__init__(self: svs::python.LVQLoader, legacy: svs::python.LVQ8x8) -> None
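The first two overloads above might be used as in the following sketch. The function names, the file and directory arguments, and the chosen bit widths (4 primary, 8 residual) are illustrative assumptions; whether a given combination works depends on the back-ends compiled into your svs build.

```python
def lazy_lvq_loader(data_path: str):
    """Wrap an uncompressed dataset in a loader that compresses it lazily."""
    import svs  # imported lazily; requires the svs package to be installed

    source = svs.VectorDataLoader(data_path)
    # 4 primary bits plus 8 residual bits, each vector aligned to 32 bytes.
    return svs.LVQLoader(
        source,
        primary=4,
        residual=8,
        padding=32,
        strategy=svs.LVQStrategy.Auto,  # let SVS pick the packing strategy
    )


def reload_lvq_loader(saved_dir: str):
    """Create a loader for an LVQ dataset previously saved in `saved_dir`."""
    import svs

    return svs.LVQLoader(saved_dir, padding=32, strategy=svs.LVQStrategy.Auto)
```

Both functions only construct loader handles; the actual compression or reloading happens when the loader is consumed elsewhere.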
- property dims
The number of dimensions.
- property primary_bits
The number of bits used for the primary encoding.
- reload_from(self: svs::python.LVQLoader, directory: str) svs::python.LVQLoader
Create a copy of the argument loader configured to reload a previously saved LVQ dataset from the given directory.
- property residual_bits
The number of bits used for the residual encoding.
- property strategy
The packing strategy to use.
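To give some intuition for the primary_bits and residual_bits properties, here is a simplified two-level scalar quantizer in plain Python. It is only a conceptual sketch under stated assumptions (per-vector minimum and maximum as bounds, uniform steps); it is not the svs implementation, whose encoding details are not described in this section.

```python
def quantize(values, bits, lo, hi):
    """Uniformly quantize values in [lo, hi] to integer codes of `bits` bits."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / step))) for v in values]
    return codes, step


def dequantize(codes, step, lo):
    """Map integer codes back to approximate real values."""
    return [lo + c * step for c in codes]


def two_level_reconstruct(vector, primary_bits, residual_bits=0):
    """Encode with a primary level, optionally refine with a residual level."""
    lo, hi = min(vector), max(vector)
    p_codes, p_step = quantize(vector, primary_bits, lo, hi)
    approx = dequantize(p_codes, p_step, lo)
    if residual_bits == 0:
        return approx
    # The primary error lies within half a primary step, so quantize it there.
    errors = [v - a for v, a in zip(vector, approx)]
    r_codes, r_step = quantize(errors, residual_bits, -p_step / 2, p_step / 2)
    correction = dequantize(r_codes, r_step, -p_step / 2)
    return [a + c for a, c in zip(approx, correction)]


def max_error(vector, reconstructed):
    """Largest absolute difference between a vector and its reconstruction."""
    return max(abs(v - r) for v, r in zip(vector, reconstructed))
```

With a residual level the reconstruction error shrinks, which is the role the residual dataset plays alongside the primary one.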
Strategy Selection
The strategy argument of the LVQ loader provides a way of overriding the default selection of the packing strategy used by an LVQ backend.
Note that overriding the default strategy requires the corresponding backend to be compiled in the svs shared library component.
- class svs.LVQStrategy
Select the packing mode for LVQ
Members:
Auto : Let SVS decide the best strategy.
Sequential : Use the Sequential packing strategy.
Turbo : Use the best Turbo packing strategy for this architecture.
LeanVec Loader
The LeanVec loader provides a way to use dimensionality reduction to improve performance on high dimensional datasets.
Internally, a LeanVec dataset consists of the dimensionality reduced primary dataset (over which the bulk of the index search is conducted) and a full dimensional secondary dataset used to rerank and refine candidates returned from the initial search.
svs allows selection of the storage format using the svs.LeanVecKind enum, enabling float16 and LVQ compression for either of the primary and secondary datasets.
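The two-stage idea described above (search in reduced dimensions, then rerank with full-dimensional vectors) can be sketched in plain Python. This is a brute-force illustration under simplifying assumptions, not the svs index search; all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(matrix, vector):
    """Multiply a (reduced_dims x dims) matrix by a vector."""
    return [dot(row, vector) for row in matrix]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_search(query, dataset, matrix, n_candidates, k):
    """Find candidates in the reduced space, then rerank in the full space."""
    # Primary dataset: reduced-dimension copies (precomputed in practice).
    primary = [project(matrix, v) for v in dataset]
    q_low = project(matrix, query)
    # Stage 1: cheap search over the reduced vectors.
    order = sorted(range(len(dataset)), key=lambda i: sq_dist(q_low, primary[i]))
    candidates = order[:n_candidates]
    # Stage 2: rerank the candidates with the full-dimensional vectors.
    reranked = sorted(candidates, key=lambda i: sq_dist(query, dataset[i]))
    return reranked[:k]


dataset = [[0, 0, 0], [1, 0, 5], [1, 0, 0], [5, 5, 5]]
matrix = [[1, 0, 0], [0, 1, 0]]  # keep the first two coordinates only
best = two_stage_search([1, 0, 0.4], dataset, matrix, n_candidates=2, k=1)
```

Here vectors 1 and 2 are indistinguishable in the reduced space, and the reranking step with the full vectors selects vector 2 as the nearest neighbor.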
- class svs.LeanVecLoader
Generic LeanVec Loader
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: svs::python.LeanVecLoader, datafile: svs::python.VectorDataLoader, leanvec_dims: int, primary_kind: svs::python.LeanVecKind = <LeanVecKind.lvq8: 2>, secondary_kind: svs::python.LeanVecKind = <LeanVecKind.lvq8: 2>, data_matrix: Optional[numpy.ndarray[numpy.float32]] = None, query_matrix: Optional[numpy.ndarray[numpy.float32]] = None, alignment: int = 32) -> None
Construct a loader that will lazily reduce the dimensionality of the data loader. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.
- Parameters:
datafile (svs.VectorDataLoader) – The uncompressed original dataset.
leanvec_dims (int) – The resulting reduced dimensionality.
primary_kind (LeanVecKind) – Type of dataset used for the primary dataset (Default: lvq8).
secondary_kind (LeanVecKind) – Type of dataset used for the secondary dataset (Default: lvq8).
data_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for data transformation [see note 1] (Default: None).
query_matrix (Optional[numpy.ndarray[numpy.float32]]) – Matrix for query transformation [see note 1] (Default: None).
alignment (int) – Alignment/padding used in LVQ data types (Default: 32).
Note 1: The arguments data_matrix and query_matrix are optional and have the following requirements for valid combinations:
Neither matrix provided: Transform dataset and queries using a default PCA-based transformation.
Only data_matrix provided: The provided matrix is used to transform both the queries and the original dataset.
Both arguments are provided: Use the respective matrices for transformation.
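The combinations in Note 1 can be restated as a small decision function. This is a hypothetical helper for illustration only; in particular, raising ValueError when only query_matrix is given is an assumption, since the note does not say how svs reacts to that case.

```python
DEFAULT_PCA = "default-pca"  # placeholder for the default PCA-based transform


def resolve_transforms(data_matrix=None, query_matrix=None):
    """Return the (data transform, query transform) pair implied by Note 1."""
    if data_matrix is None and query_matrix is None:
        # Neither matrix provided: default PCA-based transformation for both.
        return DEFAULT_PCA, DEFAULT_PCA
    if data_matrix is not None and query_matrix is None:
        # Only data_matrix provided: reuse it for the queries as well.
        return data_matrix, data_matrix
    if data_matrix is not None and query_matrix is not None:
        # Both provided: each side uses its own matrix.
        return data_matrix, query_matrix
    # Not among the listed valid combinations (assumed to be an error).
    raise ValueError("query_matrix was provided without data_matrix")
```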
__init__(self: svs::python.LeanVecLoader, directory: str, alignment: int = 32) -> None
Reload a LeanVec dataset from a previously saved dataset. Requires an appropriate back-end to be compiled for all combinations of primary and secondary types.
- Parameters:
directory (str) – The directory where the dataset was previously saved.
leanvec_dims (int) – The resulting reduced dimensionality. Default: Dynamic (any dimension).
dims (int) – The number of dimensions in the original dataset. Default: Dynamic (any dimension).
primary (LeanVecKind) – Type of dataset used for the primary dataset. Default: svs.LeanVecKind.lvq8.
secondary (LeanVecKind) – Type of dataset used for the secondary dataset. Default: svs.LeanVecKind.lvq8.
alignment (int) – Alignment/padding used in LVQ data types. Default: 32.
- property alignment
The alignment to use for LVQ encoded data.
- property dims
The full dimensionality.
- property leanvec_dims
The reduced dimensionality.
- property primary_kind
The encoding of the reduced dimensional dataset.
- reload_from(self: svs::python.LeanVecLoader, directory: str) svs::python.LeanVecLoader
Create a copy of the argument loader configured to reload a previously saved LeanVec dataset from the given directory.
- property secondary_kind
The encoding of the full-dimensional dataset.
- class svs.LeanVecKind
LeanVec primary and secondary types
Members:
float32 : Uncompressed float32
float16 : Uncompressed float16
lvq8 : Compressed with LVQ, 8 bits
lvq4 : Compressed with LVQ, 4 bits
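Putting the LeanVec pieces together, a loader might be constructed as in the sketch below. The function names and arguments are placeholders, and the kinds shown are simply the documented defaults written out explicitly; other combinations require the corresponding back-end to be compiled.

```python
def lazy_leanvec_loader(data_path: str, leanvec_dims: int):
    """Wrap an uncompressed dataset in a loader that reduces it lazily."""
    import svs  # imported lazily; requires the svs package to be installed

    source = svs.VectorDataLoader(data_path)
    return svs.LeanVecLoader(
        source,
        leanvec_dims,
        primary_kind=svs.LeanVecKind.lvq8,    # reduced-dimension dataset
        secondary_kind=svs.LeanVecKind.lvq8,  # full-dimension dataset
        alignment=32,
    )


def reload_leanvec_loader(saved_dir: str):
    """Create a loader for a LeanVec dataset previously saved in `saved_dir`."""
    import svs

    return svs.LeanVecLoader(saved_dir, alignment=32)
```

Leaving data_matrix and query_matrix unset, as here, selects the default PCA-based transformation described in Note 1.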