In-Memory Representation of Data

Vector data is at the core of the SVS similarity search library. Several specific classes are provided to implement in-memory vector datasets with different semantics.

Detailed documentation for these classes is given below.

template<typename T, size_t Extent = Dynamic, typename Alloc = lib::Allocator<T>>
class SimpleData

The following properties hold:

  • Vectors are stored contiguously in memory.

  • All vectors have the same length.

Public Types

using allocator_type = Alloc

The allocator type used for this instance.

using element_type = T

The data type used to encode each dimension of the stored vectors.

using value_type = std::span<element_type, Extent>

The type used to return a mutable handle to stored vectors.

using const_value_type = std::span<const element_type, Extent>

The type used to return a constant handle to stored vectors.

Public Functions

inline const allocator_type &get_allocator() const

Return the underlying allocator.

inline explicit SimpleData(AnonymousArray<2> array)

Construct a view over the array using a checked cast.

inline size_t size() const

Return the number of entries in the dataset.

inline size_t capacity() const

Return the maximum number of entries this dataset can hold.

inline size_t dimensions() const

Return the number of dimensions for each entry in the dataset.

inline const_value_type get_datum(size_t i) const

Return a constant handle to the vector stored at position i.

Preconditions:

  • 0 <= i < size()

inline value_type get_datum(size_t i)

Return a mutable handle to the vector stored at position i.

NOTE: Mutating the returned value directly may have unintended consequences. Perform with care.

Preconditions:

  • 0 <= i < size()

inline void prefetch(size_t i) const

Prefetch the vector at position i into the L1 cache.

template<typename U, size_t N>
inline void set_datum(size_t i, std::span<U, N> datum)

Overwrite the contents of the vector at position i.

If U is the same type as element_type, then this operation is simply a memory copy. Otherwise, lib::narrow is used to convert each element of datum, which raises an error if the conversion is not exact.

Preconditions:

  • datum.size() == dimensions()

  • 0 <= i < size()

Parameters:
  • i – The index at which to store the new data.

  • datum – The new vector in R^n to store.
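The source does not describe how lib::narrow is implemented. As an illustration of what an "exact" conversion can mean, here is a minimal sketch of a round-trip check in the spirit of gsl::narrow. The names checked_narrow and store_row are hypothetical and are not part of SVS.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: convert `x` to `To`, throwing if the value changes.
// Assumes `x` is within the representable range of `To`.
template <typename To, typename From>
To checked_narrow(From x) {
    To y = static_cast<To>(x);
    // Converting back must reproduce the original value exactly.
    if (static_cast<From>(y) != x) {
        throw std::runtime_error("inexact narrowing conversion");
    }
    // Guard against sign flips between signed and unsigned types.
    if constexpr (std::is_signed_v<To> != std::is_signed_v<From>) {
        if ((y < To{}) != (x < From{})) {
            throw std::runtime_error("sign changed during conversion");
        }
    }
    return y;
}

// Overwrite `dst` with `src`: a plain copy for matching element types,
// a checked element-wise conversion otherwise.
template <typename T, typename U>
void store_row(std::vector<T>& dst, const std::vector<U>& src) {
    if (dst.size() != src.size()) {
        throw std::invalid_argument("dimension mismatch");
    }
    for (std::size_t j = 0; j < src.size(); ++j) {
        if constexpr (std::is_same_v<T, U>) {
            dst[j] = src[j];
        } else {
            dst[j] = checked_narrow<T>(src[j]);
        }
    }
}
```

In this sketch, storing the double values 0.5 and 2.0 into a float row succeeds because both are exactly representable as float, while 0.1 is rejected because it is not.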

inline const T *data() const

Return the base pointer to the data.

inline T *data()

Return the base pointer to the data.

inline ConstSimpleDataView<T, Extent> cview() const

Return a ConstSimpleDataView over this data.

inline ConstSimpleDataView<T, Extent> view() const

Return a ConstSimpleDataView over this data.

inline SimpleDataView<T, Extent> view()

Return a SimpleDataView over this data.

inline void resize(size_t new_size)

Resize the dataset to the new size.

Causes a reallocation if new_size > capacity(). Growing and shrinking are performed at the end of the valid range.

NOTE: Resizing that triggers a reallocation will invalidate all previously obtained pointers!
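std::vector follows the same capacity rule, so it can serve as an analogy for the invalidation hazard. The sketch below is not SVS code, and the helper name resize_moves_storage is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if resizing `v` to `new_size` moved its backing storage,
// which would leave any previously obtained pointer dangling.
inline bool resize_moves_storage(std::vector<float>& v, std::size_t new_size) {
    const float* before = v.data();
    v.resize(new_size);
    return v.data() != before;
}
```

Resizing within the reserved capacity leaves the base pointer unchanged. Resizing past it forces a new allocation, so the safe pattern is to re-acquire pointers and handles after any resize that may grow the container.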

inline void shrink_to_fit()

Requests the removal of unused capacity.

It is a non-binding request to reduce capacity() to size(). If relocation occurs, all iterators and previously obtained datums are invalidated.

Public Static Functions

static inline SimpleData load(const lib::LoadTable &table, const allocator_type &allocator = {})

Reload a previously saved dataset.

This method is implicitly called when using

svs::lib::load_from_disk<svs::data::SimpleData<T, Extent>>("directory");

Parameters:
  • table – The table containing saved hyperparameters.

  • allocator – Allocator instance to use upon reloading.

static inline SimpleData load(const std::filesystem::path &path, const allocator_type &allocator = {})

Try to automatically load the dataset.

The argument path can point to:

  • The directory previously used to save a dataset (or the config file of such a directory).

  • A “.[f/b/i]vecs” file.

Parameters:
  • path – The filepath to a dataset on disk.

  • allocator – The allocator instance to use when constructing this class.
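For readers unfamiliar with the ".fvecs" layout: each record is a 32-bit integer dimension d followed by d 32-bit floats, repeated for every vector. The following is a minimal sketch of a reader for that layout, assuming a little-endian host. The function read_fvecs is hypothetical and is not the loader SVS uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Read every record of an fvecs stream into one contiguous, row-major buffer.
// Sets `dims` to the common dimensionality (0 if the stream is empty).
// Throws on ragged or truncated input. Assumes a little-endian host.
inline std::vector<float> read_fvecs(std::istream& in, std::size_t& dims) {
    std::vector<float> out;
    dims = 0;
    std::int32_t d = 0;
    while (in.read(reinterpret_cast<char*>(&d), sizeof(d))) {
        if (d <= 0) {
            throw std::runtime_error("invalid dimension header");
        }
        const auto this_dims = static_cast<std::size_t>(d);
        if (dims == 0) {
            dims = this_dims;
        } else if (dims != this_dims) {
            throw std::runtime_error("vectors have different lengths");
        }
        const std::size_t old_size = out.size();
        out.resize(old_size + dims);
        if (!in.read(reinterpret_cast<char*>(out.data() + old_size),
                     static_cast<std::streamsize>(dims * sizeof(float)))) {
            throw std::runtime_error("truncated record");
        }
    }
    // A clean end of file reads zero bytes of the next header.
    if (in.gcount() != 0) {
        throw std::runtime_error("truncated dimension header");
    }
    return out;
}
```

The result is a dense buffer of size() * dimensions() floats, which matches the contiguous, equal-length layout SimpleData requires.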

Public Static Attributes

static constexpr size_t extent = Extent

The static dimensionality of the underlying data.

static constexpr bool is_memory_map_compatible = true

The various instantiations of SimpleData are expected to have dense layouts. Therefore, they are directly memory map compatible from appropriate files.

However, some specializations (such as the blocked dataset) are not necessarily memory map compatible.

static constexpr bool is_view = is_view_type_v<Alloc>

Whether or not this is a non-owning view of the underlying data.

static constexpr bool is_const = std::is_const_v<T>

Whether or not this class is allowed to mutate its backing data.

template<typename T, size_t Extent = Dynamic>
using svs::data::ConstSimpleDataView = SimpleData<const T, Extent, View<const T>>
template<typename T, size_t Extent = Dynamic>
using svs::data::SimpleDataView = SimpleData<T, Extent, View<T>>

Data Loading

The svs::VectorDataLoader class provides a way to instantiate a svs::data::SimpleData object from several kinds of files.

template<typename T, size_t Extent = Dynamic, typename Allocator = HugepageAllocator<T>>
class VectorDataLoader

Loader for uncompressed vector datasets.

Template Parameters:
  • T – The element type of the encoded vectors. Typically, this will be a floating point type like float or svs::Float16, but it may also be an integer type for certain datasets.

  • Extent – The compile-time dimensionality of the vectors to be read. May provide a performance boost if given. Default: svs::Dynamic.

  • Allocator – The allocator to use for the memory backing the data when loaded.

Public Types

using return_type = data::SimpleData<T, Extent, Allocator>

The full type of the loaded dataset.

Public Functions

inline VectorDataLoader(const std::filesystem::path &path, const Allocator &allocator)

Construct a new VectorDataLoader.

Typically, path should point to a directory generated by one of the index save methods. This will provide the most error checking. However, the path can also point directly to the following files:

  • Any “*.svs” file, which is the native file format used by the SVS library.

  • Any “[f/b/i]vecs” file typically used by similarity search libraries.

Parameters:
  • path – The path to the dataset on disk. See detailed notes.

  • allocator – The allocator to be used.

inline return_type load() const

Load the dataset from disk.

inline const std::filesystem::path &get_path() const

Return the file path given when this class was constructed.

Note

The various data implementations given above are all instances of the more general concept svs::data::ImmutableMemoryDataset. Where possible, this concept is used to constrain template arguments, allowing for future custom implementations.