In-Memory Representation of Data
Vector data is at the core of the SVS similarity search library. Several specific classes are provided to implement in-memory vector datasets with different semantics.
svs::data::SimpleData
- General allocator-aware dense representation of embedding vectors.
svs::data::SimpleDataView
- Non-owning view over a dense representation of embedding vectors.
svs::data::ConstSimpleDataView
- Constant version of svs::data::SimpleDataView. Useful for crossing virtual function boundaries where templates can’t be used.
Detailed documentation for these classes is given below.
-
template<typename T, size_t Extent = Dynamic, typename Alloc = lib::Allocator<T>>
class SimpleData
The following properties hold:
Vectors are stored contiguously in memory.
All vectors have the same length.
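As a rough illustration of these two properties, the sketch below stores all vectors back to back in a single buffer, so that vector i occupies elements [i * dims, (i + 1) * dims). It is a toy type, not the SVS implementation: the name DenseRows and its members are hypothetical, and a raw pointer stands in for the std::span handles that SimpleData returns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy type (not part of SVS): `n` vectors of `dims` floats
// stored contiguously in one buffer.
class DenseRows {
  public:
    DenseRows(std::size_t n, std::size_t dims)
        : dims_{dims}, storage_(n * dims, 0.0f) {}

    std::size_t size() const { return dims_ == 0 ? 0 : storage_.size() / dims_; }
    std::size_t dimensions() const { return dims_; }

    // Pointer to the first element of vector `i`. Because every vector has
    // the same length, the offset is a single multiplication.
    float* row(std::size_t i) {
        assert(i < size());
        return storage_.data() + i * dims_;
    }
    const float* row(std::size_t i) const {
        assert(i < size());
        return storage_.data() + i * dims_;
    }

  private:
    std::size_t dims_;
    std::vector<float> storage_;
};
```

With this layout, consecutive vectors are exactly dimensions() elements apart, which is what makes a dense dataset straightforward to view or memory map.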
Public Types
-
using value_type = std::span<element_type, Extent>
The type used to return a mutable handle to stored vectors.
-
using const_value_type = std::span<const element_type, Extent>
The type used to return a constant handle to stored vectors.
Public Functions
-
inline const allocator_type &get_allocator() const
Return the underlying allocator.
-
inline explicit SimpleData(AnonymousArray<2> array)
Construct a view over the array using a checked cast.
-
inline size_t size() const
Return the number of entries in the dataset.
-
inline size_t capacity() const
Return the maximum number of entries this dataset can hold.
-
inline size_t dimensions() const
Return the number of dimensions for each entry in the dataset.
-
inline const_value_type get_datum(size_t i) const
Return a constant handle to the vector stored at position i.
Preconditions:
0 <= i < size()
-
inline value_type get_datum(size_t i)
Return a mutable handle to the vector stored at position i.
NOTE: Mutating the returned value directly may have unintended consequences. Perform with care.
Preconditions:
0 <= i < size()
-
inline void prefetch(size_t i) const
Prefetch the vector at position i into the L1 cache.
-
template<typename U, size_t N>
inline void set_datum(size_t i, std::span<U, N> datum)
Overwrite the contents of the vector at position i.
If U is the same type as element_type, then this operation is simply a memory copy. Otherwise, lib::narrow will be used to convert each element of datum, which may error if the conversion is not exact.
Preconditions:
datum.size() == dimensions()
0 <= i < size()
- Parameters:
i – The index at which to store the new data.
datum – The new vector in R^n to store.
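The two code paths described for set_datum can be sketched without the library as follows. This is only an illustration under stated assumptions: checked_narrow and set_row are hypothetical names, checked_narrow is a stand-in for lib::narrow rather than its actual implementation, and the sketch is restricted to integer element types to keep every conversion well defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper (not lib::narrow itself): convert an integer `x` to
// `To` and throw unless the value survives the conversion unchanged.
template <typename To, typename From> To checked_narrow(From x) {
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "this sketch only handles integer types");
    const To y = static_cast<To>(x);
    const bool round_trips = static_cast<From>(y) == x;
    const bool same_sign = (y < To{}) == (x < From{});
    if (!round_trips || !same_sign) {
        throw std::runtime_error("narrowing conversion is not exact");
    }
    return y;
}

// Overwrite vector `i` of a dense buffer holding vectors of length `dims`.
template <typename T, typename U>
void set_row(std::vector<T>& storage, std::size_t dims, std::size_t i,
             const std::vector<U>& datum) {
    assert(datum.size() == dims);             // datum.size() == dimensions()
    assert((i + 1) * dims <= storage.size()); // 0 <= i < size()
    T* dst = storage.data() + i * dims;
    if constexpr (std::is_same_v<T, U>) {
        std::copy_n(datum.data(), dims, dst); // same type: plain copy
    } else {
        for (std::size_t j = 0; j < dims; ++j) {
            dst[j] = checked_narrow<T>(datum[j]); // throws if inexact
        }
    }
}
```

Note that in this sketch a failed conversion leaves the destination vector partially overwritten; whether SVS behaves the same way is not stated in the documentation above.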
-
inline ConstSimpleDataView<T, Extent> cview() const
Return a ConstSimpleDataView over this data.
-
inline ConstSimpleDataView<T, Extent> view() const
Return a ConstSimpleDataView over this data.
-
inline SimpleDataView<T, Extent> view()
Return a SimpleDataView over this data.
-
inline void resize(size_t new_size)
Resize the dataset to the new size.
Causes a reallocation if new_size > capacity(). Growing and shrinking are performed at the end of the valid range.
NOTE: Resizing that triggers a reallocation will invalidate all previously obtained pointers!
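One way to stay safe around this invalidation rule is to hold on to indices rather than pointers and to re-obtain handles after every resize. The sketch below shows the pattern on a toy container backed by std::vector; GrowableRows and grow_and_refetch are hypothetical names and do not appear in SVS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy dataset (not SVS code) with resizable dense storage.
class GrowableRows {
  public:
    explicit GrowableRows(std::size_t dims) : dims_{dims} { assert(dims > 0); }

    std::size_t size() const { return storage_.size() / dims_; }
    std::size_t capacity() const { return storage_.capacity() / dims_; }

    // Growing and shrinking both happen at the end of the valid range.
    void resize(std::size_t new_size) { storage_.resize(new_size * dims_); }

    float* row(std::size_t i) {
        assert(i < size());
        return storage_.data() + i * dims_;
    }

  private:
    std::size_t dims_;
    std::vector<float> storage_;
};

// Resize the dataset, then fetch a fresh pointer to row `i`.
float* grow_and_refetch(GrowableRows& data, std::size_t new_size, std::size_t i) {
    data.resize(new_size); // may reallocate: earlier pointers must not be reused
    return data.row(i);    // the index is still valid, so look the row up again
}
```

The stored values survive the reallocation; only the addresses change, which is why the index remains a reliable way to refer to a vector.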
-
inline void shrink_to_fit()
Requests the removal of unused capacity.
It is a non-binding request to reduce capacity() to size(). If reallocation occurs, all iterators and previously obtained datums are invalidated.
Public Static Functions
-
static inline SimpleData load(const lib::LoadTable &table, const allocator_type &allocator = {})
Reload a previously saved dataset.
This method is implicitly called when using
svs::lib::load_from_disk<svs::data::SimpleData<T, Extent>>("directory");
- Parameters:
table – The table containing saved hyper parameters.
allocator – Allocator instance to use upon reloading.
-
static inline SimpleData load(const std::filesystem::path &path, const allocator_type &allocator = {})
Try to automatically load the dataset.
The argument path can point to:
The directory previously used to save a dataset (or the config file of such a directory).
A “.[f/b/i]vecs” file.
- Parameters:
path – The filepath to a dataset on disk.
allocator – The allocator instance to use when constructing this class.
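For readers unfamiliar with the “.fvecs” layout, the sketch below reads such a file without using SVS. It assumes the commonly used encoding in which each record is a 4-byte little-endian integer giving the dimension, followed by that many 4-byte floats; FvecsData, read_fvecs, and write_fvecs are hypothetical names, and the loader in SVS may perform additional checks that are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the decoded file (not an SVS type).
struct FvecsData {
    std::size_t dims = 0;
    std::vector<float> values; // all vectors, stored back to back
    std::size_t size() const { return dims == 0 ? 0 : values.size() / dims; }
};

// Read every record of an fvecs file. Assumes a little-endian host with
// 4-byte IEEE-754 floats, so bytes can be copied straight into memory.
inline FvecsData read_fvecs(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    FvecsData out;
    std::int32_t d = 0;
    while (in.read(reinterpret_cast<char*>(&d), sizeof(d))) {
        if (d <= 0) throw std::runtime_error("invalid dimension header");
        const auto dims = static_cast<std::size_t>(d);
        if (out.dims == 0) out.dims = dims;
        if (out.dims != dims) throw std::runtime_error("vectors differ in length");
        const std::size_t old_size = out.values.size();
        out.values.resize(old_size + dims);
        in.read(reinterpret_cast<char*>(out.values.data() + old_size),
                static_cast<std::streamsize>(dims * sizeof(float)));
        if (!in) throw std::runtime_error("truncated record");
    }
    return out;
}

// Write the same layout back out; handy for producing small test files.
inline void write_fvecs(const std::string& path, const FvecsData& data) {
    assert(data.dims > 0 && data.values.size() % data.dims == 0);
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot create " + path);
    const auto d = static_cast<std::int32_t>(data.dims);
    for (std::size_t i = 0; i < data.size(); ++i) {
        out.write(reinterpret_cast<const char*>(&d), sizeof(d));
        out.write(reinterpret_cast<const char*>(data.values.data() + i * data.dims),
                  static_cast<std::streamsize>(data.dims * sizeof(float)));
    }
}
```

The reader rejects files whose records differ in length, mirroring the requirement that all vectors in a dense dataset have the same number of dimensions.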
Public Static Attributes
-
static constexpr bool is_memory_map_compatible = true
The various instantiations of SimpleData are expected to have dense layouts. Therefore, they are directly memory map compatible from appropriate files.
However, some specializations (such as the blocked dataset) are not necessarily memory map compatible.
-
template<typename T, size_t Extent = Dynamic>
using svs::data::ConstSimpleDataView = SimpleData<const T, Extent, View<const T>>
-
template<typename T, size_t Extent = Dynamic>
using svs::data::SimpleDataView = SimpleData<T, Extent, View<T>>
Data Loading
The svs::VectorDataLoader class provides a way to instantiate a svs::data::SimplePolymorphicData object from several kinds of files.
-
template<typename T, size_t Extent = Dynamic, typename Allocator = HugepageAllocator<T>>
class VectorDataLoader
Loader for uncompressed vector datasets.
- Template Parameters:
T – The element type of the encoded vectors. Typically, this will be a floating point type like float or svs::Float16, but may be an integer type as well for certain datasets.
Extent – The compile-time dimensionality of the vectors to be read. May provide a performance boost if given. Default: svs::Dynamic.
Allocator – The allocator to use for the memory backing the data when loaded.
Public Types
-
using return_type = data::SimpleData<T, Extent, Allocator>
The full type of the loaded dataset.
Public Functions
-
inline VectorDataLoader(const std::filesystem::path &path, const Allocator &allocator)
Construct a new VectorDataLoader.
Typically, path should point to a directory generated by one of the index save methods. This will provide the most error checking. However, the path can also point directly to the following files:
Any “*.svs” file, which is the native file format used by the SVS library.
Any “[f/b/i]vecs” file typically used by similarity search libraries.
- Parameters:
path – The path to the dataset on disk. See detailed notes.
allocator – The allocator to be used.
-
inline return_type load() const
Load the dataset from disk.
-
inline const std::filesystem::path &get_path() const
Return the file path given when this class was constructed.
Note
The various data implementations given above are all instances of the more general concept svs::data::ImmutableMemoryDataset.
Where possible, this concept is used to constrain template arguments, allowing for future custom implementations.
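The definition of svs::data::ImmutableMemoryDataset is not reproduced here. To illustrate the general idea of constraining a template on an interface rather than on a concrete class, the sketch below uses a C++17 detection trait in place of a C++20 concept. The trait is_dataset_like, the function total_elements, the class ToyData, and the particular set of required members (size, dimensions, get_datum) are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true when `T` offers size(), dimensions(), and
// get_datum(i) on a const object. Not the actual SVS concept.
template <typename T, typename = void>
struct is_dataset_like : std::false_type {};

template <typename T>
struct is_dataset_like<
    T,
    std::void_t<
        decltype(std::declval<const T&>().size()),
        decltype(std::declval<const T&>().dimensions()),
        decltype(std::declval<const T&>().get_datum(std::size_t{0}))>>
    : std::true_type {};

// Accepts any type satisfying the trait, including user-defined datasets.
template <typename Data, typename = std::enable_if_t<is_dataset_like<Data>::value>>
std::size_t total_elements(const Data& data) {
    return data.size() * data.dimensions();
}

// A custom dataset that satisfies the trait without inheriting from anything.
class ToyData {
  public:
    ToyData(std::size_t n, std::size_t dims)
        : dims_{dims}, storage_(n * dims, 0.0f) {}

    std::size_t size() const { return dims_ == 0 ? 0 : storage_.size() / dims_; }
    std::size_t dimensions() const { return dims_; }
    const float* get_datum(std::size_t i) const {
        assert(i < size());
        return storage_.data() + i * dims_;
    }

  private:
    std::size_t dims_;
    std::vector<float> storage_;
};
```

Because the constraint is structural, a new dataset type only needs to provide the expected member functions to be usable with algorithms written against the constraint.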