Vector Index API Reference
The Vector Index stores and searches vector embeddings in EngramDB. This document provides a detailed reference for the Vector Index API. EngramDB supports two primary vector index implementations:
- Linear Index (VectorIndex): Simple brute-force search for small to medium datasets
- HNSW Index (HnswIndex): Hierarchical Navigable Small World graph for fast approximate search in large datasets
Both implement the VectorSearchIndex trait, allowing them to be used interchangeably.
VectorSearchIndex Trait
The VectorSearchIndex trait defines the common interface for all vector index implementations.
Common Methods
All vector indices implement these methods:
- add(node): Adds a memory node to the index
- remove(id): Removes a memory node from the index
- update(node): Updates a memory node in the index
- search(query, limit, threshold): Searches for similar vectors
- len(): Returns the number of vectors in the index
- is_empty(): Checks if the index is empty
- get(id): Gets a reference to a vector by its ID
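To illustrate how a shared trait lets callers swap index types, here is a minimal sketch. It does not use the EngramDB crate: SearchIndex, Node, MapIndex, and load_all are hypothetical stand-ins, ids are u128 rather than Uuid, errors are plain strings, and the error conditions shown are assumptions. Only part of the method list above is mirrored.

```rust
use std::collections::HashMap;

// Stand-ins for EngramDB types; assumptions for this sketch only.
pub type Id = u128; // the real API uses Uuid

pub struct Node {
    pub id: Id,
    pub embedding: Vec<f32>,
}

// Hypothetical mirror of part of the common interface listed above.
pub trait SearchIndex {
    fn add(&mut self, node: &Node) -> Result<(), String>;
    fn remove(&mut self, id: Id) -> Result<(), String>;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
    fn get(&self, id: Id) -> Option<&Vec<f32>>;
}

// Toy implementation backed by a hash map.
#[derive(Default)]
pub struct MapIndex {
    vectors: HashMap<Id, Vec<f32>>,
}

impl SearchIndex for MapIndex {
    fn add(&mut self, node: &Node) -> Result<(), String> {
        // Rejecting empty embeddings is an assumption of this sketch.
        if node.embedding.is_empty() {
            return Err("node has no embedding".to_string());
        }
        self.vectors.insert(node.id, node.embedding.clone());
        Ok(())
    }

    fn remove(&mut self, id: Id) -> Result<(), String> {
        // Treating a missing id as an error is also an assumption.
        self.vectors
            .remove(&id)
            .map(|_| ())
            .ok_or_else(|| format!("no vector with id {}", id))
    }

    fn len(&self) -> usize {
        self.vectors.len()
    }

    fn get(&self, id: Id) -> Option<&Vec<f32>> {
        self.vectors.get(&id)
    }
}

// Generic over the trait, so it works with any index implementation.
pub fn load_all<I: SearchIndex>(index: &mut I, nodes: &[Node]) -> Result<usize, String> {
    for node in nodes {
        index.add(node)?;
    }
    Ok(index.len())
}
```

Because load_all depends only on the trait, a second implementation (for example a graph-based one) could be passed in without changing the caller.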
VectorIndex
The VectorIndex class provides a simple linear search implementation for vector embeddings.
Creating a Vector Index
VectorIndex::new()
Creates a new empty vector index.
Returns:
- A new VectorIndex instance
Managing Vectors
add(node)
Adds a memory node to the index.
Parameters:
- node: &MemoryNode - The memory node to add
Returns:
- Result<()> - Success or an error
remove(id)
Removes a memory node from the index.
Parameters:
- id: Uuid - The UUID of the memory node to remove
Returns:
- Result<()> - Success or an error
update(node)
Updates a memory node in the index.
Parameters:
- node: &MemoryNode - The memory node to update
Returns:
- Result<()> - Success or an error
Searching
search(query, limit, threshold)
Performs a similarity search to find the most similar vectors.
Parameters:
- query: &[f32] - The query vector
- limit: usize - Maximum number of results to return
- threshold: f32 - Minimum similarity threshold (0.0 to 1.0)
Returns:
- Result<Vec<(Uuid, f32)>> - A vector of (UUID, similarity) pairs, sorted by descending similarity
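The brute-force behaviour described above (score every vector, apply the threshold, sort by descending similarity, cap at the limit) can be sketched as a free function. This is an illustration rather than EngramDB's implementation: linear_search is a hypothetical name, ids are u128 stand-ins for Uuid, the similarity function is passed in as a parameter, and the sketch returns a plain Vec instead of a Result.

```rust
use std::cmp::Ordering;

// Brute-force search over (id, vector) pairs.
// `similarity` returns None when two vectors cannot be compared;
// such vectors are skipped.
pub fn linear_search<F>(
    vectors: &[(u128, Vec<f32>)],
    query: &[f32],
    limit: usize,
    threshold: f32,
    similarity: F,
) -> Vec<(u128, f32)>
where
    F: Fn(&[f32], &[f32]) -> Option<f32>,
{
    // Score every stored vector and keep those at or above the threshold.
    let mut hits: Vec<(u128, f32)> = vectors
        .iter()
        .filter_map(|(id, v)| similarity(query, v.as_slice()).map(|s| (*id, s)))
        .filter(|(_, s)| *s >= threshold)
        .collect();

    // Highest similarity first; incomparable scores (NaN) are treated as equal.
    hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    hits.truncate(limit);
    hits
}
```

Every query touches every stored vector, so cost grows linearly with the number of vectors. That is why this approach suits small to medium datasets.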
Utility Methods
len()
Returns the number of vectors in the index.
Returns:
- usize - The number of vectors
is_empty()
Checks if the index is empty.
Returns:
- bool - True if the index is empty, false otherwise
get(id)
Gets a reference to a vector by its ID.
Parameters:
- id: Uuid - The UUID of the memory node
Returns:
- Option<&Vec<f32>> - A reference to the vector if found, otherwise None
Similarity Functions
EngramDB provides several similarity functions for comparing vectors.
cosine_similarity(a, b)
Computes the cosine similarity between two vectors.
Parameters:
- a: &[f32] - First vector
- b: &[f32] - Second vector
Returns:
- Option<f32> - The cosine similarity value between the two vectors (between -1 and 1), or None if either vector is empty
dot_product(a, b)
Computes the dot product between two vectors.
Parameters:
- a: &[f32] - First vector
- b: &[f32] - Second vector
Returns:
- f32 - The dot product value, or 0.0 if the vectors have different lengths
euclidean_distance(a, b)
Computes the Euclidean distance between two vectors.
Parameters:
- a: &[f32] - First vector
- b: &[f32] - Second vector
Returns:
- Option<f32> - The Euclidean distance between the two vectors, or None if the vectors have different lengths
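For reference, the three functions can be sketched in plain Rust with the signatures and return conventions documented above. The bodies are illustrative, not EngramDB's source. In particular, returning None from cosine_similarity for mismatched lengths or zero-norm vectors is an assumption; the text only documents the empty-vector case.

```rust
// Dot product; returns 0.0 when the lengths differ, as documented.
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Cosine similarity in [-1, 1]; None if either vector is empty.
// Assumption: None is also returned for mismatched lengths and for
// zero-norm vectors, where the ratio is undefined.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || b.is_empty() || a.len() != b.len() {
        return None;
    }
    let norm_a = dot_product(a, a).sqrt();
    let norm_b = dot_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot_product(a, b) / (norm_a * norm_b))
}

// Euclidean distance; None when the lengths differ, as documented.
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let sum_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    Some(sum_sq.sqrt())
}
```

Note the asymmetry in error handling: dot_product signals a length mismatch with 0.0, which is indistinguishable from orthogonal vectors, so callers that care may want to check lengths first.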
HnswIndex
The HnswIndex class provides a Hierarchical Navigable Small World (HNSW) graph implementation for fast approximate nearest neighbor search. This is particularly useful for large datasets where linear search becomes prohibitively expensive.
Creating an HNSW Index
HnswIndex::new()
Creates a new HNSW index with default parameters.
Returns:
- A new HnswIndex instance
HnswIndex::with_config(config)
Creates a new HNSW index with custom parameters.
Parameters:
- config: HnswConfig - Configuration parameters for the HNSW algorithm
Returns:
- A new HnswIndex instance with the specified configuration
HNSW Configuration
The HnswConfig struct allows you to configure the HNSW algorithm:
- m: Maximum number of connections per node per layer (default: 16)
- ef_construction: Size of the dynamic candidate list during index construction (default: 100)
- ef: Size of the dynamic candidate list during search (default: 10)
- level_multiplier: Base level for the multi-layer construction (default: 1.0/ln(2))
- max_level: Maximum level in the hierarchy (default: 16)
Increasing m and ef_construction improves search quality at the cost of increased memory usage and indexing time. Increasing ef improves search quality at the cost of search speed.
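The defaults above can be captured in a small stand-in struct. HnswParams, high_recall, and assign_level are hypothetical names, and the field types are assumptions; only the field names and default values come from the list above. The level-assignment helper uses the formula from the standard HNSW algorithm, floor(-ln(u) * level_multiplier), to show what level_multiplier and max_level control. Whether EngramDB computes levels exactly this way is not confirmed here.

```rust
// Hypothetical mirror of the configuration fields listed above.
// Field types are assumptions; names and defaults come from the text.
#[derive(Debug, Clone, PartialEq)]
pub struct HnswParams {
    pub m: usize,
    pub ef_construction: usize,
    pub ef: usize,
    pub level_multiplier: f32,
    pub max_level: usize,
}

impl Default for HnswParams {
    fn default() -> Self {
        Self {
            m: 16,
            ef_construction: 100,
            ef: 10,
            level_multiplier: 1.0 / std::f32::consts::LN_2,
            max_level: 16,
        }
    }
}

// Illustrative override: larger candidate lists trade speed for recall.
// The values 200 and 50 are arbitrary examples, not recommendations.
pub fn high_recall() -> HnswParams {
    HnswParams {
        ef_construction: 200,
        ef: 50,
        ..HnswParams::default()
    }
}

// Standard HNSW level assignment for a uniform sample u in (0, 1]:
// floor(-ln(u) * level_multiplier), capped at max_level.
pub fn assign_level(params: &HnswParams, u: f32) -> usize {
    let level = (-u.ln() * params.level_multiplier).floor() as usize;
    level.min(params.max_level)
}
```

Under this formula most nodes land on level 0 and each higher level holds exponentially fewer nodes, which is what gives the graph its hierarchy.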
Using Vector Indices with Database
EngramDB's Database class supports different vector index algorithms through its configuration.
Creating a Database with Linear Search (Default)
Creating a Database with HNSW Index
Custom Vector Index Configuration
Performance Comparison
The HNSW index provides significant performance improvements over linear search, especially for larger datasets:

| Dataset Size | Linear Search (ms) | HNSW Search (ms) | Speedup |
|---|---|---|---|
| 1,000 | 5 | 1 | 5× |
| 10,000 | 50 | 2 | 25× |
| 100,000 | 500 | 5 | 100× |
| 1,000,000 | 5,000 | 10 | 500× |

