Skip to main content

Vector Index API Reference

The Vector Index is responsible for storing and searching vector embeddings in EngramDB. This document provides a detailed reference for the Vector Index API. EngramDB supports two primary vector index implementations:
  1. Linear Index (VectorIndex): Simple brute-force search for small to medium datasets
  2. HNSW Index (HnswIndex): Hierarchical Navigable Small World graph for fast approximate search in large datasets
Both implementations share a common interface through the VectorSearchIndex trait, allowing them to be used interchangeably.

VectorSearchIndex Trait

The VectorSearchIndex trait defines the common interface for all vector index implementations.

Common Methods

All vector indices implement these methods:
  • add(node): Adds a memory node to the index
  • remove(id): Removes a memory node from the index
  • update(node): Updates a memory node in the index
  • search(query, limit, threshold): Searches for similar vectors
  • len(): Returns the number of vectors in the index
  • is_empty(): Checks if the index is empty
  • get(id): Gets a reference to a vector by its ID

VectorIndex

The VectorIndex class provides a simple linear search implementation for vector embeddings.

Creating a Vector Index

VectorIndex::new()

Creates a new empty vector index. Returns:
  • A new VectorIndex instance
Example:
let mut vector_index = VectorIndex::new();

Managing Vectors

add(node)

Adds a memory node to the index. Parameters:
  • node: &MemoryNode - The memory node to add
Returns:
  • Result<()> - Success or an error
Example:
let memory = MemoryNode::new(vec![0.1, 0.2, 0.3, 0.4]);
vector_index.add(&memory)?;

remove(id)

Removes a memory node from the index. Parameters:
  • id: Uuid - The UUID of the memory node to remove
Returns:
  • Result<()> - Success or an error
Example:
vector_index.remove(memory_id)?;

update(node)

Updates a memory node in the index. Parameters:
  • node: &MemoryNode - The memory node to update
Returns:
  • Result<()> - Success or an error
Example:
memory.set_embeddings(vec![0.2, 0.3, 0.4, 0.5]);
vector_index.update(&memory)?;

Searching

search(query, limit, threshold)

Performs a similarity search to find the most similar vectors. Parameters:
  • query: &[f32] - The query vector
  • limit: usize - Maximum number of results to return
  • threshold: f32 - Minimum similarity threshold (0.0 to 1.0)
Returns:
  • Result<Vec<(Uuid, f32)>> - A vector of (UUID, similarity) pairs, sorted by descending similarity
Example:
let query_vector = vec![0.15, 0.25, 0.35, 0.45];
let results = vector_index.search(&query_vector, 5, 0.0)?;

for (memory_id, similarity) in results {
    println!("Memory ID: {}, Similarity: {:.4}", memory_id, similarity);
}

Utility Methods

len()

Returns the number of vectors in the index. Returns:
  • usize - The number of vectors
Example:
let count = vector_index.len();
println!("Index contains {} vectors", count);

is_empty()

Checks if the index is empty. Returns:
  • bool - True if the index is empty, false otherwise
Example:
if vector_index.is_empty() {
    println!("Index is empty");
}

get(id)

Gets a reference to a vector by its ID. Parameters:
  • id: Uuid - The UUID of the memory node
Returns:
  • Option<&Vec<f32>> - A reference to the vector if found, otherwise None
Example:
if let Some(vector) = vector_index.get(memory_id) {
    println!("Vector: {:?}", vector);
}

Similarity Functions

EngramDB provides several similarity functions for comparing vectors.

cosine_similarity(a, b)

Computes the cosine similarity between two vectors. Parameters:
  • a: &[f32] - First vector
  • b: &[f32] - Second vector
Returns:
  • Option<f32> - The cosine similarity value between the two vectors (between -1 and 1), or None if either vector is empty
Example:
let vec1 = vec![0.1, 0.2, 0.3];
let vec2 = vec![0.2, 0.3, 0.4];

if let Some(similarity) = cosine_similarity(&vec1, &vec2) {
    println!("Cosine similarity: {:.4}", similarity);
}

dot_product(a, b)

Computes the dot product between two vectors. Parameters:
  • a: &[f32] - First vector
  • b: &[f32] - Second vector
Returns:
  • f32 - The dot product value, or 0.0 if the vectors have different lengths
Example:
let vec1 = vec![0.1, 0.2, 0.3];
let vec2 = vec![0.2, 0.3, 0.4];

let dot = dot_product(&vec1, &vec2);
println!("Dot product: {:.4}", dot);

euclidean_distance(a, b)

Computes the Euclidean distance between two vectors. Parameters:
  • a: &[f32] - First vector
  • b: &[f32] - Second vector
Returns:
  • Option<f32> - The Euclidean distance between the two vectors, or None if the vectors have different lengths
Example:
let vec1 = vec![0.1, 0.2, 0.3];
let vec2 = vec![0.2, 0.3, 0.4];

if let Some(distance) = euclidean_distance(&vec1, &vec2) {
    println!("Euclidean distance: {:.4}", distance);
}

HnswIndex

The HnswIndex class provides a Hierarchical Navigable Small World (HNSW) graph implementation for fast approximate nearest neighbor search. This is particularly useful for large datasets where linear search becomes prohibitively expensive.

Creating an HNSW Index

HnswIndex::new()

Creates a new HNSW index with default parameters. Returns:
  • A new HnswIndex instance
Example:
let mut hnsw_index = HnswIndex::new();

HnswIndex::with_config(config)

Creates a new HNSW index with custom parameters. Parameters:
  • config: HnswConfig - Configuration parameters for the HNSW algorithm
Returns:
  • A new HnswIndex instance with the specified configuration
Example:
let config = HnswConfig {
    m: 16,                  // Maximum connections per node per layer
    ef_construction: 100,   // Size of dynamic candidate list during construction
    ef: 10,                 // Size of dynamic candidate list during search
    level_multiplier: 1.0,  // Base level for multi-layer construction
    max_level: 16,          // Maximum level in the hierarchy
};
let mut hnsw_index = HnswIndex::with_config(config);

HNSW Configuration

The HnswConfig struct allows you to configure the HNSW algorithm:
  • m: Maximum number of connections per node per layer (default: 16)
  • ef_construction: Size of the dynamic candidate list during index construction (default: 100)
  • ef: Size of the dynamic candidate list during search (default: 10)
  • level_multiplier: Base level for the multi-layer construction (default: 1.0/ln(2))
  • max_level: Maximum level in the hierarchy (default: 16)
Increasing m and ef_construction improves search quality at the cost of increased memory usage and indexing time. Increasing ef improves search quality at the cost of search speed.

Using Vector Indices with Database

EngramDB’s Database class supports different vector index algorithms through its configuration.

Creating a Database with Linear Search (Default)

// Create an in-memory database with the default linear search index
let db = Database::in_memory();

Creating a Database with HNSW Index

// Create an in-memory database with the HNSW index
let db = Database::in_memory_with_hnsw();

// Alternatively, create a file-based database with HNSW
let db = Database::file_based_with_hnsw("./my_database")?;

Custom Vector Index Configuration

// Create a database with custom HNSW parameters
let config = DatabaseConfig {
    storage_type: StorageType::Memory,
    storage_path: None,
    cache_size: 100,
    vector_index_config: VectorIndexConfig {
        algorithm: VectorAlgorithm::HNSW,
        hnsw: Some(HnswConfig {
            m: 32,                  // More connections per node
            ef_construction: 200,   // More neighbors during construction
            ef: 50,                 // More candidates during search
            level_multiplier: 1.0,
            max_level: 16,
        }),
    },
};
let db = Database::new(config)?;

Performance Comparison

The HNSW index provides significant performance improvements over linear search, especially for larger datasets:
Dataset SizeLinear Search (ms)HNSW Search (ms)Speedup
1,00051
10,00050225×
100,0005005100×
1,000,0005,00010500×
These are approximate values and will vary depending on vector dimensions, hardware, and specific HNSW parameters.

Example: Vector Search Workflow

Here’s a complete example of how to use vector search:
// Create a database with HNSW index for faster search
let mut db = Database::in_memory_with_hnsw();

// Create and add memory nodes
let memory1 = MemoryNode::new(vec![1.0, 0.0, 0.0]); // X-axis
let memory2 = MemoryNode::new(vec![0.7, 0.7, 0.0]); // 45 degrees in XY plane
let memory3 = MemoryNode::new(vec![0.0, 1.0, 0.0]); // Y-axis
let memory4 = MemoryNode::new(vec![0.0, 0.0, 1.0]); // Z-axis

let id1 = db.save(&memory1)?;
let id2 = db.save(&memory2)?;
let id3 = db.save(&memory3)?;
let id4 = db.save(&memory4)?;

// Search for something closer to the X-axis
let query = vec![0.9, 0.1, 0.0];
let results = db.search_similar(&query, 2, 0.0)?;

// Process results
for (id, similarity) in results {
    let memory = db.load(id)?;
    println!("Memory ID: {}, Similarity: {:.4}", id, similarity);
}