A Visual Guide to the Hierarchical Navigable Small World Algorithm


We’ll dive into the inner workings of an approximate nearest neighbor (ANN) algorithm called Hierarchical Navigable Small Worlds (HNSW). HNSW is commonly used in vector databases to enhance the speed of similarity search. The article uses interactive visuals and demos to demonstrate the fundamental concepts of the HNSW algorithm and includes a live demo of HNSW itself.

Introduction to HNSW

Hierarchical NSW is one of several approximate nearest neighbor (ANN) algorithms developed to speed up similarity search. First proposed by Yu. A. Malkov and D. A. Yashunin in 2016, HNSW builds on the ideas behind two earlier structures:

  • Navigable Small World (NSW) graphs - a type of graph that supports fast approximate nearest neighbor search.
  • Skip Lists - a data structure that supports fast insertion and search via a multi-layer linked list.

Combining the benefits of both, HNSW builds a multi-layer NSW graph that enables logarithmic time complexity for approximate nearest neighbor search, a significant improvement over previous state-of-the-art ANN algorithms.

Understanding skip lists and navigable small world graphs is a prerequisite for understanding the inner workings of HNSW, so we will build up both concepts before exploring HNSW itself.

Skip Lists

Skip Lists enhance sorted linked lists by introducing multiple layers, achieving O(log n) average time complexity for search and insertion. The multi-layered structure allows searches to “skip” over runs of elements. Unlike a traditional linked list, where each node points only to the next, each Skip List node carries one forward pointer per layer it participates in, enabling quicker navigation. A node’s height determines how many layers it spans: the bottom layer contains every node, while each higher layer is sparser than the one below.

An example below demonstrates searching for a node valued 5 in a Skip List.

Node heights in a skip list are assigned probabilistically: each additional layer is included with probability p, commonly set to a default value of 0.5. Determining a node’s height is akin to flipping a coin repeatedly: heads adds another layer, tails stops the process, so taller nodes become exponentially rarer.
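
In code, drawing a node’s height amounts to repeated coin flips. Here is a minimal Python sketch; the MAX_LEVEL cap is our own assumption, reflecting how real implementations bound heights to limit memory:

```python
import random

P = 0.5         # chance of promoting a node one more layer
MAX_LEVEL = 16  # cap on height to bound memory (an implementation choice)

def random_level() -> int:
    """Flip a coin: heads adds a layer, tails stops."""
    level = 1
    while random.random() < P and level < MAX_LEVEL:
        level += 1
    return level

# Heights follow a geometric distribution: about half of all nodes have
# height 1, a quarter height 2, an eighth height 3, and so on.
```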

This results in a structure with multiple levels: the top level contains the fewest nodes, and the bottom level is the most dense. The lowest level contains all the elements, like a regular linked list but kept in sorted order. Inserts and searches on a skip list have an average time complexity of O(log n), compared to O(n) linear time for a linked list.

This unique structure enables the “skipping” of elements during search, which is the property that gives skip lists performance advantages over traditional linked lists and a similar performance profile to balanced trees.

Play with the interactive demo below to see how this skipping works in practice to speed up search:

The animation above shows a Skip List search for a node with value 5. The search starts at the top level and compares the target with the next node on the current level. If the next node’s value is less than or equal to the target, the search advances to that node. Otherwise, when the next node’s value is greater than the target or the end of the level has been reached, the search drops down one level. This process repeats until the desired node is found at the lowest level.
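
To make this concrete, here is a minimal, illustrative Python skip list with insertion and search. To avoid repeating the coin-flip logic, insert takes the node’s level explicitly; in practice you would draw it with random_level from the sketch above:

```python
class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # one forward pointer per layer

class SkipList:
    def __init__(self, max_level=16):
        self.head = Node(None, max_level)  # sentinel node, full height
        self.level = 1                     # height of the tallest node

    def insert(self, value, level):
        update = [self.head] * len(self.head.forward)
        node = self.head
        # Walk down from the top, recording the rightmost node on each
        # layer whose value is still less than the new value.
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        new = Node(value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
        self.level = max(self.level, level)

    def search(self, target):
        node = self.head
        for i in range(self.level - 1, -1, -1):  # top layer first
            # Advance while the next node is still less than the target;
            # otherwise drop down one layer.
            while node.forward[i] and node.forward[i].value < target:
                node = node.forward[i]
        node = node.forward[0]  # candidate on the bottom layer
        return node if node and node.value == target else None

sl = SkipList()
for value, level in [(1, 3), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1)]:
    sl.insert(value, level)
print(sl.search(5).value)  # -> 5
```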

This method is significantly faster than a standard linear search in a linked list. The Hierarchical Navigable Small World (HNSW) algorithm applies a similar concept to gain performance improvements during search and insertion, but using a graph instead of a linked list.

Navigable Small World (NSW)

Navigable Small World (NSW) provides a method for constructing and traversing a graph that facilitates fast approximate nearest neighbor search. A greedy search of this graph has a polylogarithmic time complexity of O(logᵏ n), which can be considerably faster than the linear time complexity of a naive k-NN search.

The NSW graph employs a strategy known as greedy routing during search and insertion. The search process in NSW commences by selecting an entry point.

Typically the entry point is the first node ever inserted. Early nodes typically remain a starting point because long-range edges are likely to be created during the initial graph construction phase. These edges play a vital role in graph navigation. They serve as bridges between network hubs, maintaining overall graph connectivity and enabling logarithmic scaling of the number of hops during greedy routing.

The algorithm then takes the set of all neighbors of the current node (its “neighborhood”) plus the current node itself, and computes the distance from each element of this set to the query vector (the vector we want to find similar vectors to). If a node in that set is closer to the query than the current node, it becomes the new current node and the process continues from there. The search concludes when the algorithm cannot find a neighbor closer to the query than the current node itself; that node is then returned as the response to the query.

However, this greedy strategy does not guarantee finding the true nearest neighbor, since it uses only local information at each step. A defining property of the algorithm is early stopping: the search terminates as soon as an iteration finds no neighbor closer than the current node. This property is what gives NSW its performance advantage over naive search, but it runs the risk that the returned node is only a local minimum rather than the global one.

The insertion process mirrors search: a new node traces the same path it would follow as a search query and, upon reaching a local minimum, establishes connections to its K nearest vertices.
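
The sketch below illustrates greedy search and insertion on an NSW graph, following the description above. The graph representation (parallel lists of vectors and adjacency sets) and the candidate set used when connecting a new node are simplifications of our own, not taken from the original paper:

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

class NSW:
    def __init__(self, k=3):
        self.k = k           # connections made per inserted node
        self.vectors = []    # node id -> vector
        self.neighbors = []  # node id -> set of adjacent node ids

    def greedy_search(self, query):
        current = 0  # entry point: the first node ever inserted
        while True:
            # Compare the query against the current node and its whole
            # neighborhood; move to the closest member of that set.
            best, best_d = current, dist(self.vectors[current], query)
            for n in self.neighbors[current]:
                d = dist(self.vectors[n], query)
                if d < best_d:
                    best, best_d = n, d
            if best == current:
                return current  # early stop: local minimum reached
            current = best

    def insert(self, vector):
        new_id = len(self.vectors)
        self.vectors.append(vector)
        self.neighbors.append(set())
        if new_id == 0:
            return new_id
        # Trace the path a search for this vector would take, then link
        # the new node to the k nearest nodes around the local minimum.
        local_min = self.greedy_search(vector)
        candidates = {local_min} | self.neighbors[local_min]
        by_dist = sorted(candidates, key=lambda c: dist(self.vectors[c], vector))
        for n in by_dist[: self.k]:
            self.neighbors[new_id].add(n)
            self.neighbors[n].add(new_id)
        return new_id
```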

The demo below demonstrates this process. It begins running automatically, but you can pause it to manually add nodes and perform searches to better understand how the algorithm operates.

Building Graph: adding new nodes to the graph.

Using the demo, here’s an interesting challenge: see if you can generate an example of early stopping, where the node returned from the search is not actually the closest node but a local minimum.

Hierarchical Navigable Small World (HNSW)

HNSW is essentially a multi-layer NSW graph. As in a skip list, whenever a node is added to the graph it is assigned a height probabilistically, with the probability decaying as the height increases. This means each node has not just one set of neighbors, but a separate set of neighbors for each layer it appears on.

At a high level, during a search the initial node is chosen by selecting the last inserted node with the tallest height. From that node, a greedy search is conducted, starting at the top layer. The greedy search is constrained to that layer until a local minimum (the nearest neighbor within that layer) is found. For every layer above the bottom layer (index 0), only a single node is selected.

Once a node is selected and no closer node exists at that layer, the search continues from that node at the next layer down. Once the bottom layer is reached, the search widens to return the k nearest neighbors instead of just the single closest one.
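
Below is a minimal Python sketch of the layered greedy descent just described. The Node layout is an assumption of our own, and the widening of the search at the bottom layer is left to the deep dive below:

```python
import math

def dist(a, b):
    return math.dist(a, b)

class Node:
    def __init__(self, vec, level):
        self.vec = vec
        self.level = level
        # One adjacency list for every layer this node appears on.
        self.neighbors = [[] for _ in range(level + 1)]

def descend(entry, query):
    """Greedy descent from the top layer down to layer 1: find the local
    minimum on each layer, then reuse it as the next layer's entry point."""
    node = entry
    for layer in range(entry.level, 0, -1):
        moved = True
        while moved:
            moved = False
            for n in node.neighbors[layer]:
                if dist(n.vec, query) < dist(node.vec, query):
                    node, moved = n, True  # step to the closer neighbor
    # `node` is now the entry point into the bottom layer, where the
    # search widens from one candidate to the k nearest (see deep dive).
    return node
```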

Together, this architecture and search strategy bring the complexity of the search and insert operations down to O(log n) on average.

The interactive demonstration of HNSW below shows how search is conducted. Pay attention to how the search proceeds at each layer, with a neighborhood being explored to find closer nodes at the current layer. The query node is highlighted in yellow. The current node is the one with the animated dark blue/red border. Neighborhood exploration highlights neighbors with a light blue fill. The most similar nodes returned by the search are highlighted in yellow.

Interactive demo: auto-running graph traversal with k = 2.

HNSW Insertion and Search Deep Dive

The Hierarchical Navigable Small World (HNSW) algorithm enhances the Navigable Small World (NSW) model with several significant changes:

  1. ef Parameter for Enhanced Search Quality: Unlike NSW, which improves recall by repeating the search multiple times, HNSW employs the ef parameter (the size of the dynamic candidate list) for refined control over search precision.

  2. Heuristic-Based Neighbor Selection: HNSW introduces a heuristic approach to selecting neighbors, instead of merely choosing the closest neighbors. This method considers the spatial distribution of neighbors to prevent clustering.

  3. Optimization Parameters \(m_L\) and \(M_{\text{max0}}\): With the introduction of \(m_L\) for level generation and \(M_{\text{max0}}\) for the maximum number of connections at the base layer, HNSW optimizes graph construction while maintaining navigability.

These enhancements collectively boost HNSW’s performance in finding nearest neighbors.

Graph Construction and Insertion

To add a new point q to the graph, we begin at the top layer of the multi-layered structure. Initially, we set the ef parameter to 1, which reduces the search to a simple greedy walk that locates the single closest neighbor to q on each layer. ef plays a crucial role in controlling the quality of the search, ensuring we identify the most relevant points.

Upon identifying these neighbors at the top layer, we proceed to the next layer down, employing the previously discovered neighbors as our new starting points for further search. This iterative process continues down each layer. When we reach a layer less than or equal to l, the randomly drawn insertion level of q, the second phase of the construction algorithm begins. This phase is distinguished by two key adjustments:

  1. the ‘ef’ parameter is increased from 1 to ‘efConstruction’, enhancing the recall of the greedy search, and
  2. the closest neighbors found on each layer are also considered as candidates for the connections of the inserted element ‘q’.

Throughout this search, we maintain a dynamic list of the ef closest neighbors found so far at each layer, implemented as a priority queue. This list, seeded with our initial entry points, is continually updated as we discover closer neighbors. The update mechanism is a greedy search that evaluates the neighbors of the closest point we have yet to examine.

A specialized stop condition enhances the efficiency of this process. Should a candidate point for evaluation be further from q than the furthest point in our list, it is disregarded. This criterion ensures our search remains narrowly focused on genuinely close neighbors, preventing it from becoming overly broad.
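
Putting these pieces together, here is a hedged, self-contained Python sketch of the layer search with its ef-bounded priority queues and stop condition, together with the two-phase insertion described above. The class layout, parameter names, and the omission of \(M_{\text{max}}\) pruning are simplifications of our own, loosely following the paper’s SEARCH-LAYER and INSERT routines:

```python
import heapq, math, random

def dist(a, b):
    return math.dist(a, b)

class Node:
    def __init__(self, vec, level):
        self.vec = vec
        self.level = level
        self.neighbors = [[] for _ in range(level + 1)]  # per-layer adjacency

def search_layer(entry_points, q, ef, layer):
    """Return the ef nodes closest to q on a single layer.

    candidates: min-heap of not-yet-examined points, closest first.
    results:    max-heap (negated distances) of the best ef found so far.
    """
    visited = set(entry_points)
    candidates = [(dist(p.vec, q), id(p), p) for p in entry_points]
    heapq.heapify(candidates)
    results = [(-d, i, p) for d, i, p in candidates]
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)
    while candidates:
        d_c, _, c = heapq.heappop(candidates)  # closest unexamined point
        d_f = -results[0][0]                   # furthest of the best ef
        if d_c > d_f:
            break  # stop condition: no candidate can improve the results
        for n in c.neighbors[layer]:
            if n in visited:
                continue
            visited.add(n)
            d_n = dist(n.vec, q)
            if d_n < d_f or len(results) < ef:
                heapq.heappush(candidates, (d_n, id(n), n))
                heapq.heappush(results, (-d_n, id(n), n))
                if len(results) > ef:
                    heapq.heappop(results)     # evict the furthest point
    return [p for _, _, p in sorted((-d, i, p) for d, i, p in results)]

class HNSW:
    def __init__(self, m_L, ef_construction, K):
        self.m_L, self.ef_construction, self.K = m_L, ef_construction, K
        self.entry, self.max_level = None, -1

    def insert(self, vec):
        level = math.floor(-math.log(1.0 - random.random()) * self.m_L)
        node = Node(vec, level)
        if self.entry is None:
            self.entry, self.max_level = node, level
            return node
        entry_points = [self.entry]
        # Phase 1: ef = 1 greedy descent through the layers above `level`.
        for layer in range(self.max_level, level, -1):
            entry_points = search_layer(entry_points, vec, 1, layer)
        # Phase 2: widen to efConstruction; the closest nodes found on
        # each layer become connection candidates for the new element.
        for layer in range(min(level, self.max_level), -1, -1):
            found = search_layer(entry_points, vec, self.ef_construction, layer)
            for nb in found[: self.K]:  # connect (pruning to M_max omitted)
                node.neighbors[layer].append(nb)
                nb.neighbors[layer].append(node)
            entry_points = found
        if level > self.max_level:
            self.entry, self.max_level = node, level
        return node

# Usage: build a small index, then probe the bottom layer directly.
index = HNSW(m_L=1 / math.log(16), ef_construction=32, K=8)
for _ in range(200):
    index.insert([random.random(), random.random()])
closest = search_layer([index.entry], [0.5, 0.5], ef=10, layer=0)
```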

This refined method, leveraging ef adjustments and priority queues, represents a significant improvement over previous models. It enables precise tuning of our search process, yielding better performance and accuracy.

The search algorithm mirrors the insertion logic but aims to return the closest neighbors. It similarly adjusts the ef parameter to modulate search quality, ensuring an efficient and effective neighbor search.
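
In code, the search reduces to a thin wrapper over search_layer from the sketch above; a hedged sketch, where k results are taken from an ef-sized candidate list (with ef ≥ k):

```python
def knn_search(index, q, k, ef):
    # Phase 1: ef = 1 greedy descent to the bottom layer's entry point.
    entry_points = [index.entry]
    for layer in range(index.max_level, 0, -1):
        entry_points = search_layer(entry_points, q, 1, layer)
    # Phase 2: widen to ef on the bottom layer, then keep the k closest.
    return search_layer(entry_points, q, ef, layer=0)[:k]
```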

Nearest Neighbor Heuristic and Neighbor Selection

HNSW improves neighbor selection over NSW by not only choosing the closest M neighbors but also considering their spatial distribution, so that connections do not cluster in a single region. The heuristic introduces two flags: extendCandidates, which grows the candidate pool with the neighbors of candidates (useful in densely clustered areas), and keepPrunedConnections, which refills a node’s connection list with previously pruned candidates so that each point keeps a fixed number of connections. Each point’s connection count is capped at \(M_{\text{max}}\); at the network’s base layer, a separate limit, \(M_{\text{max0}}\), applies. Exceeding these limits triggers pruning based on the same heuristic.

This enhanced method of neighbor selection is effective in various scenarios, including simple data landscapes, precise searches within complex datasets, or densely clustered points. It excels in clustered environments by preventing the algorithm from being confined to cluster edges.
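
The core of the heuristic can be sketched as follows (omitting the extendCandidates and keepPrunedConnections refinements): a candidate is accepted only if it is closer to the query point than to every neighbor already accepted, which favors connections that fan out in different directions:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def select_neighbors_heuristic(query, candidates, m):
    """Pick up to m neighbor vectors, preferring spread over raw proximity.

    A candidate is accepted only if it is closer to the query point than
    to every already-accepted neighbor; otherwise an accepted neighbor
    already "covers" that direction, and the candidate is pruned.
    """
    selected = []
    for cand in sorted(candidates, key=lambda c: dist(c, query)):
        if len(selected) >= m:
            break
        if all(dist(cand, s) > dist(cand, query) for s in selected):
            selected.append(cand)
    return selected

# Two points in one tight cluster plus one in another direction: the
# naive "closest m" picks both clustered points; the heuristic spreads out.
print(select_neighbors_heuristic((0, 0), [(1, 0), (1.1, 0), (0, 3)], m=2))
# -> [(1, 0), (0, 3)]
```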

The probabilistic function for determining a node’s graph height slightly alters the conventional skip list model, with the average layer count for adding an element being a constant that depends on \(m_L\):

\[ E[l+1] = E[-\ln(\text{uniform}(0,1)) \cdot m_L] \]
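
In code, drawing a node’s level from this distribution is a one-liner; a small sketch, where the 1 - random() guard (our own addition) avoids log(0):

```python
import math, random

def draw_level(m_L: float) -> int:
    # Exponentially decaying distribution over layers: most nodes get
    # level 0, and each higher level is exponentially less likely.
    u = 1.0 - random.random()  # in (0, 1], so log(u) is always defined
    return math.floor(-math.log(u) * m_L)

# With the suggested m_L = 1/ln(M) and M = 16:
m_L = 1 / math.log(16)
print(draw_level(m_L))
```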

Algorithm construction parameters such as \(m_L\), which influences the distribution of nodes across different layers, and \(M_{\text{max0}}\), defining the maximum number of connections a node can have at the base layer, play crucial roles in maintaining the navigability of the small world graphs constructed.

Setting \(m_L\) to zero, which corresponds to a single-layer graph, and \(M_{\text{max0}}\) to \(M\), results in the creation of directed k-NN graphs known for their power-law search complexity, as previously studied.

Conversely, setting \(m_L\) to zero and \(M_{\text{max0}}\) to infinity yields NSW graphs characterized by polylogarithmic complexity. Introducing a non-zero value for \(m_L\) leads to the emergence of controllable hierarchy graphs, enabling logarithmic search complexity through the introduction of layers. To optimize the performance benefits of the controllable hierarchy, it’s essential to minimize the overlap between neighbors across different layers, which is achieved by reducing \(m_L\). However, decreasing \(m_L\) also increases the average number of hops required during a greedy search on each layer, adversely affecting performance. This trade-off suggests an optimal value exists for the \(m_L\) parameter. A simple yet effective choice for \(m_L\) is \(1/\ln(M)\), analogous to the skip list parameter \(p = 1/M\), ensuring minimal overlap between layers.

The remaining key construction parameter, \(M\), represents the number of connections established for each element in the graph. It has an optimal range between 5 and 48. The selection of \(M\) significantly influences the algorithm’s memory consumption and its performance across different recalls and data dimensionalities. The \(efConstruction\) parameter should be chosen so that the K-ANNS recall during construction stays close to unity; 0.95 is sufficient for most applications, and this parameter could potentially be auto-configured using sample data.

In all of the considered cases, use of the heuristic for proximity graph neighbors selection leads to a higher or similar search performance compared to the naïve connection to the nearest neighbors.