A Visual Guide to the Hierarchical Navigable Small World Algorithm
This post is part two in this series about similarity search. To read the first part of this series, check out A Visual Guide to Vector Search and K-Nearest Neighbors.
We’ll dive into the inner workings of an approximate nearest neighbor (ANN) algorithm called Hierarchical Navigable Small Worlds (HNSW). HNSW is a commonly used algorithm in vector databases for speeding up similarity search. Throughout this article, we’ll use interactive visuals and demos to demonstrate the fundamental concepts of the HNSW algorithm.
Introduction to HNSW
Hierarchical NSW is one of several approximate nearest neighbor (ANN) algorithms developed to speed up similarity search. Initially proposed by Yu. A. Malkov and D. A. Yashunin in 2016, HNSW builds on the concepts behind two other techniques:
- Navigable Small World (NSW) graphs - a type of graph that allows for fast approximate nearest neighbor search
- Skip Lists - a data structure allowing fast insertion and search via a multi-layer linked list structure.
Combining both, HNSW generates a multi-layer NSW graph that enables a logarithmic time complexity for approximate nearest neighbor search. Its performance was significantly better compared to previous state-of-the-art ANN algorithms.
Understanding skip lists and navigable small world graphs is a prerequisite for comprehending the inner workings of HNSW, so we’ll cover both before putting the pieces together.
Skip Lists
An example below demonstrates searching for a node valued 5 in a Skip List.
Skip Lists enhance singly linked lists by introducing multiple layers, with each node having a randomly assigned height. Unlike traditional linked lists where each node points only to the next, Skip List nodes hold one forward pointer per layer. The node’s height, indicating its layer stack, ranges from a dense, fully connected bottom layer to a sparse top layer. This multi-layered approach allows for efficient element “skipping” during searches.
Node heights in a skip list are probabilistically assigned when a node is inserted: each additional layer is included with probability p, which defaults to 0.5. A node’s height is therefore akin to flipping a coin: a head adds another layer, while a tail stops the process, making taller nodes exponentially less common as layers increase.
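The coin-flip height assignment can be sketched in a few lines of Python (the `max_level` cap is an illustrative assumption, not part of the definition):

```python
import random

def random_level(p=0.5, max_level=16):
    """Flip a coin until it comes up tails: each head adds a layer.
    Returns the node's height (number of layers, at least 1)."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level

random.seed(0)
heights = [random_level() for _ in range(10_000)]
# Roughly half the nodes reach layer 2, a quarter reach layer 3, and so on.
```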
This results in a structure with multiple levels, with the top level containing the sparsest amount of nodes and the bottom level being the most dense.
The lowest level contains all the elements like a regular linked list, except ordered. Inserts and searches on a skip list have an average time complexity of O(log n), compared to O(n) linear time for a linked list.
This unique structure enables the “skipping” of elements during search, which is the property that gives skip lists performance advantages over traditional linked lists and a similar performance profile to balanced trees.
Play with the interactive demo below to see how this skipping works in practice to speed up search:
The animation above shows a Skip List search for a node with value 5. The search starts at the top level, comparing the target value with adjacent nodes. If the target matches or is less than the next node, the search moves to that node. Otherwise, if the next node’s value is greater or it’s the end of the list, the search drops to the next lower level. This process repeats until the desired node is found at the lowest level.
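The same procedure can be sketched in Python; the small hand-wired list below (values and heights chosen purely for illustration) mirrors the demo’s example of searching for 5:

```python
class Node:
    def __init__(self, value, height):
        self.value = value
        # forward[i] points to the next node at layer i (None past the end)
        self.forward = [None] * height

def search(head, target):
    """Start at the head's top layer; move right while the next value is
    <= target, otherwise drop down a layer."""
    node = head
    for layer in reversed(range(len(head.forward))):
        while node.forward[layer] is not None and node.forward[layer].value <= target:
            node = node.forward[layer]
    return node if node.value == target else None

# Hand-build a small skip list over [1, 3, 4, 5, 7] with illustrative heights.
head = Node(float("-inf"), 3)
n1, n3, n4, n5, n7 = Node(1, 1), Node(3, 3), Node(4, 1), Node(5, 2), Node(7, 1)
head.forward = [n1, n3, n3]          # layer 0, 1, 2 successors of the head
n1.forward = [n3]
n3.forward = [n4, n5, None]
n4.forward = [n5]
n5.forward = [n7, None]
n7.forward = [None]
```

Searching for 5 skips directly from the head to 3 on the top layer, then to 5 on the middle layer, never touching 1 or 4 at all.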
This method is significantly faster than a standard linear search in a linked list. The Hierarchical Navigable Small World (HNSW) algorithm applies a similar concept to gain performance improvements during search and insertion, but using a graph instead of a linked list.
Navigable Small World (NSW) Graphs
Navigable Small World (NSW) provides a method for constructing and traversing a graph that facilitates fast approximate nearest neighbor search. A greedy search of this graph has a polylogarithmic time complexity of O(logᵏn), which can be faster than the linear time complexity of a naive KNN search.
The NSW graph employs a strategy known as greedy routing during search and insertion. The search process in NSW commences by selecting an entry point.
Typically the entry point is the first node ever inserted. Early nodes typically remain a starting point because long-range edges are likely to be created during the initial graph construction phase. These edges play a vital role in graph navigation. They serve as bridges between network hubs, maintaining overall graph connectivity and enabling logarithmic scaling of the number of hops during greedy routing.
The algorithm then takes the set of all of the neighbors of the current node (a.k.a the neighborhood) plus the current node itself and computes the distances between the elements of this set with the query vector (the vector we are trying to find similar vectors to). If a node closer to the query vector is found among that set, that node becomes the current node, and the process continues at that node. The search procedure concludes when the algorithm cannot find a neighbor node closer to the query than the current node itself. This node is then returned as the response to the query.
However, this greedy strategy does not guarantee the discovery of the best nearest neighbor as it uses only local information at each step to make decisions. One of the properties of this algorithm is early stopping, which happens when an iteration finds no better neighbor nodes than the current one. This property is what gives NSW its improved performance compared to naive search, but runs the risk that the node returned is only a local minimum instead of the global minimum.
To mitigate this local minima problem, NSW can employ multiple searches from different random starting points (called multi-search counts), combining results to find better approximate nearest neighbors. However, this approach requires running the search algorithm multiple times, which increases computational cost.
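Here is a minimal sketch of greedy routing and the multi-search mitigation, assuming a plain adjacency-set graph and squared Euclidean distance (the toy graph and helper names are invented for illustration):

```python
import random

def dist(a, b):
    """Squared Euclidean distance (the square root doesn't change rankings)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph, vectors, query, entry):
    """Greedy NSW routing: hop to the closest member of the current
    neighborhood until no neighbor is closer (a local minimum)."""
    current = entry
    while True:
        candidates = graph[current] | {current}
        best = min(candidates, key=lambda n: dist(vectors[n], query))
        if best == current:          # early stopping: local minimum reached
            return current
        current = best

def multi_search(graph, vectors, query, attempts=3, seed=0):
    """Restart the greedy search from random entry points and keep the best,
    reducing the chance of returning only a local minimum."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    results = {greedy_search(graph, vectors, query, rng.choice(nodes))
               for _ in range(attempts)}
    return min(results, key=lambda n: dist(vectors[n], query))

# A small chain-shaped graph: a - b - c - d.
vectors = {"a": (0, 0), "b": (2, 0), "c": (4, 0), "d": (4, 3)}
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
result = greedy_search(graph, vectors, (4, 3), entry="a")
```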
The insertion process mirrors the search process: a new node traces the same path a search query would take, and upon reaching a local minimum it establishes connections to its K nearest vertices.
The demo below demonstrates this process. It begins running automatically, but you can pause it to manually add nodes and perform searches to better understand how the algorithm operates.
Using the demo, here’s an interesting challenge: see if you can generate an example of early stopping, where the node returned from the search is not actually the closest node but a local minimum.
Hierarchical Navigable Small World (HNSW)
HNSW is essentially an NSW graph and a skip list combined, forming a multi-layer NSW graph. As in a skip list, whenever a node is added to the graph it is assigned a height probabilistically. Having multiple layers means that each node does not have just one set of neighbors, but a separate set of neighbors for each layer it appears on.
The below interactive demonstration of HNSW shows how search is conducted. Pay attention to how search is conducted at each layer, with a neighborhood being constructed to find closer nodes at the current layer. The query node is highlighted in yellow. The current node will be the one with the animated dark blue/red border. Neighborhood search of nodes will highlight neighbors with a light blue fill. The most similar nodes returned from the search will be highlighted yellow.
HNSW Insertion and Search Deep Dive
The Hierarchical Navigable Small World (HNSW) algorithm enhances the Navigable Small World (NSW) model with several key improvements. Unlike NSW, which relies on multi-search counts to improve accuracy, HNSW employs the ef parameter (effective search size) for more refined control over search precision. The algorithm also introduces a heuristic approach to selecting neighbors that considers spatial distribution to prevent clustering, rather than merely choosing the closest neighbors. Finally, HNSW adds construction parameters such as mL for level generation and M for maximum connections, which together optimize graph construction while maintaining navigability.
Search Algorithm
When searching for k nearest neighbors in an existing HNSW graph, the algorithm starts at the entry point (the node with the highest layer) and performs a greedy graph traversal downward through the layers. At each layer, it explores neighbors of the current node, always moving to whichever neighbor is closest to the query point. This greedy search continues until no neighbor is closer than the current node—a local minimum at that layer.
At upper layers (all layers above layer 0), the algorithm uses ef=1, meaning it maintains only the single closest node found during traversal. This closest node at each layer becomes the starting point for the search at the next layer down. Once a local minimum is found at a layer, the search descends to the next layer down and continues the greedy traversal.
At layer 0 (the bottom layer), the search strategy changes. The ef parameter increases to the desired search quality value, allowing the algorithm to maintain a list of multiple candidates instead of just one. The larger the ef value, the more candidates the algorithm explores, improving accuracy at the cost of speed. The search returns the k closest nodes found. This architecture brings down the complexity of search operations to O(log(n)) on average.
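The two-phase search can be sketched in Python, assuming each layer is stored as an adjacency dict and using squared Euclidean distance (the toy two-layer index below is invented for illustration; production libraries such as hnswlib are far more optimized):

```python
import heapq

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search within one layer, keeping up to `ef` candidates
    (a sketch of the paper's SEARCH-LAYER). Returns node ids, closest first."""
    visited = {entry}
    candidates = [(dist(vectors[entry], query), entry)]  # min-heap: next to expand
    best = [(-candidates[0][0], entry)]                  # max-heap: ef closest so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:            # nothing left can improve the result set
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                nd = dist(vectors[nb], query)
                if len(best) < ef or nd < -best[0][0]:
                    heapq.heappush(candidates, (nd, nb))
                    heapq.heappush(best, (-nd, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
    return [n for _, n in sorted((-nd, n) for nd, n in best)]

def hnsw_search(layers, vectors, query, entry, k, ef):
    """Descend with ef=1 on the upper layers, then widen to `ef` at layer 0."""
    current = entry
    for layer in reversed(range(1, len(layers))):
        current = search_layer(layers[layer], vectors, query, current, ef=1)[0]
    return search_layer(layers[0], vectors, query, current, ef=ef)[:k]

# Toy index: 8 points on a line; layer 0 is a chain, layer 1 links 0 and 4.
vectors = {i: (float(i), 0.0) for i in range(8)}
layer0 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
layers = [layer0, {0: {4}, 4: {0}}]
nearest = hnsw_search(layers, vectors, (6.2, 0.0), entry=0, k=2, ef=3)
```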
Graph Construction and Insertion
While search and insertion follow similar traversal patterns, they differ in how they use the ef parameter. When inserting a new node and building the graph, the algorithm again begins at the entry point at the highest layer and performs greedy graph traversal downward. However, the ef parameter behavior is different from search.
The traversal starts at the top layer and works downward. At each layer, the algorithm explores the graph by examining neighbors of the current node, always moving to whichever neighbor is closest to the new node being inserted. This greedy search continues until no neighbor is closer than the current node—a local minimum at that layer.
The search strategy changes depending on which layers the new node will occupy. Each node is assigned a maximum layer during insertion through probabilistic selection. If a new node is assigned to layer 2, it will exist in layers 0, 1, and 2. For layers above where the new node will exist, the algorithm uses ef=1, meaning it only tracks the single closest node found during traversal. This closest node at each layer becomes the starting point for the search at the next layer down.
Once the traversal reaches a layer where the new node will exist, the search strategy shifts. The ef parameter increases to efConstruction (a larger value), allowing the algorithm to maintain a list of multiple candidate neighbors instead of just one. The algorithm continues this broader search through each remaining layer down to layer 0.
At each layer where the new node exists, it gets connected to M neighbors selected from the candidates found during traversal. The selection of which M neighbors to connect is where the simple versus heuristic approaches differ. This architecture brings down the complexity of insertion operations to O(log(n)) on average.
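Putting the pieces together, a minimal insertion sketch might look like the following. It uses simple neighbor selection and omits Mmax pruning and other refinements from the paper, so it is a teaching aid rather than a faithful implementation:

```python
import heapq, math, random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search within one layer; returns node ids, closest first."""
    visited = {entry}
    candidates = [(dist(vectors[entry], query), entry)]
    best = [(-candidates[0][0], entry)]
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                nd = dist(vectors[nb], query)
                if len(best) < ef or nd < -best[0][0]:
                    heapq.heappush(candidates, (nd, nb))
                    heapq.heappush(best, (-nd, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
    return [n for _, n in sorted((-nd, n) for nd, n in best)]

def insert(layers, vectors, entry, new_id, new_vec, M, ef_construction, mL, rng):
    """Insert one node: assign a level, descend with ef=1 above that level,
    then search with ef_construction and link to the M closest candidates.
    Returns the (possibly updated) entry point."""
    vectors[new_id] = new_vec
    level = int(-math.log(rng.random()) * mL)    # probabilistic layer assignment
    top = len(layers) - 1                        # highest existing layer
    while len(layers) <= level:                  # grow the hierarchy if needed
        layers.append({})
    for l in range(level + 1):
        layers[l][new_id] = set()
    if entry is None:                            # first node in the index
        return new_id
    current = entry
    for l in range(top, level, -1):              # greedy ef=1 descent
        current = search_layer(layers[l], vectors, new_vec, current, ef=1)[0]
    for l in range(min(level, top), -1, -1):     # layers the new node occupies
        found = search_layer(layers[l], vectors, new_vec, current, ef_construction)
        for nb in found[:M]:                     # simple neighbor selection
            layers[l][new_id].add(nb)
            layers[l][nb].add(new_id)
        current = found[0]
    return new_id if level > top else entry      # new entry if node is tallest

# Build a tiny index point by point.
rng = random.Random(7)
layers, vectors, entry = [], {}, None
for i, vec in enumerate([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]):
    entry = insert(layers, vectors, entry, i, vec,
                   M=2, ef_construction=4, mL=1 / math.log(2), rng=rng)
```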
Choosing Which Neighbors to Connect
The HNSW paper presents two methods for selecting which M neighbors to connect when inserting a new node.
The simple selection method chooses the M closest nodes from the candidates found during traversal. This approach is computationally efficient and works well for most datasets. Our interactive demonstration uses this method.
The heuristic method aims to improve graph connectivity by considering both distance and diversity when selecting neighbors. Rather than always choosing the M closest nodes, it may select some slightly farther nodes if they provide connections to different regions of the graph. This helps prevent isolated clusters and dead ends. The heuristic also uses different connection limits: Mmax for upper layers and Mmax0 (typically 2·M) for the base layer, with pruning when these limits are exceeded.
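The difference between the two selection methods can be sketched as follows (squared Euclidean distance; the candidate layout is contrived so the two methods disagree):

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_simple(candidates, vectors, new_vec, M):
    """Simple selection: just the M closest candidates."""
    return sorted(candidates, key=lambda c: dist(vectors[c], new_vec))[:M]

def select_heuristic(candidates, vectors, new_vec, M):
    """Heuristic selection (a sketch of the paper's Algorithm 4): accept a
    candidate only if it is closer to the new node than to every neighbor
    already selected, favoring links into distinct regions of the graph."""
    selected = []
    for c in sorted(candidates, key=lambda c: dist(vectors[c], new_vec)):
        if len(selected) == M:
            break
        d_new = dist(vectors[c], new_vec)
        if all(d_new < dist(vectors[c], vectors[s]) for s in selected):
            selected.append(c)
    return selected

# Two clustered candidates ("a", "b") and one in a different direction ("c").
vectors = {"a": (1.0, 0.0), "b": (1.2, 0.1), "c": (-2.0, 0.0)}
new_vec = (0.0, 0.0)
simple = select_simple(["a", "b", "c"], vectors, new_vec, M=2)
diverse = select_heuristic(["a", "b", "c"], vectors, new_vec, M=2)
```

The simple method picks the two nearest nodes, both from the same cluster; the heuristic rejects "b" because it sits right next to the already-selected "a", and reaches out to "c" instead.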
Graph Construction Parameters
Beyond neighbor selection, several key parameters control how the hierarchical graph structure is built. Like skip lists, each node’s layer assignment is determined probabilistically when it is inserted. A node inserted at layer l appears in all layers from 0 up to l. The layer assignment follows an exponential decay distribution:

l = ⌊-ln(unif(0,1)) · mL⌋
The parameter mL acts as a normalization factor controlling how many nodes appear in higher layers. A smaller mL means fewer nodes in upper layers, creating a steeper hierarchy. Different parameter combinations affect search complexity:
- Setting mL = 0 with Mmax0 = M creates a flat k-NN graph where search time grows polynomially with dataset size.
- Setting mL = 0 with Mmax0 = ∞ produces an NSW graph with polylogarithmic complexity.
- Using mL > 0 creates the full HNSW structure with logarithmic complexity, providing the best scaling for large datasets.
The choice of mL involves a trade-off. Smaller values reduce overlap between neighbor lists across layers, improving graph navigability. However, this also means fewer nodes per layer, requiring more traversal hops. A good choice for mL is 1/ln(M), analogous to the skip list parameter p = 0.5.
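The level-assignment formula is short enough to verify empirically; with mL = 1/ln(M), roughly one node in M is expected to rise above the base layer (a sketch, not library code):

```python
import math
import random

def assign_level(mL, rng):
    """Exponentially decaying level assignment: l = floor(-ln(unif(0,1)) * mL)."""
    return int(-math.log(rng.random()) * mL)

M = 16
mL = 1 / math.log(M)                 # the recommended normalization factor
rng = random.Random(0)
levels = [assign_level(mL, rng) for _ in range(100_000)]
# With mL = 1/ln(M), a node reaches layer 1 or higher with probability 1/M.
share_above_base = sum(l >= 1 for l in levels) / len(levels)
```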
Another parameter, M, controls how many bi-directional connections each node can have. The maximum number of connections at layer 0 is controlled by Mmax0, a separate parameter commonly set to 2·M to maintain strong graph connectivity at the base layer. Typical M values range from 5 to 48, with higher values improving recall at the cost of memory and insertion time. During construction, the efConstruction parameter controls search quality - higher values produce better graphs but take longer to build.
Complexity Analysis
Search Complexity
- Single Search Steps: On average, finding the closest element in a layer takes a constant number of steps, due to the fixed probability of moving to the next layer.
- Layer Search Bound: The expected number of steps within a layer is bounded by a constant, independent of dataset size.
- Distance Evaluations: The average number of distance evaluations per layer is proportional to M, the average number of connections per node.
- Overall Complexity: With the maximum layer index scaling as O(log n), the overall search complexity is O(log n).
Construction Complexity
- Insertion Process: Each element insertion involves a series of K-ANN searches, one per layer, with search quality fixed by efConstruction.
- Layer Count: The average number of layers an element appears on is a constant, mL + 1.
- Insertion Scaling: For low-dimensional datasets, each insertion scales as O(log n), giving O(n log n) for constructing the full index.
Memory Cost
- Graph Connections: Memory usage is dominated by graph connections, with Mmax0 connections per element at the base layer and Mmax at each of the other layers.
- Average Memory: The average memory per element is (Mmax0 + mL·Mmax) · bytes_per_link.
- Typical Requirements: For M values between 6 and 48, memory requirements are about 60-450 bytes per object, excluding the vector data itself.
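Plugging typical values into the memory formula gives a quick sanity check (the settings below are illustrative assumptions; bytes_per_link assumes 4-byte node identifiers):

```python
import math

M = 16
Mmax = M               # per-node connection limit on upper layers
Mmax0 = 2 * M          # base-layer limit, using the common 2*M convention
mL = 1 / math.log(M)
bytes_per_link = 4
# Link memory per element: (Mmax0 + mL * Mmax) * bytes_per_link, in bytes.
memory_per_element = (Mmax0 + mL * Mmax) * bytes_per_link
```

For M = 16 this works out to roughly 150 bytes of link storage per element, comfortably inside the 60-450 byte range quoted above.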
Conclusion
Hierarchical NSW (HNSW) improves on K-ANN search by leveraging structure decomposition and smart neighbor selection. This approach not only addresses the limitations of basic NSW but also sets new benchmarks in performance, especially with high-dimensional data.
By advancing the state-of-the-art in K-ANN search, HNSW enables efficient approximate nearest neighbor search for a wide range of applications. It remains the default algorithm for many vector databases, including pgvector.