A Visual Guide to the Hierarchical Navigable Small World Algorithm


We’ll dive into the inner workings of an approximate nearest neighbor (ANN) algorithm called Hierarchical Navigable Small Worlds (HNSW). HNSW is commonly used in vector databases to enhance the speed of similarity search. The article uses interactive visuals and demos to demonstrate the fundamental concepts of the HNSW algorithm and includes a live demo of HNSW itself.

Introduction to HNSW

Hierarchical NSW is one of several approximate nearest neighbor (ANN) algorithms developed to speed up similarity search. First proposed by Yu. A. Malkov and D. A. Yashunin in 2016, HNSW builds on the ideas behind two earlier structures:

  • Navigable Small World (NSW) graphs - a type of graph that supports fast approximate nearest neighbor search.
  • Skip Lists - a data structure that supports fast insertion and search via a multi-layer linked list.

Combining the benefits of both, HNSW builds a multi-layer NSW graph that enables logarithmic time complexity for approximate nearest neighbor search, a significant improvement over previous state-of-the-art ANN algorithms.

Understanding skip lists and navigable small world graphs is a prerequisite for understanding the inner workings of HNSW, so we will build up both concepts before exploring HNSW itself.

Skip Lists

Skip Lists enhance sorted linked lists by introducing multiple layers, achieving O(log n) average time complexity for search and insertion. The multi-layered structure allows searches to “skip” over runs of elements. Unlike a traditional linked list, where each node points only to the next, each Skip List node carries one forward pointer per layer it participates in, enabling quicker navigation. A node’s height determines how many layers it spans: the bottom layer contains every node, while each higher layer is sparser than the one below.

An example below demonstrates searching for a node valued 5 in a Skip List.

Node heights in a skip list are assigned probabilistically: each additional layer is included with probability p, commonly set to a default value of 0.5. Determining a node’s height is akin to flipping a coin repeatedly: heads adds another layer, tails stops the process, so taller nodes become exponentially rarer.
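
In code, drawing a node’s height amounts to repeated coin flips. Here is a minimal Python sketch; the MAX_LEVEL cap is our own assumption, reflecting how real implementations bound heights to limit memory:

```python
import random

P = 0.5         # chance of promoting a node one more layer
MAX_LEVEL = 16  # cap on height to bound memory (an implementation choice)

def random_level() -> int:
    """Flip a coin: heads adds a layer, tails stops."""
    level = 1
    while random.random() < P and level < MAX_LEVEL:
        level += 1
    return level

# Heights follow a geometric distribution: about half of all nodes have
# height 1, a quarter height 2, an eighth height 3, and so on.
```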

This results in a structure with multiple levels: the top level contains the fewest nodes, and the bottom level is the most dense. The lowest level contains all the elements, like a regular linked list but kept in sorted order. Inserts and searches on a skip list have an average time complexity of O(log n), compared to O(n) linear time for a linked list.

This unique structure enables the “skipping” of elements during search, which is the property that gives skip lists performance advantages over traditional linked lists and a similar performance profile to balanced trees.

Play with the interactive demo below to see how this skipping works in practice to speed up search:

The animation above shows a Skip List search for a node with value 5. The search starts at the top level and compares the target with the next node on the current level. If the next node’s value is less than or equal to the target, the search advances to that node. Otherwise, when the next node’s value is greater than the target or the end of the level has been reached, the search drops down one level. This process repeats until the desired node is found at the lowest level.
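
To make this concrete, here is a minimal, illustrative Python skip list with insertion and search. To avoid repeating the coin-flip logic, insert takes the node’s level explicitly; in practice you would draw it with random_level from the sketch above:

```python
class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # one forward pointer per layer

class SkipList:
    def __init__(self, max_level=16):
        self.head = Node(None, max_level)  # sentinel node, full height
        self.level = 1                     # height of the tallest node

    def insert(self, value, level):
        update = [self.head] * len(self.head.forward)
        node = self.head
        # Walk down from the top, recording the rightmost node on each
        # layer whose value is still less than the new value.
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        new = Node(value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
        self.level = max(self.level, level)

    def search(self, target):
        node = self.head
        for i in range(self.level - 1, -1, -1):  # top layer first
            # Advance while the next node is still less than the target;
            # otherwise drop down one layer.
            while node.forward[i] and node.forward[i].value < target:
                node = node.forward[i]
        node = node.forward[0]  # candidate on the bottom layer
        return node if node and node.value == target else None

sl = SkipList()
for value, level in [(1, 3), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1)]:
    sl.insert(value, level)
print(sl.search(5).value)  # -> 5
```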

This method is significantly faster than a standard linear search in a linked list. The Hierarchical Navigable Small World (HNSW) algorithm applies a similar concept to gain performance improvements during search and insertion, but using a graph instead of a linked list.

Navigable Small World (NSW)

Navigable Small World (NSW) provides a method for constructing and traversing a graph that facilitates fast approximate nearest neighbor search. A greedy search of this graph has a polylogarithmic time complexity of O(logᵏ n), which can be considerably faster than the linear time complexity of a naive k-NN search.

The NSW graph employs a strategy known as greedy routing during search and insertion. The search process in NSW commences by selecting an entry point.

Typically the entry point is the first node ever inserted. Early nodes typically remain a starting point because long-range edges are likely to be created during the initial graph construction phase. These edges play a vital role in graph navigation. They serve as bridges between network hubs, maintaining overall graph connectivity and enabling logarithmic scaling of the number of hops during greedy routing.

The algorithm then takes the set of all neighbors of the current node (its “neighborhood”) plus the current node itself, and computes the distance from each element of this set to the query vector (the vector we want to find similar vectors to). If a node in that set is closer to the query than the current node, it becomes the new current node and the process continues from there. The search concludes when the algorithm cannot find a neighbor closer to the query than the current node itself; that node is then returned as the response to the query.

However, this greedy strategy does not guarantee finding the true nearest neighbor, since it uses only local information at each step. A defining property of the algorithm is early stopping: the search terminates as soon as an iteration finds no neighbor closer than the current node. This property is what gives NSW its performance advantage over naive search, but it runs the risk that the returned node is only a local minimum rather than the global one.

The insertion process mirrors search: a new node traces the same path it would follow as a search query and, upon reaching a local minimum, establishes connections to its K nearest vertices.
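
The sketch below illustrates greedy search and insertion on an NSW graph, following the description above. The graph representation (parallel lists of vectors and adjacency sets) and the candidate set used when connecting a new node are simplifications of our own, not taken from the original paper:

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

class NSW:
    def __init__(self, k=3):
        self.k = k           # connections made per inserted node
        self.vectors = []    # node id -> vector
        self.neighbors = []  # node id -> set of adjacent node ids

    def greedy_search(self, query):
        current = 0  # entry point: the first node ever inserted
        while True:
            # Compare the query against the current node and its whole
            # neighborhood; move to the closest member of that set.
            best, best_d = current, dist(self.vectors[current], query)
            for n in self.neighbors[current]:
                d = dist(self.vectors[n], query)
                if d < best_d:
                    best, best_d = n, d
            if best == current:
                return current  # early stop: local minimum reached
            current = best

    def insert(self, vector):
        new_id = len(self.vectors)
        self.vectors.append(vector)
        self.neighbors.append(set())
        if new_id == 0:
            return new_id
        # Trace the path a search for this vector would take, then link
        # the new node to the k nearest nodes around the local minimum.
        local_min = self.greedy_search(vector)
        candidates = {local_min} | self.neighbors[local_min]
        by_dist = sorted(candidates, key=lambda c: dist(self.vectors[c], vector))
        for n in by_dist[: self.k]:
            self.neighbors[new_id].add(n)
            self.neighbors[n].add(new_id)
        return new_id
```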

The demo below demonstrates this process. It begins running automatically, but you can pause it to manually add nodes and perform searches to better understand how the algorithm operates.

Building Graph: adding new nodes to the graph.

Using the demo, here’s an interesting challenge: see if you can generate an example of early stopping, where the node returned from the search is not actually the closest node but a local minimum.

Hierarchical Navigable Small World (HNSW)

HNSW is essentially a multi-layer NSW graph. As in a skip list, whenever a node is added to the graph it is assigned a height probabilistically, with the probability decaying as the height increases. This means each node has not just one set of neighbors, but a separate set of neighbors for each layer it appears on.

At a high level, during a search the initial node is chosen by selecting the last inserted node with the tallest height. From that node, a greedy search is conducted, starting at the top layer. The greedy search is constrained to that layer until a local minimum (the nearest neighbor within that layer) is found. For every layer above the bottom layer (index 0), only a single node is selected.

Once a node is selected and no closer node exists at that layer, the search continues from that node at the next layer down. Once the bottom layer is reached, the search widens to return the k nearest neighbors instead of just the single closest one.
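
Below is a minimal Python sketch of the layered greedy descent just described. The Node layout is an assumption of our own, and the widening of the search at the bottom layer is left to the deep dive below:

```python
import math

def dist(a, b):
    return math.dist(a, b)

class Node:
    def __init__(self, vec, level):
        self.vec = vec
        self.level = level
        # One adjacency list for every layer this node appears on.
        self.neighbors = [[] for _ in range(level + 1)]

def descend(entry, query):
    """Greedy descent from the top layer down to layer 1: find the local
    minimum on each layer, then reuse it as the next layer's entry point."""
    node = entry
    for layer in range(entry.level, 0, -1):
        moved = True
        while moved:
            moved = False
            for n in node.neighbors[layer]:
                if dist(n.vec, query) < dist(node.vec, query):
                    node, moved = n, True  # step to the closer neighbor
    # `node` is now the entry point into the bottom layer, where the
    # search widens from one candidate to the k nearest (see deep dive).
    return node
```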

Together, this architecture and search strategy bring the complexity of the search and insert operations down to O(log n) on average.

The interactive demonstration of HNSW below shows how search is conducted. Pay attention to how the search proceeds at each layer, with a neighborhood being explored to find closer nodes at the current layer. The query node is highlighted in yellow. The current node is the one with the animated dark blue/red border. Neighborhood exploration highlights neighbors with a light blue fill. The most similar nodes returned by the search are highlighted in yellow.

Interactive demo: auto-running graph traversal with k = 2.

HNSW Insertion and Search Deep Dive

The Hierarchical Navigable Small World (HNSW) algorithm enhances the Navigable Small World (NSW) model with several significant changes:

  1. ef Parameter for Enhanced Search Quality: Unlike NSW, which improves recall by repeating the search multiple times, HNSW employs the ef parameter (the size of the dynamic candidate list) for refined control over search precision.

  2. Heuristic-Based Neighbor Selection: HNSW introduces a heuristic approach to selecting neighbors, instead of merely choosing the closest neighbors. This method considers the spatial distribution of neighbors to prevent clustering.

  3. Optimization Parameters \(m_L\) and \(M_{\text{max0}}\): With the introduction of \(m_L\) for level generation and \(M_{\text{max0}}\) for the maximum number of connections at the base layer, HNSW optimizes graph construction while maintaining navigability.

These enhancements collectively boost HNSW’s performance in finding nearest neighbors.

Graph Construction and Insertion

To add a new point q to the graph, we begin at the top layer of the multi-layered structure. Initially, we set the ef parameter to 1, which reduces the search to a simple greedy walk that locates the single closest neighbor to q on each layer. ef plays a crucial role in controlling the quality of the search, ensuring we identify the most relevant points.

Upon identifying these neighbors at the top layer, we proceed to the next layer down, employing the previously discovered neighbors as our new starting points for further search. This iterative process continues down each layer. When we reach a layer less than or equal to l, the randomly drawn insertion level of q, the second phase of the construction algorithm begins. This phase is distinguished by two key adjustments:

  1. the ‘ef’ parameter is increased from 1 to ‘efConstruction’, enhancing the recall of the greedy search, and
  2. the closest neighbors found on each layer are also considered as candidates for the connections of the inserted element ‘q’.

Throughout this search, we maintain a dynamic list of the ef closest neighbors found so far at each layer, implemented as a priority queue. This list, seeded with our initial entry points, is continually updated as we discover closer neighbors. The update mechanism is a greedy search that evaluates the neighbors of the closest point we have yet to examine.

A specialized stop condition enhances the efficiency of this process. Should a candidate point for evaluation be further from q than the furthest point in our list, it is disregarded. This criterion ensures our search remains narrowly focused on genuinely close neighbors, preventing it from becoming overly broad.
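
Putting these pieces together, here is a hedged, self-contained Python sketch of the layer search with its ef-bounded priority queues and stop condition, together with the two-phase insertion described above. The class layout, parameter names, and the omission of \(M_{\text{max}}\) pruning are simplifications of our own, loosely following the paper’s SEARCH-LAYER and INSERT routines:

```python
import heapq, math, random

def dist(a, b):
    return math.dist(a, b)

class Node:
    def __init__(self, vec, level):
        self.vec = vec
        self.level = level
        self.neighbors = [[] for _ in range(level + 1)]  # per-layer adjacency

def search_layer(entry_points, q, ef, layer):
    """Return the ef nodes closest to q on a single layer.

    candidates: min-heap of not-yet-examined points, closest first.
    results:    max-heap (negated distances) of the best ef found so far.
    """
    visited = set(entry_points)
    candidates = [(dist(p.vec, q), id(p), p) for p in entry_points]
    heapq.heapify(candidates)
    results = [(-d, i, p) for d, i, p in candidates]
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)
    while candidates:
        d_c, _, c = heapq.heappop(candidates)  # closest unexamined point
        d_f = -results[0][0]                   # furthest of the best ef
        if d_c > d_f:
            break  # stop condition: no candidate can improve the results
        for n in c.neighbors[layer]:
            if n in visited:
                continue
            visited.add(n)
            d_n = dist(n.vec, q)
            if d_n < d_f or len(results) < ef:
                heapq.heappush(candidates, (d_n, id(n), n))
                heapq.heappush(results, (-d_n, id(n), n))
                if len(results) > ef:
                    heapq.heappop(results)     # evict the furthest point
    return [p for _, _, p in sorted((-d, i, p) for d, i, p in results)]

class HNSW:
    def __init__(self, m_L, ef_construction, K):
        self.m_L, self.ef_construction, self.K = m_L, ef_construction, K
        self.entry, self.max_level = None, -1

    def insert(self, vec):
        level = math.floor(-math.log(1.0 - random.random()) * self.m_L)
        node = Node(vec, level)
        if self.entry is None:
            self.entry, self.max_level = node, level
            return node
        entry_points = [self.entry]
        # Phase 1: ef = 1 greedy descent through the layers above `level`.
        for layer in range(self.max_level, level, -1):
            entry_points = search_layer(entry_points, vec, 1, layer)
        # Phase 2: widen to efConstruction; the closest nodes found on
        # each layer become connection candidates for the new element.
        for layer in range(min(level, self.max_level), -1, -1):
            found = search_layer(entry_points, vec, self.ef_construction, layer)
            for nb in found[: self.K]:  # connect (pruning to M_max omitted)
                node.neighbors[layer].append(nb)
                nb.neighbors[layer].append(node)
            entry_points = found
        if level > self.max_level:
            self.entry, self.max_level = node, level
        return node

# Usage: build a small index, then probe the bottom layer directly.
index = HNSW(m_L=1 / math.log(16), ef_construction=32, K=8)
for _ in range(200):
    index.insert([random.random(), random.random()])
closest = search_layer([index.entry], [0.5, 0.5], ef=10, layer=0)
```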

This refined method, leveraging ef adjustments and priority queues, represents a significant improvement over previous models. It enables precise tuning of our search process, yielding better performance and accuracy.

The search algorithm mirrors the insertion logic but aims to return the closest neighbors. It similarly adjusts the ef parameter to modulate search quality, ensuring an efficient and effective neighbor search.
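
In code, the search reduces to a thin wrapper over search_layer from the sketch above; a hedged sketch, where k results are taken from an ef-sized candidate list (with ef ≥ k):

```python
def knn_search(index, q, k, ef):
    # Phase 1: ef = 1 greedy descent to the bottom layer's entry point.
    entry_points = [index.entry]
    for layer in range(index.max_level, 0, -1):
        entry_points = search_layer(entry_points, q, 1, layer)
    # Phase 2: widen to ef on the bottom layer, then keep the k closest.
    return search_layer(entry_points, q, ef, layer=0)[:k]
```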

Nearest Neighbor Heuristic and Neighbor Selection

HNSW improves neighbor selection over NSW by not only choosing the closest M neighbors but also considering their spatial distribution, so that connections do not cluster in a single region. The heuristic introduces two flags: extendCandidates, which grows the candidate pool with the neighbors of candidates (useful in densely clustered areas), and keepPrunedConnections, which refills a node’s connection list with previously pruned candidates so that each point keeps a fixed number of connections. Each point’s connection count is capped at \(M_{\text{max}}\); at the network’s base layer, a separate limit, \(M_{\text{max0}}\), applies. Exceeding these limits triggers pruning based on the same heuristic.

This enhanced method of neighbor selection is effective in various scenarios, including simple data landscapes, precise searches within complex datasets, or densely clustered points. It excels in clustered environments by preventing the algorithm from being confined to cluster edges.
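
The core of the heuristic can be sketched as follows (omitting the extendCandidates and keepPrunedConnections refinements): a candidate is accepted only if it is closer to the query point than to every neighbor already accepted, which favors connections that fan out in different directions:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def select_neighbors_heuristic(query, candidates, m):
    """Pick up to m neighbor vectors, preferring spread over raw proximity.

    A candidate is accepted only if it is closer to the query point than
    to every already-accepted neighbor; otherwise an accepted neighbor
    already "covers" that direction, and the candidate is pruned.
    """
    selected = []
    for cand in sorted(candidates, key=lambda c: dist(c, query)):
        if len(selected) >= m:
            break
        if all(dist(cand, s) > dist(cand, query) for s in selected):
            selected.append(cand)
    return selected

# Two points in one tight cluster plus one in another direction: the
# naive "closest m" picks both clustered points; the heuristic spreads out.
print(select_neighbors_heuristic((0, 0), [(1, 0), (1.1, 0), (0, 3)], m=2))
# -> [(1, 0), (0, 3)]
```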

The probabilistic function for determining a node’s graph height slightly alters the conventional skip list model, with the average layer count for adding an element being a constant that depends on \(m_L\):

\[ E[l+1] = E[-\ln(\text{uniform}(0,1)) \cdot m_L] \]
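
In code, drawing a node’s level from this distribution is a one-liner; a small sketch, where the 1 - random() guard (our own addition) avoids log(0):

```python
import math, random

def draw_level(m_L: float) -> int:
    # Exponentially decaying distribution over layers: most nodes get
    # level 0, and each higher level is exponentially less likely.
    u = 1.0 - random.random()  # in (0, 1], so log(u) is always defined
    return math.floor(-math.log(u) * m_L)

# With the suggested m_L = 1/ln(M) and M = 16:
m_L = 1 / math.log(16)
print(draw_level(m_L))
```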

Algorithm construction parameters such as \(m_L\), which influences the distribution of nodes across different layers, and \(M_{\text{max0}}\), defining the maximum number of connections a node can have at the base layer, play crucial roles in maintaining the navigability of the small world graphs constructed.

Setting \(m_L\) to zero, which corresponds to a single-layer graph, and \(M_{\text{max0}}\) to \(M\), results in the creation of directed k-NN graphs known for their power-law search complexity, as previously studied.

Conversely, setting \(m_L\) to zero and \(M_{\text{max0}}\) to infinity yields NSW graphs characterized by polylogarithmic complexity. Introducing a non-zero value for \(m_L\) leads to the emergence of controllable hierarchy graphs, enabling logarithmic search complexity through the introduction of layers. To optimize the performance benefits of the controllable hierarchy, it’s essential to minimize the overlap between neighbors across different layers, which is achieved by reducing \(m_L\). However, decreasing \(m_L\) also increases the average number of hops required during a greedy search on each layer, adversely affecting performance. This trade-off suggests an optimal value exists for the \(m_L\) parameter. A simple yet effective choice for \(m_L\) is \(1/\ln(M)\), analogous to the skip list parameter \(p = 1/M\), ensuring minimal overlap between layers.

The remaining key construction parameter, \(M\), represents the number of connections established for each element in the graph. It has an optimal range between 5 and 48. The selection of \(M\) significantly influences the algorithm’s memory consumption and its performance across different recalls and data dimensionalities. The \(efConstruction\) parameter should be chosen so that the K-ANNS recall during construction stays close to unity; 0.95 is sufficient for most applications, and this parameter could potentially be auto-configured using sample data.

In all of the considered cases, use of the heuristic for proximity graph neighbors selection leads to a higher or similar search performance compared to the naïve connection to the nearest neighbors.