One of the key components of Microsoft’s Copilot Runtime edge AI development platform for Windows is a new vector search technology, DiskANN (Disk Accelerated Nearest Neighbors). Building on a long-running Microsoft Research project, DiskANN is a way of building and managing vector indexes inside your applications. It uses a mix of in-memory and disk storage, mapping a quantized vector graph held in memory to a high-precision graph held on disk.
What is DiskANN?
Although it’s not an exact match, you can think of DiskANN as the vector index equivalent of tools like SQLite. Added to your code, it gives you a straightforward way to search across a vector index made up of semantic embeddings from a small language model (SLM) such as the Copilot Runtime’s Phi Silica.
It’s important to understand that DiskANN is not a database; it’s a set of algorithms delivered as a tool for adding vector indexes to stores that aren’t designed to support vector searches. This makes it an ideal companion to other embedded stores, whether relational or NoSQL key-value.
The requirement for in-memory and disk storage helps explain some of the hardware specifications for Copilot+ PCs, with double the previous Windows base memory requirements as well as larger, faster SSDs. Usefully, DiskANN has a lower CPU requirement than other vector search algorithms, with at-scale implementations in Azure services requiring only 5% of the CPU that traditional methods use.
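To make the split between memory and disk concrete, here is a minimal Python sketch of the two-tier idea. It is not DiskANN’s implementation: it assumes a crude one-byte-per-dimension scalar quantizer in place of the compression DiskANN uses, a brute-force scan in place of graph traversal, and a flat temporary file of float32 values standing in for the on-disk store. Every name in it is hypothetical.

```python
import os
import random
import struct
import tempfile

DIMS = 16  # hypothetical embedding width


def quantize(vec, lo=-1.0, hi=1.0):
    """Squash each float in [lo, hi] into one byte (0..255)."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def read_full_vector(f, idx):
    """Seek to one vector in the on-disk file and decode its float32 values."""
    f.seek(idx * DIMS * 4)
    return struct.unpack(f"<{DIMS}f", f.read(DIMS * 4))


def two_tier_search(query, codes, f, k, rerank):
    """Shortlist with in-memory codes, then re-rank the shortlist from disk."""
    q_code = quantize(query)
    shortlist = sorted(range(len(codes)),
                       key=lambda i: sq_dist(codes[i], q_code))[:rerank]
    exact = [(sq_dist(read_full_vector(f, i), query), i) for i in shortlist]
    return [(i, d) for d, i in sorted(exact)[:k]]


rng = random.Random(0)
vectors = [[rng.uniform(-1, 1) for _ in range(DIMS)] for _ in range(1000)]
codes = [quantize(v) for v in vectors]  # 16 bytes each, kept in RAM

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "full_precision.bin")
    with open(path, "wb") as out:  # 64 bytes each, kept on disk
        for v in vectors:
            out.write(struct.pack(f"<{DIMS}f", *v))
    with open(path, "rb") as f:
        hits = two_tier_search(vectors[42], codes, f, k=3, rerank=20)

print(hits)  # (vector id, exact squared distance) pairs, closest first
```

Only the shortlisted vectors are read from disk, so the `rerank` value controls how much disk I/O each query costs in this sketch.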
You’ll need a separate store for the data that’s being indexed. Keeping your indexes and the source of your embeddings in separate stores does have its issues. If you’re working with personally identifiable information (PII) or other regulated data, you need to ensure that the source data is encrypted. This can add overhead to queries, but interestingly Microsoft is working on software-based secure enclaves that can encrypt data both at rest and in use, reducing the risk of PII leaking or prompts being manipulated by malware.
DiskANN is an implementation of an approximate nearest neighbor search, using a Vamana graph index. It’s designed to work with data that changes frequently, which makes it a useful tool for agent-like AI applications that need to index local files or data held in services like Microsoft 365, such as email or Teams chats.
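The search side of a graph index can be sketched in a few lines of Python. This is a simplified illustration, not DiskANN’s code: it assumes a plain brute-force nearest-neighbor graph where Vamana would build a pruned graph that keeps some long-range edges, and the names `build_knn_graph`, `greedy_search`, and `list_size` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(vectors, degree):
    """Link every point to its `degree` nearest neighbors (brute force)."""
    graph = []
    for i, v in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: sq_dist(v, vectors[j]))
        graph.append(others[:degree])
    return graph


def greedy_search(graph, vectors, query, k, list_size, start=0):
    """Walk the graph toward the query, keeping the best `list_size` candidates."""
    visited = set()
    candidates = {start: sq_dist(vectors[start], query)}
    while True:
        best = sorted(candidates.items(), key=lambda kv: kv[1])[:list_size]
        candidates = dict(best)
        frontier = [node for node, _ in best if node not in visited]
        if not frontier:
            break  # every retained candidate has been expanded
        node = frontier[0]  # closest candidate not yet expanded
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited and neighbor not in candidates:
                candidates[neighbor] = sq_dist(vectors[neighbor], query)
    return sorted(candidates.items(), key=lambda kv: kv[1])[:k]


rng = random.Random(1)
vectors = [[rng.random() for _ in range(8)] for _ in range(300)]
graph = build_knn_graph(vectors, degree=8)
query = [rng.random() for _ in range(8)]
for node, dist in greedy_search(graph, vectors, query, k=5, list_size=20):
    print(node, round(dist, 4))
```

The search only computes distances for nodes it reaches through graph edges, which is why the result is approximate: a true neighbor that is never reached is never scored.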
Getting started with diskannpy
A useful quick start comes in the shape of the diskannpy Python implementation. This provides classes for building indexes and for searching. There’s the option to use numerical analysis Python libraries such as NumPy to build and work with indexes, tying it into existing data science tools. It also allows you to use Jupyter notebooks in Visual Studio Code to test indexes before building applications around them. Taking a notebook-based approach to prototyping will allow you to develop elements of an SLM-based application separately, passing results between cells.
Start by using one of the two Index Builder classes to build either a hybrid or an in-memory vector index from the contents of a NumPy array or a DiskANN format vector file. The diskannpy library contains tools that can build this file from an array, which is a useful way of adding embeddings to an index quickly. Index files are saved to a specified directory, ready for searching. Other features let you update indexes, supporting dynamic operations.
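As a rough illustration of what a vector file holds, the sketch below writes and reads a simple binary layout: a 32-bit point count, a 32-bit dimension count, then the row-major float32 data. Treat that layout, the little-endian byte order, and the helper names `write_vector_file` and `read_vector_file` as assumptions to check against the project’s documentation. In practice you would use the tools that ship with diskannpy rather than writing your own.

```python
import os
import struct
import tempfile


def write_vector_file(path, vectors):
    """Write float32 vectors as: int32 count, int32 dims, then row-major data."""
    dims = len(vectors[0])
    if any(len(v) != dims for v in vectors):
        raise ValueError("all vectors must have the same number of dimensions")
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", len(vectors), dims))
        for v in vectors:
            f.write(struct.pack(f"<{dims}f", *v))


def read_vector_file(path):
    """Read the same layout back into a list of float lists."""
    with open(path, "rb") as f:
        count, dims = struct.unpack("<ii", f.read(8))
        return [list(struct.unpack(f"<{dims}f", f.read(dims * 4)))
                for _ in range(count)]


embeddings = [[0.5, -1.0, 0.25], [1.5, 2.0, -0.75]]  # made-up values
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.bin")
    write_vector_file(path, embeddings)
    size = os.path.getsize(path)  # 8-byte header + 2 * 3 * 4 bytes of data
    restored = read_vector_file(path)

print(size, restored)
```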
Searching is again handled by a simple class. You pass it a query array containing the search embedding, along with parameters that define the number of neighbors to be returned and the complexity of the search, which sets the size of the candidate list it considers. A bigger list will take longer to deliver but will be more accurate. The trade-off between accuracy and latency makes it essential to run experiments before committing to final code. Other options allow you to improve performance by batching up queries. When you build an index, you’re able to define its complexity and graph degree, as well as the type of distance metric used for searches. Larger values for complexity and graph degree give better results, but the resulting indexes do take longer to create.
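One way to run those experiments is a small harness that compares approximate results against brute-force ground truth and reports recall and latency for each setting. The sketch below rests on a stated assumption: `standin_search` is a deliberately crude, hypothetical placeholder that only scans the first `complexity` vectors, so that the harness has a knob to turn. Swap in your real index’s search call to get meaningful numbers.

```python
import random
import time


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(vectors, query, k, limit=None):
    """Brute-force top-k ids over the first `limit` vectors (all by default)."""
    ids = range(len(vectors) if limit is None else min(limit, len(vectors)))
    return sorted(ids, key=lambda i: (sq_dist(vectors[i], query), i))[:k]


def standin_search(vectors, query, k, complexity):
    """Hypothetical placeholder for an index search call."""
    return exact_top_k(vectors, query, k, limit=complexity)


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sweep(vectors, queries, k, complexities):
    """Return (complexity, mean recall, mean seconds per query) rows."""
    truth = [exact_top_k(vectors, q, k) for q in queries]
    rows = []
    for c in complexities:
        started = time.perf_counter()
        found = [standin_search(vectors, q, k, c) for q in queries]
        elapsed = (time.perf_counter() - started) / len(queries)
        recall = sum(recall_at_k(f, t) for f, t in zip(found, truth)) / len(queries)
        rows.append((c, recall, elapsed))
    return rows


rng = random.Random(2)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
for c, recall, seconds in sweep(vectors, queries, k=10, complexities=[250, 1000, 2000]):
    print(f"complexity={c:5d} recall@10={recall:.2f} latency={seconds * 1000:.2f} ms")
```

Recall here is the share of the true top 10 that the approximate search found, averaged over the queries. Plotting it against latency for your own data is a quick way to pick a complexity value.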
The diskannpy library is a useful tool for learning how to use DiskANN. It’s likely that as the Copilot Runtime evolves, Microsoft will deliver a set of wrappers that provides a high-level abstraction, much like the one it’s delivering for Cosmos DB. There’s a hint of how this might work in the initial Copilot Runtime announcement, with reference to a Vector Embeddings API used to build retrieval-augmented generation (RAG)-based applications. This is planned for a future update to the Copilot Runtime.
Why DiskANN?
Exploring the GitHub repository for the project, it’s easy to see why Microsoft picked DiskANN as one of the foundational technologies in the Copilot Runtime. It’s optimized for both SSD and in-memory operations, and it can provide a hybrid approach that indexes a lot of data economically. The initial DiskANN paper from Microsoft Research suggests that a hybrid SSD/RAM index can index five to ten times as many vectors as the equivalent pure in-memory algorithm, addressing about a billion vectors with high search accuracy and 5 ms latency.
In practice, of course, an edge-hosted SLM application isn’t likely to need to index that much data, so performance and accuracy should be higher.
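A back-of-the-envelope calculation shows why keeping only compressed vectors in RAM stretches capacity. The figures below are illustrative assumptions, not numbers from the paper: 128-dimension float32 vectors for the billion-vector case, 384 dimensions for an edge-sized index of a million vectors, and 32 bytes per compressed vector. The memory used by the graph’s own edges is ignored.

```python
def footprint_gb(count, bytes_per_vector):
    """Storage needed for `count` vectors, in decimal gigabytes."""
    return count * bytes_per_vector / 1e9


FLOAT32_BYTES = 4
COMPRESSED_BYTES = 32  # assumed size of one compressed vector

scenarios = {
    "billion-vector index": (1_000_000_000, 128),
    "edge-sized index": (1_000_000, 384),
}
for name, (count, dims) in scenarios.items():
    full = footprint_gb(count, dims * FLOAT32_BYTES)
    compressed = footprint_gb(count, COMPRESSED_BYTES)
    print(f"{name}: {full:g} GB full precision, {compressed:g} GB compressed")
```

Under these assumptions the billion-vector case needs 512 GB at full precision against 32 GB compressed, while the edge-sized index needs about 1.5 GB against 32 MB.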
If you’re building a semantic AI application on an SLM, you need to focus on throughput, using a small number of tokens for each operation. If you can keep the search needed to build grounded prompts for a RAG application as fast as possible, you reduce the risk of unhappy users waiting for what might be a simple answer.
By loading an in-memory index at launch, you can simplify searches so that your application only needs to access source data when it’s needed to construct a grounded prompt for your SLM. One useful option is the ability to add filters to a search, refining the results and providing more accurate grounding for your application.
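The pattern of searching an in-memory index first, then fetching source data only for the hits, might look like the following sketch. It is a hypothetical example: the two-dimensional vectors are invented rather than produced by a model, a brute-force scan stands in for the index, the filter is a simple label match, and SQLite plays the role of the separate store.

```python
import sqlite3

# Source data lives in its own store; here, an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany("INSERT INTO docs VALUES (?, ?)", [
    (1, "Quarterly budget review moved to Friday."),
    (2, "Can someone share the budget spreadsheet?"),
    (3, "Lunch menu for the team offsite."),
    (4, "Budget approvals are due next week."),
])

# The index holds only ids, filter labels, and (invented) embeddings.
index = [
    (1, "email", [0.9, 0.1]),
    (2, "chat", [0.8, 0.2]),
    (3, "email", [0.1, 0.9]),
    (4, "email", [0.7, 0.3]),
]


def filtered_search(query, k, label=None):
    """Return the ids of the k closest vectors, optionally limited to one label."""
    scored = [(sum((a - b) ** 2 for a, b in zip(vec, query)), doc_id)
              for doc_id, doc_label, vec in index
              if label is None or doc_label == label]
    return [doc_id for _, doc_id in sorted(scored)[:k]]


def grounded_prompt(question, doc_ids):
    """Fetch only the matching rows and fold them into a prompt."""
    marks = ",".join("?" for _ in doc_ids)
    rows = dict(db.execute(
        f"SELECT id, body FROM docs WHERE id IN ({marks})", doc_ids))
    context = "\n".join(f"- {rows[i]}" for i in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


hits = filtered_search([1.0, 0.0], k=2, label="email")
print(grounded_prompt("When is the budget review?", hits))
```

With the `email` filter applied, the chat message is skipped even though its vector is closer to the query than the second email, so only email text reaches the prompt.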
We’re in the early days of the Copilot Runtime, and some key pieces of the puzzle are still missing. One essential piece for using DiskANN indexes is tooling for encoding your source data as vector embeddings. You need this to build a vector search, whether you generate embeddings in your code or ship a base set of vector indexes with an application.
DiskANN elsewhere in Microsoft
Outside of the Copilot Runtime, Microsoft is using DiskANN to add fast vector search to Cosmos DB. Other services that use it include Microsoft 365 and Bing. In Cosmos DB, Microsoft is adding vector search to the NoSQL API, where you are likely to work with large amounts of highly distributed data. Here DiskANN’s support for rapidly changing data works alongside Cosmos DB’s dynamic scaling, adding a new index to each new partition. Queries can then be passed to all available partition indexes in parallel.
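The fan-out pattern described here, with one index per partition and a merge of the partial results, can be sketched as follows. This is a generic scatter-gather illustration in Python with hypothetical names, using brute-force search and threads in place of real partition indexes. It makes no claim about how Cosmos DB implements the pattern.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_partition(partition, query, k):
    """Top-k (distance, id) pairs from one partition's vectors."""
    return heapq.nsmallest(
        k, ((sq_dist(vec, query), vec_id) for vec_id, vec in partition))


def fan_out_search(partitions, query, k):
    """Query every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: search_partition(p, query, k), partitions)
        merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return [vec_id for _, vec_id in merged]


rng = random.Random(3)
vectors = {i: [rng.random() for _ in range(8)] for i in range(900)}
# Spread the vectors across three hypothetical partitions by id.
partitions = [[(i, v) for i, v in vectors.items() if i % 3 == p] for p in range(3)]
query = [rng.random() for _ in range(8)]
print(fan_out_search(partitions, query, k=5))
```

Because each partition returns its own best k results, the merged top k matches what a single search over all of the data would return in this brute-force sketch. With approximate per-partition indexes, the merged result is only as accurate as the partial results.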
Microsoft Research has been working on tools like DiskANN for some time now, and it’s good to see them jump from pure research to product, especially products as widely used as Cosmos DB and Windows. Having a fast and accurate vector index as part of the Copilot Runtime will reduce the risks associated with generative AI and will keep your indexes on your PC, keeping the source data private and grounding SLMs. Combined with confidential computing techniques in Windows, Microsoft looks like it could be ready to deliver secure, private AI on our own devices.