Deploying Qdrant with gRPC & Auth on Azure: A FastAPI Singleton Client Guide

This tutorial continues my earlier blog post, Unlocking Qdrant + Python on Azure: A Guide to Deployment and Authentication, written last year. Qdrant and its Python client have undergone major changes since then, and this post presents an upgraded approach to deploying Qdrant on Azure App Service and connecting the Qdrant client to that instance. Please remember that this approach is suitable for small to medium-sized projects and will not scale to very large numbers of vectors.

What’s new?

In the previous tutorial, we deployed Qdrant on Azure App Service, set up Azure’s built-in authentication, and used the Python qdrant-client from simple Python scripts to connect to that instance over HTTPS in synchronous mode.

In this updated version, we will deploy Qdrant to Azure and enable HTTP/2, which allows us to use the gRPC version of the client in asynchronous mode. Additionally, we will see an example of a singleton instance of the Qdrant client via FastAPI dependency injection.

Softlandia’s recent contribution to the Qdrant Python client makes it easy to use the client as a singleton instance when using bearer token authentication. We can now pass a Bearer token provider function directly to the Qdrant client. This way we can avoid re-instantiating the client whenever the access token expires.

The code, ARM template, and more info can be found in the GitHub repository.

gRPC on Azure App Service

In the previous example, we used a synchronous HTTPS-based Qdrant client to access the Qdrant instance via its HTTP API. The Qdrant Python client got a proper asynchronous HTTP client in version 1.6.1, which was not available when the previous tutorial was written.

Please refer to the previous tutorial for more detailed instructions about deployment. For clarity, we focus only on the settings needed to enable gRPC on Azure App Service. The full ARM template is available in the GitHub repository.

To enable gRPC in Azure App Service, you need to turn on some special settings. First, in Configuration -> General Settings, enable HTTP 2.0 and gRPC features. See the screenshot below.

Enabling HTTP/2 and gRPC on Azure App Service

Then, go to environment variables, and add the following settings:



These properties are required to route incoming traffic inside the App Service to the ports where the Qdrant Docker container listens for HTTP and gRPC calls. Now we have a Qdrant instance that we can connect to via gRPC or HTTPS.

Qdrant client and bearer token authentication provider

The latest Python Qdrant client, version 1.9.0, has a new parameter called auth_token_provider that is used to pass an authentication token provider callback function to the client. This function is called before each request and it should return a valid bearer access token that is added to the call headers. Softlandia contributed this feature to the client.
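To make the callback contract concrete, here is a minimal sketch of how a client could consult a token provider before each request. It does not use qdrant-client: `FakeClient` and `make_rotating_provider` are hypothetical names invented for this illustration, and the real client's internals may differ.

```python
import asyncio
from typing import Awaitable, Callable

# A token provider is any async callable that returns a bearer token string.
TokenProvider = Callable[[], Awaitable[str]]


class FakeClient:
    """Toy stand-in for a client that accepts a token provider (not qdrant-client)."""

    def __init__(self, auth_token_provider: TokenProvider) -> None:
        self._auth_token_provider = auth_token_provider
        self.sent_authorization = []  # records the header value of each call

    async def request(self, path: str) -> None:
        # Ask the provider for a token right before the call goes out,
        # so a long-lived client never holds on to a stale token.
        token = await self._auth_token_provider()
        self.sent_authorization.append(f"Bearer {token}")


def make_rotating_provider(tokens) -> TokenProvider:
    """Hypothetical provider that hands out the next token on every call."""
    remaining = iter(tokens)

    async def provider() -> str:
        return next(remaining)

    return provider


async def demo():
    client = FakeClient(make_rotating_provider(["token-a", "token-b"]))
    await client.request("/collections")
    await client.request("/collections")
    return client.sent_authorization


headers = asyncio.run(demo())
print(headers)
```

Because the token is fetched per request rather than at construction time, the same client object keeps working after the first token is replaced.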

Azure App Service authentication can be set up like in the previous example. After that, we can try to connect to the instance with the latest version of the Qdrant client. Here’s the sample code to do that.

pip install qdrant-client azure-identity aiohttp

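import asyncio

import qdrant_client
from azure.identity.aio import DefaultAzureCredential

# replace with Managed Identity Credential in production
credential = DefaultAzureCredential(exclude_interactive_browser_credential=False)

async def get_token():
    # called by the client before each request; returns a bearer token
    result = await credential.get_token("your_qdrant_app_scope")
    return result.token

client = qdrant_client.AsyncQdrantClient(
    # placeholder host (assumption): use your own App Service hostname
    url="https://your_qdrant_app.azurewebsites.net",
    port=443,
    grpc_port=443,
    prefer_grpc=True,
    https=True,
    auth_token_provider=get_token,
)

async def main():
    async with credential:
        print(await client.get_collections())
        await client.close()

if __name__ == "__main__":
    asyncio.run(main())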

If that works, congratulations! You have successfully set up a gRPC-enabled instance of Qdrant on Azure App Service! Next, we will integrate a Qdrant client into a FastAPI application via a singleton pattern.

Qdrant client as a singleton dependency in FastAPI app

As we use gRPC, it’s important to remember that the Qdrant client should not be re-created for each request. During initialization, the client creates gRPC channels for communicating with the gRPC server. Channel creation is expensive, so creating the client again for each request would hurt performance. For the same reason, it is common to use HTTP clients as singletons as well.
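The cost difference can be illustrated with a toy model. In the sketch below, `FakeGrpcClient` is a hypothetical stand-in whose constructor represents the expensive channel setup; it only counts how often that setup runs and makes no claim about real gRPC timings.

```python
import asyncio


class FakeGrpcClient:
    """Toy client whose constructor stands in for expensive channel setup."""

    channels_opened = 0  # class-level counter of simulated channel setups

    def __init__(self) -> None:
        FakeGrpcClient.channels_opened += 1

    async def get_collections(self):
        await asyncio.sleep(0)  # pretend to talk to the server
        return []


async def run_per_request(n: int) -> int:
    """Create a fresh client for every request, as a transient dependency would."""
    FakeGrpcClient.channels_opened = 0
    for _ in range(n):
        await FakeGrpcClient().get_collections()
    return FakeGrpcClient.channels_opened


async def run_shared(n: int) -> int:
    """Create the client once and reuse it for every request."""
    FakeGrpcClient.channels_opened = 0
    shared = FakeGrpcClient()
    for _ in range(n):
        await shared.get_collections()
    return FakeGrpcClient.channels_opened


per_request_setups = asyncio.run(run_per_request(5))
shared_setups = asyncio.run(run_shared(5))
print(per_request_setups, shared_setups)
```

With five requests, the per-request variant pays the setup cost five times, while the shared client pays it once.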

In many frameworks, for example in ASP.NET, it's easy to manage the service lifetimes and tell the framework to register some service as a singleton. Unfortunately, transient dependency behavior is the default in FastAPI dependencies, and handling service lifetimes is not as straightforward as in other more mature frameworks and programming languages.

Transient lifetime means the client would be created each time it’s injected into some service in your FastAPI app. You can work around this by caching the instance, but even then the service lifetime is scoped to the duration of the request, and the client would be disposed of after that. To make the client a singleton, we must register it during the startup of FastAPI. This can be accomplished by using the lifespan function and adding a dependency override. The full FastAPI example app can be seen below.

pip install uvicorn[standard] fastapi

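from contextlib import asynccontextmanager

from azure.identity.aio import DefaultAzureCredential
from fastapi import Depends, FastAPI
from qdrant_client import AsyncQdrantClient

# replace with Managed Identity Credential in production
credential = DefaultAzureCredential(exclude_interactive_browser_credential=False)

async def get_token():
    result = await credential.get_token("your_qdrant_app_scope")
    return result.token

class Qdrant:
    def __init__(self):
        self._client = AsyncQdrantClient(
            # placeholder host (assumption): use your own App Service hostname
            url="https://your_qdrant_app.azurewebsites.net",
            port=443,
            grpc_port=443,
            prefer_grpc=True,
            https=True,
            auth_token_provider=get_token,
        )

    async def get_collections(self):
        return await self._client.get_collections()

@asynccontextmanager
async def lifespan(app: FastAPI):
    qdrant = Qdrant()
    # singleton: every Depends(Qdrant) resolves to this one instance
    app.dependency_overrides[Qdrant] = lambda: qdrant
    yield  # the application serves requests here
    # shutdown: release the gRPC channels and the credential
    await qdrant._client.close()
    await credential.close()

app = FastAPI(lifespan=lifespan)

# illustrative route path (assumption)
@app.get("/collections")
async def get_collections(qdrant_client: Qdrant = Depends(Qdrant)):
    results = await qdrant_client.get_collections()
    return results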

We have now wrapped the Qdrant client in a simple class that encapsulates the client object, and we create a single instance of that class in the lifespan function. Then we add it to the dependency overrides. This way the Qdrant client instance stays the same during the whole lifecycle of the application. Whenever the access token is no longer valid, the Azure Identity library refreshes it. Otherwise, a cached token is returned.
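The refresh-or-reuse behavior described above can be sketched in a few lines. This is an illustration of the general idea only, not Azure Identity's actual implementation: `Token`, `CachingTokenProvider`, the 60-second refresh margin, and the fake clock are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    expires_on: float  # expiry time in seconds, on the same clock as `now`


class CachingTokenProvider:
    """Hypothetical provider: reuse the cached token until it is close to expiry."""

    def __init__(self, fetch, now, refresh_margin: float = 60.0) -> None:
        self._fetch = fetch            # async callable returning a fresh Token
        self._now = now                # callable returning the current time
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._cached = None

    async def __call__(self) -> str:
        expired = (
            self._cached is None
            or self._now() >= self._cached.expires_on - self._margin
        )
        if expired:
            self._cached = await self._fetch()
        return self._cached.value


async def demo():
    clock = {"t": 0.0}  # fake clock so the example needs no real waiting
    calls = {"n": 0}

    async def fetch() -> Token:
        calls["n"] += 1
        return Token(f"token-{calls['n']}", clock["t"] + 3600)

    provider = CachingTokenProvider(fetch, now=lambda: clock["t"])
    first = await provider()   # nothing cached yet: fetches a token
    second = await provider()  # still valid: served from the cache
    clock["t"] = 3590.0        # inside the refresh margin of the first token
    third = await provider()   # fetches a replacement token
    return [first, second, third], calls["n"]


tokens, fetch_count = asyncio.run(demo())
print(tokens, fetch_count)
```

Passing such a callable as the token provider is what lets one long-lived client outlive many tokens: the client asks for a token on each request, and only the occasional call pays for a real refresh.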


We demonstrated a secure and performant method for deploying the Qdrant vector database on Azure. This approach is suited for small to medium-sized projects, ensuring effective management and interaction with your vector data. By leveraging Azure's authentication and the latest Python qdrant-client features contributed by Softlandia, we can now use a singleton instance of the client with FastAPI without worrying about expired authentication tokens and failed requests.

Softlandia is committed to open source, and we contribute continuously to highly valued libraries, especially in the applied AI ecosystem. Some of our earlier contributions can be found, for example, in LlamaIndex, Guardrails and opencv-python. This means that you probably use code written by some of our employees, whether you know it or not.

Send us a message if you are interested in utilizing our top-notch expertise in applied AI and check out our AI engine YOKOT.AI as well!