Can we have a scalable FastAPI service with a common cache across multiple workers or threads?
So as it goes, we were using FastAPI for one of our apps, and a single instance of the app uses a lot of memory (for ML models).
Premise: I wanted to launch multiple instances of the app, since Python is effectively single-threaded for CPU-bound work (the GIL), and also be able to have a common cache across them.
We can use uvicorn to launch multiple workers of FastAPI. But uvicorn doesn't support a preload option, that is, we wanted to load the main app only once and still have multiple workers.
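For reference, launching multiple uvicorn workers looks like this (assuming the app object lives in a hypothetical main.py as app):

```shell
# Each of the 4 worker processes imports main.py separately,
# so the ML models get loaded 4 times -- no preload available.
uvicorn main:app --workers 4
```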
So I had to look at gunicorn, and as gunicorn is a WSGI server, we had to use uvicorn's worker class to launch FastAPI.
We can use gunicorn's preload option so that the app loads only once, with multiple workers forked from it to handle the load. Check this.
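That combination looks roughly like this (again assuming a main:app module; the flags are gunicorn's standard ones):

```shell
# --preload imports the app once in the master process; the 4
# workers are then forked from it, so the already-loaded models
# are shared via copy-on-write memory instead of loaded 4 times.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --preload
```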
Ok, cool!
But I wanted to have a common data structure (cache) across all the workers, so instead I went with the multiple-threads option with just one worker.
Oops: if we use any worker class other than gthread, gunicorn ignores the threads setting, and in this case I had to use the uvicorn worker class for the ASGI interface between gunicorn and FastAPI.
From here:
Threads is only meaningful with the threaded worker. Every other worker type ignores that setting and runs one thread per process.
So, I cannot use threads.
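To make the quote concrete (main:app assumed, as before), the --threads setting only takes effect with the gthread worker class:

```shell
# Threads are honoured here: one process, 4 threads.
gunicorn main:app --worker-class gthread --workers 1 --threads 4

# Threads are silently ignored here: the uvicorn worker runs
# one async event loop per process instead.
gunicorn main:app --worker-class uvicorn.workers.UvicornWorker --threads 4
```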
Also, if you are using an async framework such as FastAPI, using threads is somewhat orthogonal anyway: concurrency comes from the event loop, not from threads.
Ok, can I use multiple workers with the preload option and have a common data structure which is loaded in the app as a module-level variable?
Oops, but as per this, a mutable cache is not possible between workers.
With or without the preload option you will end up with one background thread in each worker because when a process forks it forks all its threads. Whether the threads are created before the fork or after does not matter. In both cases the processes are independent once forked and do not share data structures. If you populate the data at module load time, that initial data will be visible to every worker. Future modifications will not be because they happen in separate processes. To share memory between processes (workers) you need to use a construct for explicitly sharing memory (/dev/shm, filesystem, network cache, db, etc).
You might be able to do something like the below, but you cannot mutate the data as a common, shared data structure.
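Here is a minimal stdlib sketch of what the answer above describes (the names and values are illustrative, not from the original app): a module-level dict survives the fork with its initial contents, but a mutation made in the forked "worker" never reaches the parent.

```python
import multiprocessing as mp

# Module-level "cache", populated once before any fork -- analogous to
# data loaded at import time under gunicorn's --preload option.
CACHE = {"model": "loaded-once"}

def _mutate_and_report(conn):
    # Runs in the child process: mutate its own private copy of CACHE.
    CACHE["extra"] = "worker-only"
    conn.send(dict(CACHE))
    conn.close()

def demo():
    # Use the fork start method explicitly, since forking is what
    # gunicorn does (not available on Windows).
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=_mutate_and_report, args=(child_conn,))
    p.start()
    child_view = parent_conn.recv()
    p.join()
    # The child saw its own mutation; the parent's CACHE is untouched.
    return child_view, dict(CACHE)
```

Running demo() shows the child's view containing the extra key while the parent's copy still holds only the initial data, which is exactly why a plain module-level dict cannot serve as a shared mutable cache across workers.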
So it's not possible to have a common mutable cache across workers, at least not in a straightforward way, unless you employ other techniques (explicit shared memory, a network cache, a db, etc., as the answer above notes).
Just to end this, let's see how the preload option itself works. I came across the blog below, which explains it well.
https://www.joelsleppy.com/blog/gunicorn-application-preloading/
P.S.: References
https://github.com/tiangolo/fastapi/issues/2425
https://levelup.gitconnected.com/supercharging-pythons-scalability-1eec2f501dd5
https://stackoverflow.com/questions/38425620/gunicorn-workers-and-threads