The First Deadlock I Wrote

Ziming Wang | Jan 25, 2026

Every programmer remembers writing their first legendary bug. Here is the story of my first deadlock.

What I Was Building

I was working on a Python application that supports both on-demand and scheduled report generation. It consisted of a FastAPI web server, which handled on-demand user requests for immediate reports, and a background APScheduler thread that ran cron-based jobs for scheduled reports.

Both report types share the same generation logic and the same in-memory state. I also planned to scale up report sending with a thread pool, so proper concurrency control was a must. Because report generation is I/O-bound, Python threads work fine here despite the GIL: a thread blocked on I/O releases the GIL so other threads can run.
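To make the GIL point concrete, here is a minimal sketch that is not from my app: time.sleep stands in for network or disk I/O, and fake_report / run_reports are hypothetical names. Because a blocking sleep releases the GIL, four 0.2-second jobs on a four-worker ThreadPoolExecutor should finish in roughly 0.2 seconds in total rather than 0.8.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_report(report_id: int) -> str:
    """Hypothetical stand-in for an I/O-bound report job."""
    time.sleep(0.2)  # simulated blocking I/O; sleeping releases the GIL
    return f"report-{report_id}"


def run_reports(n: int = 4) -> tuple[list[str], float]:
    """Run n fake reports on an n-worker pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_report, range(n)))  # map preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_reports()
    print(results, f"{elapsed:.2f}s")
```

CPU-bound work would not overlap like this under the GIL, which is why the I/O-bound nature of report generation matters here.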

The Buggy Code

This is a simplified version of my actual code to illustrate the problem.

I used several global singleton classes to manage data access—for example, a RedisClient class and a ReportDataAccess class that wraps CRUD operations. To avoid boilerplate, I implemented a singleton metaclass.

Here is a thread-unsafe version:

# Not thread-safe
class SingletonMeta(type):
    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]
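To see the race, here is a sketch that copies the unsafe metaclass above and adds a hypothetical SlowInit class whose __init__ sleeps, widening the window between the membership check and the assignment. A threading.Barrier releases all threads at once, so several of them typically pass the `cls not in cls._instances` check before any of them has stored an instance.

```python
import threading
import time


# Not thread-safe: same metaclass as above.
class SingletonMeta(type):
    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]


class SlowInit(metaclass=SingletonMeta):
    """Hypothetical singleton with slow setup (e.g. opening a connection)."""

    def __init__(self):
        time.sleep(0.1)  # sleeping releases the GIL, so other threads run the check meanwhile


def count_distinct_instances(n_threads: int = 8) -> int:
    """Request SlowInit() from many threads at once; return how many objects were built."""
    SingletonMeta._instances.pop(SlowInit, None)  # start fresh so the demo can be rerun
    barrier = threading.Barrier(n_threads)
    seen = []

    def worker():
        barrier.wait()           # line every thread up at the same moment
        seen.append(SlowInit())  # list.append is safe across threads in CPython

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len({id(obj) for obj in seen})  # objects are still alive, so ids are unique


if __name__ == "__main__":
    print(count_distinct_instances())  # usually > 1: the "singleton" was built several times
```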

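The basic implementation isn’t thread-safe: two threads can both see that cls is missing from _instances, and each will construct its own instance. To fix this, I added a class-level threading.Lock: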

import threading

class SingletonMeta(type):
    _instances: dict = {}
    _lock: threading.Lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                instance = super().__call__(*args, **kwargs)
                cls._instances[cls] = instance
        return cls._instances[cls]

Note that in reality, my application already had some higher-level locking that serialized report generation, so the singleton lock might seem redundant. However:

  • SingletonMeta is a reusable utility class, and I couldn’t guarantee it would only ever be used within this protected context. Defensive locking here ensures thread-safe initialization regardless of how callers (possibly someone else) use it.
  • I am planning to use a thread pool for concurrent report generation, which means I will need to relax the higher-level locking for better throughput. Report generation will no longer be serialized, so it is best to make this as robust as possible from the start.

The Deadlock

After adding the lock, my program got stuck immediately on startup, without printing a single log line.

What went wrong? The problem is nested singleton initialization.

I was nesting singletons. For example, ReportDataAccess depends on RedisClient to get a Redis client, and then wraps data access operations around it. Here is a simplified version:

import redis

class RedisClient(metaclass=SingletonMeta):
    def __init__(self):
        self.client = redis.Redis(...)

class ReportDataAccess(metaclass=SingletonMeta):
    def __init__(self):
        self._redis = RedisClient()  # Nested singleton access!

    def get_my_key_random_crud(self):
        return self._redis.client.get("some_key")

Here is what happens the first time ReportDataAccess() is called:

  1. SingletonMeta.__call__ acquires _lock on behalf of ReportDataAccess.
  2. Still holding _lock, it runs ReportDataAccess.__init__, which instantiates RedisClient().
  3. RedisClient also uses SingletonMeta, so its __call__ tries to acquire the same _lock.

Since threading.Lock is not reentrant, the thread blocks forever waiting for a lock it already holds. The result is a classic self-deadlock.
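The non-reentrancy is easy to confirm in isolation. This sketch (try_reacquire is a hypothetical helper) takes a threading.Lock and then tries to take it again from the same thread; a timeout replaces the infinite wait so the demo reports the problem instead of hanging.

```python
import threading


def try_reacquire(lock) -> bool:
    """Acquire `lock`, then try to acquire it again from the same thread."""
    with lock:
        # Without a timeout, this second acquire would block forever,
        # just like RedisClient() inside ReportDataAccess.__init__.
        reacquired = lock.acquire(timeout=0.1)
        if reacquired:
            lock.release()  # balance the second acquire if it ever succeeds
        return reacquired


if __name__ == "__main__":
    print(try_reacquire(threading.Lock()))  # False: a plain Lock has no notion of an owning thread
```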

Debugging with py-spy

How did I debug this?

  • pdb can’t attach to an already-running process, so I would need to insert a breakpoint somewhere before the hang, but I didn’t know where it hung in the first place.
  • faulthandler can dump stack traces, but it also requires injecting code before the hang occurs.
  • debugpy + the VS Code debugger works for live debugging, but it takes ages to start up and needs a bunch of configuration. I’m a big hater of it.
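For comparison, this is roughly what the faulthandler route looks like. It is a sketch: hang_with_watchdog is a hypothetical helper, and time.sleep stands in for the hang. faulthandler.dump_traceback_later() arms a watchdog that dumps every thread’s stack after a timeout, but it only helps if this setup code runs before the program gets stuck, which is exactly the limitation above.

```python
import faulthandler
import tempfile
import time


def hang_with_watchdog(hang_seconds: float = 0.5, timeout: float = 0.2) -> str:
    """Arm a faulthandler watchdog, simulate a hang, and return the dumped tracebacks."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # Must be armed *before* the hang: dumps all threads' stacks after `timeout` seconds.
        faulthandler.dump_traceback_later(timeout, file=log)
        try:
            time.sleep(hang_seconds)  # stand-in for code that never returns
        finally:
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return log.read()


if __name__ == "__main__":
    print(hang_with_watchdog())
```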

Time for my favorite tool: py-spy! It can be attached to a live Python process and shows you exactly what’s running—stack traces, local variables, and more—with very low overhead.

I often use its top command for live, Linux-top-style profiling, or its dump command to print the current stack traces immediately. I usually try top first since it updates live. It had worked for me many times before, but this time it showed empty stacks on my work dev box, possibly due to an environment issue combined with the deadlock. So I went with dump.

Here is a minimal reproduction script. We don’t even need multiple threads: the main thread alone can lock itself out.

import threading

class SingletonMeta(type):
    _instances: dict = {}
    _lock: threading.Lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                instance = super().__call__(*args, **kwargs)
                cls._instances[cls] = instance
        return cls._instances[cls]

class RedisClient(metaclass=SingletonMeta):
    def __init__(self):
        self.client = "I am a client"

class ReportDataAccess(metaclass=SingletonMeta):
    def __init__(self):
        self._redis = RedisClient()  # Deadlock happens here!

if __name__ == "__main__":
    b = ReportDataAccess()  # This will hang

Running py-spy dump -p <pid> -l on the stuck process (-l, short for --locals, also prints each frame’s arguments and local variables):

ziming@ZW-G15 [~/dev]
>>> py-spy dump -p 756 -l
Process 756: python3 test-deadlock.py
Python v3.10.12 (/usr/bin/python3.10)

Thread 756 (idle): "MainThread"
    __call__ (test-deadlock.py:8) # RedisClient can't acquire the lock because ReportDataAccess already holds it
        Arguments:
            cls: <SingletonMeta at 0x555c4345c600>
        Locals:
            args: ()
            kwargs: {}
    __init__ (test-deadlock.py:20) # ReportDataAccess is trying to create RedisClient()
        Arguments:
            self: <ReportDataAccess at 0x7fef8d123df0>
    __call__ (test-deadlock.py:10) # ReportDataAccess acquired the lock upon its creation through __call__
        Arguments:
            cls: <SingletonMeta at 0x555c4345c9c0>
        Locals:
            args: ()
            kwargs: {}
    <module> (test-deadlock.py:23)

Reading the stack from the bottom up: the thread entered SingletonMeta.__call__ for ReportDataAccess (acquiring _lock), then, inside ReportDataAccess.__init__, tried to create RedisClient(), whose __call__ blocks trying to acquire the same _lock. Deadlock.

The Solution: Use RLock

The fix is simple: replace threading.Lock with threading.RLock (Reentrant Lock). An RLock allows the same thread to acquire the lock multiple times without blocking.
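A quick sketch of what "reentrant" means in practice (rlock_demo is a hypothetical helper): the owning thread can acquire an RLock again without blocking, while a different thread still cannot get it until every acquire has been matched by a release. So switching to RLock keeps the mutual exclusion between threads that the singleton needs.

```python
import threading


def rlock_demo() -> tuple[bool, bool]:
    """Return (owner could re-acquire, other thread could acquire) while the RLock is held."""
    rlock = threading.RLock()
    other = {}

    def other_thread():
        got = rlock.acquire(timeout=0.1)  # a different thread must wait for the owner
        if got:
            rlock.release()
        other["acquired"] = got

    with rlock:                                 # first acquire: this thread now owns the lock
        reentered = rlock.acquire(timeout=0.1)  # owner re-acquires immediately (recursion level 2)
        t = threading.Thread(target=other_thread)
        t.start()
        t.join()
        if reentered:
            rlock.release()                     # each acquire needs its own release
    return reentered, other["acquired"]


if __name__ == "__main__":
    print(rlock_demo())  # (True, False)
```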

class SingletonMeta(type):
    _instances: dict = {}
    _lock: threading.RLock = threading.RLock()  # RLock instead of Lock

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                instance = super().__call__(*args, **kwargs)
                cls._instances[cls] = instance
        return cls._instances[cls]
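To check the fix end to end, here is a sketch that combines the RLock metaclass with the two classes from the reproduction script and then requests ReportDataAccess() from several threads at once. The get_from_threads helper, the thread count, and the Barrier are illustrative additions, not part of my app. Nested construction no longer hangs, and every thread should get the same instance.

```python
import threading


class SingletonMeta(type):
    _instances: dict = {}
    _lock: threading.RLock = threading.RLock()  # reentrant: nested construction is OK

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                instance = super().__call__(*args, **kwargs)
                cls._instances[cls] = instance
        return cls._instances[cls]


class RedisClient(metaclass=SingletonMeta):
    def __init__(self):
        self.client = "I am a client"


class ReportDataAccess(metaclass=SingletonMeta):
    def __init__(self):
        self._redis = RedisClient()  # re-enters the RLock held by this same thread


def get_from_threads(n: int = 8) -> list:
    """Hypothetical helper: request ReportDataAccess() from n threads at the same moment."""
    barrier = threading.Barrier(n)
    results = []

    def worker():
        barrier.wait()  # maximize contention on the first construction
        results.append(ReportDataAccess())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    objs = get_from_threads()
    print(all(o is objs[0] for o in objs))  # True: one shared instance
```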

Thoughts

Did I know reentrant locks were a thing before this? Yes: I had systematically learned concurrency in Java, whose concurrency toolkit is even more comprehensive than Python’s. But did I have it in mind when writing this code? NOPE, lol.

I guess it’s a common experience: making mistakes teaches more effectively than reading textbooks or taking courses. Now this concept is forged into my brain.

Ads Time

Java Concurrency and Multithreading Course: This course is amazing. I highly recommend it if you want to learn concurrency systematically. Just don’t be like me and forget about reentrant locks after finishing this great course :)

  • Even if you don’t write Java often, I still recommend learning these concepts in Java (or C/C++) rather than Python, Go, etc. (Please don’t mention JS lol.) Python’s threading is heavily limited by the GIL, and it is very beginner-friendly, so it hides many important low-level concepts. That said, asyncio is such a good library, and I am a huge fan.

Java Virtual Threads: This is a HUGE feature, finalized in Java 21. I feel like it boosted Java to a whole new level. Virtual threads are lightweight threads managed by the Java runtime rather than the OS (unlike traditional “Platform Threads”). This lets you create millions of threads without the heavy per-thread overhead. Java is more performant than ever now.

py-spy: My go-to Python profiler and live debugger.