asmqdm: Python Progress Bars Created in x86-64 Assembly

Sole developer · Personal project

systemsassemblyperformancepython ~5ns per update in async mode via LOCK XADDRender thread on a separate core via clone() and sched_setaffinity

Problem / Context

Progress bars in Python add per-iteration overhead that can matter in tight loops. I wanted to see how low that overhead could go if the update path was a single atomic instruction in assembly, with rendering handled by a separate thread on a different CPU core.

Approach

Implemented the core in x86-64 NASM as a shared library (libasmqdm.so). All I/O, memory allocation, timing, and thread management use raw Linux syscalls.
Built two modes: sync (renders inline with a 50ms throttle) and async (spawns a render thread via clone() that polls at ~60fps while the main thread updates a counter with LOCK XADD).
In async mode, the render thread is pinned to a different core than Python using sched_setaffinity, so the two never contend for the same CPU.
Wrapped the assembly API in a Python package using ctypes. The Python layer handles iterator protocol, context manager, and argument validation. The assembly layer handles everything that touches the terminal or the clock.
All arithmetic is integer. Time is tracked in nanoseconds, rates in iterations per second, percentages via integer division. No floating-point state to save or restore.

Results

Async update overhead is roughly 5-10 nanoseconds per call, dominated by the LOCK XADD instruction. Sync mode runs at ~500-1000ns with the render throttle.
The Python API is compatible with tqdm’s iterator and context manager patterns, so switching is a one-line change.
Each progress bar allocates ~650 bytes (state + render buffer) via mmap. Async mode adds a ~65KB thread stack.

What I’d Do Next

Add nested progress bar support with cursor positioning.
Benchmark against tqdm on real workloads (data loading, model training) to measure the practical gap.
Port the async mode to io_uring for render writes to further reduce syscall overhead.