Systems / Performance Project

Lock-Free Ring Buffer Benchmark

A lock-free single-producer, single-consumer ring buffer in C++ with a comprehensive benchmark suite for analyzing contention strategies, queueing behavior, and latency tradeoffs.

Ring buffer producer consumer diagram

Key Features

  • Lock-free SPSC implementation using atomics
  • Four wait strategies for contention analysis
  • High-resolution latency metrics
  • CPU affinity support for consistent scheduling
  • Capacity sweep for backpressure effects

Architecture

  • Fixed-size circular buffer backed by contiguous storage
  • Separate producer and consumer indices for SPSC ownership
  • Acquire and release memory ordering protects data visibility
  • Cache-line alignment reduces false sharing between hot atomics
  • Benchmark harness is separated from queue implementation logic

How It Works

  • Producer writes an item into the current tail slot
  • Producer release-stores the new tail so the consumer can see the write
  • Consumer acquire-loads the tail before reading from the buffer
  • Consumer reads from the current head slot and advances the head
  • Failed pushes mean the buffer is full, while failed pops mean it is empty
Benchmark Output
=== Throughput: Wait Strategy Comparison ===
Benchmark                  msgs/s(M)      fail_push      fail_pop
-----------------------------------------------------------------
Spin                       16.41          19             1921676
Yield                      27.38          204            32985
SpinThenYield              24.55          1499           551720
ExponentialBackoff         20.21          2645           327247

=== Latency: Wait Strategy Comparison ===
Benchmark                  avg(ns)        p50        p95        p99        max
-------------------------------------------------------------------------------
Spin                       93846.92       92441.60   106425.20  115458.20  134788.60
Yield                      78351.52       77100.00   88166.40   96366.60   139583.20
SpinThenYield              92853.83       92175.00   100424.80  110566.40  128641.80
ExponentialBackoff         98410.68       95683.60   120675.00  131950.00  174291.60

=== Throughput: Capacity Sweep ===
Capacity                   msgs/s(M)      fail_push      fail_pop
-----------------------------------------------------------------
64                         16.54          24904          1261885
256                        17.12          18212          813192
1024                       22.49          1261           694032
8192                       27.24          3653           207194

=== Latency: Capacity Sweep ===
Capacity                   avg(ns)        p50        p95        p99        max
-------------------------------------------------------------------------------
64                         6313.10        5966.80    7533.20    8774.80    75908.40
256                        23229.12       25991.20   25991.80   36441.60   40866.60
1024                       92822.79       92116.80   99266.60   116816.80  148625.00
8192                       765703.84      741950.00  934133.40  1067608.60 1311267.00

Highlights

  • Lock-free SPSC queue built with C++20 atomics
  • Benchmarks compare Spin, Yield, SpinThenYield, and ExponentialBackoff strategies
  • Measures throughput, failed push/pop attempts, and tail latency percentiles
  • Uses CPU affinity to reduce scheduling noise during benchmark runs
  • Explores how queue capacity affects throughput and tail latency
  • Demonstrates acquire/release synchronization in a real producer-consumer system
  • Separates benchmark logic, wait strategies, formatting, and queue implementation

Tech

C++20 Atomics Threads CMake Linux
Performance icon

Performance Analysis

  • Compare wait strategies under contention
  • Measure throughput and tail latency
  • Understand CPU scheduling impact
  • Identify tradeoffs between latency and throughput
Learning icon

What I Learned

  • Yielding improves throughput under contention
  • Larger buffers increase latency due to queueing
  • Backoff strategies are not always beneficial
  • Thread affinity exposes true system behavior
Tech stack icon

Tech Stack

  • C++20 (atomics, threads, chrono)
  • CMake
  • Linux (sched_setaffinity)
  • Custom benchmark harness