Memory Ordering and Atomic Operations
Relevant source files
This document covers the atomic operations and memory ordering semantics used in the kspin crate's spinlock implementation. It details how the BaseSpinLock
uses platform-specific atomic primitives to ensure thread safety in multi-core environments, and how these operations are optimized away for single-core systems.
For information about the overall spinlock architecture and guard types, see Core Implementation Architecture. For details about the SMP feature flag system, see SMP vs Single-Core Implementation.
Conditional Atomic Operations
The kspin crate uses conditional compilation to include atomic operations only when targeting multi-core systems. The smp
feature flag controls whether atomic synchronization primitives are compiled into the final binary.
flowchart TD subgraph subGraph2["Lock Operations"] AtomicOps["compare_exchange_weakcompare_exchangeload/store operations"] NoOps["Always succeedsNo atomic operationsOptimized away"] end subgraph subGraph1["Data Structures"] AtomicLock["lock: AtomicBool"] NoLockField["No lock field"] end subgraph subGraph0["Compilation Paths"] SMPEnabled["SMP Feature Enabled"] SMPDisabled["SMP Feature Disabled"] end AtomicLock --> AtomicOps NoLockField --> NoOps SMPDisabled --> NoLockField SMPEnabled --> AtomicLock
The BaseSpinLock
struct conditionally includes an AtomicBool
field based on the smp
feature:
Compilation Mode | Lock Field | Behavior |
---|---|---|
SMP enabled | lock: AtomicBool | Full atomic synchronization |
SMP disabled | No lock field | All lock operations succeed immediately |
Sources: src/base.rs(L13 - L14) src/base.rs(L29 - L30) src/base.rs(L111 - L117)
Memory Ordering Semantics
The spinlock implementation uses three specific memory orderings to ensure correct synchronization semantics while minimizing performance overhead:
flowchart TD subgraph Operations["Operations"] CompareExchange["compare_exchange(_weak)Acquire + Relaxed"] Store["store(false)Release"] Load["load()Relaxed"] end subgraph subGraph0["Memory Ordering Types"] Acquire["Ordering::AcquireLock acquisition"] Release["Ordering::ReleaseLock release"] Relaxed["Ordering::RelaxedStatus checks"] end Acquire --> CompareExchange Relaxed --> CompareExchange Relaxed --> Load Release --> Store
Memory Ordering Usage Patterns
Operation | Success Ordering | Failure Ordering | Purpose |
---|---|---|---|
compare_exchange_weak | Acquire | Relaxed | Lock acquisition with retry |
compare_exchange | Acquire | Relaxed | Single attempt lock acquisition |
store(false) | Release | N/A | Lock release |
load() | Relaxed | N/A | Non-blocking status check |
The Acquire
ordering on successful lock acquisition ensures that all subsequent reads and writes cannot be reordered before the lock acquisition. The Release
ordering on lock release ensures that all previous reads and writes complete before the lock is released.
Sources: src/base.rs(L85) src/base.rs(L131) src/base.rs(L161) src/base.rs(L224) src/base.rs(L113)
Atomic Operation Flow
The spinlock uses a two-phase approach for lock acquisition: an optimistic compare-and-swap followed by a passive wait loop.
flowchart TD Start["lock() called"] AcquireGuard["G::acquire()"] TryLock["compare_exchange_weak(false, true)"] CheckResult["Success?"] CreateGuard["Create BaseSpinLockGuard"] CheckLocked["is_locked()"] SpinLoop["core::hint::spin_loop()"] StillLocked["Still locked?"] AcquireGuard --> TryLock CheckLocked --> StillLocked CheckResult --> CheckLocked CheckResult --> CreateGuard SpinLoop --> CheckLocked Start --> AcquireGuard StillLocked --> SpinLoop StillLocked --> TryLock TryLock --> CheckResult
Lock Acquisition Pattern
The lock()
method implements an efficient two-stage spinning strategy:
- Active spinning: Attempts to acquire the lock using
compare_exchange_weak
- Passive waiting: When acquisition fails, enters a read-only spin loop checking
is_locked()
- CPU optimization: Uses
core::hint::spin_loop()
to signal the processor during busy waiting
Sources: src/base.rs(L77 - L101) src/base.rs(L83 - L92) src/base.rs(L89 - L91)
Try-Lock Pattern
The try_lock()
method uses a single-shot approach with compare_exchange
(strong semantics) rather than the weak variant used in the spinning loop:
flowchart TD TryLock["try_lock()"] AcquireGuard["G::acquire()"] CompareExchange["compare_exchange(false, true)"] CheckResult["Success?"] ReturnSome["Some(guard)"] ReturnNone["None"] AcquireGuard --> CompareExchange CheckResult --> ReturnNone CheckResult --> ReturnSome CompareExchange --> CheckResult TryLock --> AcquireGuard
The strong compare-exchange is used because there's no retry loop, making the single operation more likely to succeed on architectures where weak operations can fail spuriously.
Sources: src/base.rs(L122 - L149) src/base.rs(L129 - L132)
Lock Release Mechanism
Lock release occurs automatically through the RAII Drop
implementation of BaseSpinLockGuard
. The release process ensures proper memory ordering and guard state cleanup:
sequenceDiagram participant BaseSpinLockGuard as "BaseSpinLockGuard" participant AtomicBoollock as "AtomicBool lock" participant GBaseGuard as "G: BaseGuard" Note over BaseSpinLockGuard: Guard goes out of scope BaseSpinLockGuard ->> AtomicBoollock: store(false, Ordering::Release) BaseSpinLockGuard ->> GBaseGuard: G::release(irq_state) Note over BaseSpinLockGuard: Guard destroyed
The release ordering ensures that all memory operations performed while holding the lock are visible to other threads before the lock becomes available.
Sources: src/base.rs(L218 - L227) src/base.rs(L224) src/base.rs(L225)
Performance Optimizations
CPU Spin Loop Hints
The implementation uses core::hint::spin_loop()
to provide architecture-specific optimizations during busy waiting:
- x86/x86_64: Translates to the
PAUSE
instruction, reducing power consumption - ARM: May use
YIELD
or similar instructions - RISC-V: Architecture-specific power management hints
Weak vs Strong Compare-Exchange
The spinlock strategically chooses between weak and strong compare-exchange operations:
Context | Operation | Rationale |
---|---|---|
Spinning loop | compare_exchange_weak | Acceptable spurious failures in retry context |
Single attempt | compare_exchange | Must succeed if possible, no retry mechanism |
Sources: src/base.rs(L85) src/base.rs(L131) src/base.rs(L127 - L128)
Single-Core Optimization
When compiled without the smp
feature, all atomic operations are eliminated:
flowchart TD subgraph Benefits["Benefits"] ZeroCost["Zero-cost abstraction"] NoMemoryBarriers["No memory barriers"] OptimizedBinary["Smaller binary size"] end subgraph subGraph0["SMP Disabled Path"] LockCall["lock()"] GuardAcquire["G::acquire()"] CreateGuard["Create guard immediately"] NoAtomics["No atomic operations"] end CreateGuard --> NoAtomics GuardAcquire --> CreateGuard LockCall --> GuardAcquire NoAtomics --> NoMemoryBarriers NoAtomics --> OptimizedBinary NoAtomics --> ZeroCost
This optimization is particularly important for embedded systems where multi-core synchronization overhead would be unnecessary.
Sources: src/base.rs(L79 - L93) src/base.rs(L126 - L136) src/base.rs(L160 - L161) src/base.rs(L223 - L224)