Performance Benchmarks

Relevant source files

This document covers the performance benchmark suite for the allocator crate, which provides standardized testing methodology to compare the performance characteristics of different memory allocation implementations. The benchmarks evaluate allocation patterns commonly found in real-world applications and measure relative performance against the system allocator baseline.

For information about the individual allocator implementations being tested, see Allocator Implementations. For details about the integration test suite, see Integration Tests.

Benchmark Architecture

The benchmark infrastructure is built using the Criterion benchmarking framework and provides a controlled testing environment for evaluating allocator performance across different workload patterns.

Benchmark Framework Overview

flowchart TD
subgraph subGraph3["Allocator Instances"]
    SYS_ALLOC["std::alloc::System"]
    TLSF_ALLOC["AllocatorRc"]
    SLAB_ALLOC["AllocatorRc"]
    BUDDY_ALLOC["AllocatorRc"]
end
subgraph subGraph2["Memory Pool Management"]
    POOL_NEW["MemoryPool::new()"]
    POOL_SLICE["MemoryPool::as_slice()"]
    ALLOC_ZEROED["std::alloc::alloc_zeroed()"]
    LAYOUT["Layout::from_size_align(size, 4096)"]
end
subgraph subGraph1["Test Infrastructure"]
    POOL["MemoryPool::new(POOL_SIZE)"]
    BENCH_FN["bench(c, alloc_name, alloc)"]
    CRITERION["Criterion::benchmark_group()"]
end
subgraph subGraph0["Benchmark Entry Point"]
    MAIN["criterion_main!(benches)"]
    GROUP["criterion_group!(benches, criterion_benchmark)"]
end

BENCH_FN --> BUDDY_ALLOC
BENCH_FN --> CRITERION
BENCH_FN --> POOL
BENCH_FN --> SLAB_ALLOC
BENCH_FN --> SYS_ALLOC
BENCH_FN --> TLSF_ALLOC
GROUP --> BENCH_FN
MAIN --> GROUP
POOL --> POOL_NEW
POOL --> POOL_SLICE
POOL_NEW --> ALLOC_ZEROED
POOL_NEW --> LAYOUT
POOL_SLICE --> BUDDY_ALLOC
POOL_SLICE --> SLAB_ALLOC
POOL_SLICE --> TLSF_ALLOC

The benchmark system creates a 128MB memory pool (POOL_SIZE = 1024 * 1024 * 128) for testing custom allocators, while the system allocator operates directly on system memory. Each allocator is wrapped in AllocatorRc to provide the standard library Allocator trait interface.

Sources: benches/collections.rs(L16)  benches/collections.rs(L80 - L98)  benches/utils/mod.rs(L9 - L18) 

Test Scenario Implementation

flowchart TD
subgraph subGraph3["btree_map Implementation"]
    BT_NEW["BTreeMap::new_in(alloc.clone())"]
    BT_RNG["SmallRng::seed_from_u64(0xdead_beef)"]
    BT_INSERT["m.insert(key, value)"]
    BT_REMOVE["m.pop_first()"]
    BT_CLEAR["m.clear()"]
end
subgraph subGraph2["vec_rand_free Implementation"]
    VR_ALLOC["Vec::with_capacity_in(blk_size, alloc)"]
    VR_RNG["SmallRng::seed_from_u64(0xdead_beef)"]
    VR_SHUFFLE["index.shuffle(&mut rng)"]
    VR_FREE["v[i] = Vec::new_in(alloc)"]
end
subgraph subGraph1["vec_push Implementation"]
    VP_NEW["Vec::new_in(alloc.clone())"]
    VP_LOOP["for _ in 0..n"]
    VP_PUSH["v.push(0xdead_beef)"]
    VP_DROP["drop(v)"]
end
subgraph subGraph0["Benchmark Functions"]
    VEC_PUSH["vec_push(n, alloc)"]
    VEC_RAND["vec_rand_free(n, blk_size, alloc)"]
    BTREE_MAP["btree_map(n, alloc)"]
end

BTREE_MAP --> BT_NEW
BTREE_MAP --> BT_RNG
BT_INSERT --> BT_CLEAR
BT_REMOVE --> BT_CLEAR
BT_RNG --> BT_INSERT
BT_RNG --> BT_REMOVE
VEC_PUSH --> VP_NEW
VEC_RAND --> VR_ALLOC
VEC_RAND --> VR_RNG
VP_LOOP --> VP_PUSH
VP_NEW --> VP_LOOP
VP_PUSH --> VP_DROP
VR_RNG --> VR_SHUFFLE
VR_SHUFFLE --> VR_FREE

Each benchmark function implements a specific allocation pattern designed to stress different aspects of allocator performance. The functions use deterministic random seeds to ensure reproducible results across benchmark runs.

Sources: benches/collections.rs(L18 - L24)  benches/collections.rs(L26 - L44)  benches/collections.rs(L46 - L61) 

Test Scenarios

The benchmark suite includes four distinct test scenarios that evaluate different aspects of allocator performance:

BenchmarkFunctionParametersPurpose
vec_push_3Mvec_pushn=3,000,000Sequential allocation stress test
vec_rand_free_25K_64vec_rand_freen=25,000, blk_size=64Small block fragmentation test
vec_rand_free_7500_520vec_rand_freen=7,500, blk_size=520Large block fragmentation test
btree_map_50Kbtree_mapn=50,000Mixed allocation/deallocation pattern

Sequential Allocation Test

The vec_push_3M benchmark evaluates pure allocation performance by pushing 3 million u32 values into a vector. This test measures:

  • Allocation throughput for growing data structures
  • Memory reallocation efficiency during vector growth
  • Allocator overhead for sequential memory requests

Fragmentation Tests

The vec_rand_free tests evaluate allocator behavior under fragmentation stress:

Small Block Test (vec_rand_free_25K_64):

  • Allocates 25,000 blocks of 64 bytes each
  • Randomly deallocates blocks using shuffled indices
  • Tests small allocation efficiency and fragmentation handling

Large Block Test (vec_rand_free_7500_520):

  • Allocates 7,500 blocks of 520 bytes each
  • Randomly deallocates blocks using shuffled indices
  • Tests allocator behavior with larger allocation sizes

Mixed Workload Test

The btree_map_50K benchmark simulates realistic application behavior:

  • Performs 50,000 operations on a BTreeMap
  • 20% probability of removal operations (rng.next_u32() % 5 == 0)
  • 80% probability of insertion operations
  • Uses string keys with dynamic allocation
  • Tests allocator performance under mixed allocation patterns

Sources: benches/collections.rs(L65 - L77)  benches/collections.rs(L26 - L44)  benches/collections.rs(L46 - L61) 

Allocator Testing Matrix

The benchmark system tests four different allocators to provide comprehensive performance comparison:

Allocator Configuration

flowchart TD
subgraph subGraph3["Allocator Initialization"]
    TLSF_NEW["TlsfByteAllocator::new()"]
    SLAB_NEW["SlabByteAllocator::new()"]
    BUDDY_NEW["BuddyByteAllocator::new()"]
end
subgraph subGraph2["Memory Pool"]
    SYS["std::alloc::System"]
    POOL_MEM["MemoryPool (128MB)"]
    POOL_SLICE["&mut [u8] slice"]
end
subgraph subGraph1["Custom Allocators"]
    TLSF_RC["AllocatorRc"]
    SLAB_RC["AllocatorRc"]
    BUDDY_RC["AllocatorRc"]
end
subgraph subGraph0["System Allocator"]
    SYS["std::alloc::System"]
    SYS_DESC["Direct system memoryBaseline reference"]
    POOL_MEM["MemoryPool (128MB)"]
end

BUDDY_NEW --> BUDDY_RC
POOL_MEM --> POOL_SLICE
POOL_SLICE --> BUDDY_RC
POOL_SLICE --> SLAB_RC
POOL_SLICE --> TLSF_RC
SLAB_NEW --> SLAB_RC
SYS --> SYS_DESC
TLSF_NEW --> TLSF_RC

Each custom allocator is initialized with the same 128MB memory pool to ensure fair comparison. The AllocatorRc wrapper provides reference counting and the standard library Allocator trait implementation.

Sources: benches/collections.rs(L82 - L97)  benches/utils/mod.rs(L10 - L14) 

Memory Pool Utility

The MemoryPool utility provides controlled memory management for benchmark testing:

MemoryPool Implementation

The MemoryPool struct manages a fixed-size memory region for allocator testing:

// Memory pool allocation with 4KB alignment
let layout = Layout::from_size_align(size, 4096).unwrap();
let ptr = NonNull::new(unsafe { std::alloc::alloc_zeroed(layout) }).unwrap();

Key characteristics:

  • Size: 128MB (POOL_SIZE = 1024 * 1024 * 128)
  • Alignment: 4KB page alignment for optimal performance
  • Initialization: Zero-filled memory using alloc_zeroed
  • Lifetime: Automatic cleanup through Drop implementation

The pool provides a mutable slice interface (as_slice()) that allocators use for their internal memory management, ensuring all custom allocators operate within the same controlled memory environment.

Sources: benches/utils/mod.rs(L9 - L18)  benches/utils/mod.rs(L21 - L25)  benches/collections.rs(L16) 

Running Benchmarks

The benchmark suite uses the Criterion framework for statistical analysis and result reporting. To execute the benchmarks:

cargo bench --features full

The --features full flag enables all allocator implementations for comprehensive testing. The benchmark configuration includes:

  • Sample size: 10 iterations per test for statistical significance
  • Measurement: Wall clock time for complete test scenarios
  • Output: Statistical analysis including mean, standard deviation, and confidence intervals

Individual allocator benchmarks can be run using Criterion's filtering capability:

cargo bench -- tlsf        # Run only TLSF allocator benchmarks
cargo bench -- vec_push    # Run only vector push benchmarks

Sources: benches/collections.rs(L68)  benches/collections.rs(L100 - L101) 

Benchmark Results Interpretation

The benchmark results provide comparative performance metrics across allocators:

Performance Metrics

Throughput Comparison: Results show relative performance against the system allocator baseline, helping identify which allocators perform best for specific workload patterns.

Fragmentation Behavior: The random free tests reveal how effectively each allocator handles memory fragmentation and reuses freed blocks.

Mixed Workload Performance: The BTreeMap benchmark demonstrates allocator behavior under realistic application usage patterns with mixed allocation and deallocation operations.

Expected Performance Characteristics

  • TLSF: Consistent O(1) allocation/deallocation with good fragmentation resistance
  • Slab: Optimal for fixed-size allocations, may show overhead for variable sizes
  • Buddy: Good general-purpose performance with power-of-two block sizes
  • System: Reference baseline representing operating system allocator performance

Sources: benches/collections.rs(L63 - L78)