Supported Platforms

Relevant source files

This document provides a comprehensive overview of the CPU architectures and platforms supported by the percpu crate ecosystem. It covers architecture-specific per-CPU register usage, implementation details, and platform-specific considerations for integrating per-CPU data management.

For information about setting up these platforms in your build environment, see Installation and Setup. For details about the architecture-specific code generation mechanisms, see Architecture-Specific Code Generation.

Supported Architecture Overview

The percpu crate supports four major CPU architectures, each utilizing different registers for per-CPU data access:

Architectureper-CPU RegisterRegister TypeSupported Variants
x86_64GS_BASESegment base registerStandard x86_64
AArch64TPIDR_ELxThread pointer registerEL1, EL2 modes
RISC-VgpGlobal pointer register32-bit, 64-bit
LoongArch$r21General purpose register64-bit

Platform Register Usage

flowchart TD
subgraph subGraph2["Access Methods"]
    X86_ASM["mov gs:[offset]"]
    ARM_ASM["mrs/msr instructions"]
    RISCV_ASM["lui/addi with gp"]
    LOONG_ASM["lu12i.w/ori with $r21"]
end
subgraph subGraph1["Per-CPU Registers"]
    GS["GS_BASESegment Register"]
    TPIDR["TPIDR_ELxThread Pointer"]
    GP["gpGlobal Pointer"]
    R21["$r21General Purpose"]
end
subgraph subGraph0["Architecture Support"]
    X86["x86_64"]
    ARM["AArch64"]
    RISCV["RISC-V"]
    LOONG["LoongArch"]
end

ARM --> TPIDR
GP --> RISCV_ASM
GS --> X86_ASM
LOONG --> R21
R21 --> LOONG_ASM
RISCV --> GP
TPIDR --> ARM_ASM
X86 --> GS

Sources: README.md(L19 - L31)  percpu_macros/src/arch.rs(L21 - L46) 

Platform-Specific Implementation Details

x86_64 Architecture

The x86_64 implementation uses the GS_BASE model-specific register to store the base address of the per-CPU data area. This approach leverages the x86_64 segment architecture for efficient per-CPU data access.

Key characteristics:

  • Uses GS_BASE MSR for per-CPU base address storage
  • Accesses data via gs:[offset] addressing mode
  • Requires offset values ≤ 0xffff_ffff for 32-bit displacement
  • Supports optimized single-instruction memory operations
flowchart TD
subgraph subGraph1["Generated Assembly"]
    MOV_READ["mov reg, gs:[offset VAR]Direct read"]
    MOV_WRITE["mov gs:[offset VAR], regDirect write"]
    ADDR_CALC["mov reg, offset VARadd reg, gs:[__PERCPU_SELF_PTR]"]
end
subgraph subGraph0["x86_64 Implementation"]
    GS_BASE["GS_BASE MSRBase Address"]
    OFFSET["Symbol OffsetCalculated at compile-time"]
    ACCESS["gs:[offset] AccessSingle instruction"]
end

ACCESS --> ADDR_CALC
ACCESS --> MOV_READ
ACCESS --> MOV_WRITE
GS_BASE --> ACCESS
OFFSET --> ACCESS

Sources: percpu_macros/src/arch.rs(L66 - L76)  percpu_macros/src/arch.rs(L131 - L150)  percpu_macros/src/arch.rs(L232 - L251) 

AArch64 Architecture

The AArch64 implementation uses thread pointer registers that vary based on the exception level. The system supports both EL1 (kernel) and EL2 (hypervisor) execution environments.

Key characteristics:

  • Uses TPIDR_EL1 by default, TPIDR_EL2 when arm-el2 feature is enabled
  • Requires mrs/msr instructions for register access
  • Offset calculations limited to 16-bit immediate values (≤ 0xffff)
  • Two-instruction sequence for per-CPU data access

The register selection is controlled at compile time:

flowchart TD
subgraph subGraph2["Access Pattern"]
    MRS["mrs reg, TPIDR_ELxGet base address"]
    CALC["add reg, offsetCalculate final address"]
    ACCESS["ldr/str operationsMemory access"]
end
subgraph subGraph1["Register Usage"]
    EL1["TPIDR_EL1Kernel Mode"]
    EL2["TPIDR_EL2Hypervisor Mode"]
end
subgraph subGraph0["AArch64 Register Selection"]
    FEATURE["arm-el2 feature"]
    DEFAULT["Default Mode"]
end

CALC --> ACCESS
DEFAULT --> EL1
EL1 --> MRS
EL2 --> MRS
FEATURE --> EL2
MRS --> CALC

Sources: percpu_macros/src/arch.rs(L55 - L62)  percpu_macros/src/arch.rs(L79 - L80)  README.md(L33 - L35) 

RISC-V Architecture

The RISC-V implementation uses the global pointer (gp) register for per-CPU data base addressing. This is a deviation from typical RISC-V conventions where gp is used for global data access.

Key characteristics:

  • Uses gp register instead of standard thread pointer (tp)
  • Supports both 32-bit and 64-bit variants
  • Uses lui/addi instruction sequence for address calculation
  • Offset values limited to 32-bit signed immediate range

Important note: The tp register remains available for thread-local storage, while gp is repurposed for per-CPU data.

flowchart TD
subgraph subGraph2["Memory Operations"]
    LOAD["ld/lw/lh/lb instructionsBased on data type"]
    STORE["sd/sw/sh/sb instructionsBased on data type"]
end
subgraph subGraph1["Address Calculation"]
    LUI["lui reg, %hi(VAR)Load upper immediate"]
    ADDI["addi reg, reg, %lo(VAR)Add lower immediate"]
    FINAL["add reg, reg, gpAdd base address"]
end
subgraph subGraph0["RISC-V Register Usage"]
    GP["gp registerPer-CPU base"]
    TP["tp registerThread-local storage"]
    OFFSET["Symbol offset%hi/%lo split"]
end

ADDI --> FINAL
FINAL --> LOAD
FINAL --> STORE
GP --> FINAL
LUI --> ADDI
OFFSET --> LUI

Sources: percpu_macros/src/arch.rs(L81 - L82)  percpu_macros/src/arch.rs(L33 - L39)  README.md(L28 - L31) 

LoongArch Architecture

The LoongArch implementation uses the $r21 general-purpose register for per-CPU data base addressing. This architecture provides native support for per-CPU data patterns.

Key characteristics:

  • Uses $r21 general-purpose register for base addressing
  • Supports 64-bit architecture (LoongArch64)
  • Uses lu12i.w/ori instruction sequence for address calculation
  • Provides specialized load/store indexed instructions
flowchart TD
subgraph subGraph1["Instruction Sequences"]
    LU12I["lu12i.w reg, %abs_hi20(VAR)Load upper 20 bits"]
    ORI["ori reg, reg, %abs_lo12(VAR)OR lower 12 bits"]
    LDX["ldx.d/ldx.w/ldx.h/ldx.buLoad indexed"]
    STX["stx.d/stx.w/stx.h/stx.bStore indexed"]
end
subgraph subGraph0["LoongArch Implementation"]
    R21["$r21 registerPer-CPU base"]
    CALC["Address calculationlu12i.w + ori"]
    INDEXED["Indexed operationsldx/stx instructions"]
end

CALC --> LU12I
LU12I --> ORI
ORI --> LDX
ORI --> STX
R21 --> INDEXED

Sources: percpu_macros/src/arch.rs(L83 - L84)  percpu_macros/src/arch.rs(L40 - L46)  percpu_macros/src/arch.rs(L114 - L129) 

Continuous Integration and Testing Coverage

The percpu crate maintains comprehensive testing across all supported platforms through automated CI/CD pipelines. The testing matrix ensures compatibility and correctness across different target environments.

CI Target Matrix:

TargetArchitectureEnvironmentTest Coverage
x86_64-unknown-linux-gnux86_64Linux userspaceFull tests + unit tests
x86_64-unknown-nonex86_64Bare metal/no_stdBuild + clippy only
riscv64gc-unknown-none-elfRISC-V 64Bare metal/no_stdBuild + clippy only
aarch64-unknown-none-softfloatAArch64Bare metal/no_stdBuild + clippy only
loongarch64-unknown-none-softfloatLoongArch64Bare metal/no_stdBuild + clippy only

Testing Strategy:

flowchart TD
subgraph subGraph2["Feature Combinations"]
    SP_NAIVE["sp-naive featureSingle-core testing"]
    PREEMPT["preempt featurePreemption safety"]
    ARM_EL2["arm-el2 featureHypervisor mode"]
end
subgraph subGraph1["Test Types"]
    BUILD["Build TestsAll targets"]
    UNIT["Unit Testsx86_64 Linux only"]
    LINT["Code QualityAll targets"]
end
subgraph subGraph0["CI Pipeline"]
    MATRIX["Target Matrix5 platforms tested"]
    FEATURES["Feature Testingpreempt, arm-el2"]
    TOOLS["Quality Toolsclippy, rustfmt"]
end

BUILD --> ARM_EL2
BUILD --> PREEMPT
BUILD --> SP_NAIVE
FEATURES --> ARM_EL2
FEATURES --> PREEMPT
FEATURES --> SP_NAIVE
MATRIX --> BUILD
MATRIX --> LINT
MATRIX --> UNIT

Sources: .github/workflows/ci.yml(L8 - L32) 

Platform-Specific Limitations and Considerations

macOS Development Limitation

The crate includes a development-time limitation for macOS hosts, where architecture-specific assembly code is disabled and replaced with unimplemented stubs. This affects local development but not target deployment.

Offset Size Constraints

Each architecture imposes different constraints on the maximum offset values for per-CPU variables:

  • x86_64: 32-bit signed displacement (≤ 0xffff_ffff)
  • AArch64: 16-bit immediate value (≤ 0xffff)
  • RISC-V: 32-bit signed immediate (split hi/lo)
  • LoongArch: 32-bit absolute address (split hi20/lo12)

Register Conflicts and Conventions

  • RISC-V: Repurposes gp register, deviating from standard ABI conventions
  • AArch64: Requires careful exception level management for register selection
  • x86_64: Depends on GS segment register configuration by system initialization

Sources: percpu_macros/src/arch.rs(L4 - L13)  percpu_macros/src/arch.rs(L23 - L46)  README.md(L28 - L35)