Cross-Platform Abstraction

This document explains how the percpu crate provides a unified interface across different CPU architectures while leveraging each platform's specific per-CPU register mechanisms. The abstraction layer enables portable per-CPU data management by generating architecture-specific assembly code at compile time and providing runtime functions that adapt to each platform's register conventions.

For details about memory layout and initialization processes, see Memory Layout and Initialization. For implementation specifics of the code generation process, see Architecture-Specific Code Generation.

Architecture Support Matrix

The percpu system supports four major CPU architectures, each using different registers for per-CPU data access:

| Architecture | Per-CPU Register | Register Purpose | Feature Requirements |
|---|---|---|---|
| x86_64 | GS_BASE | MSR-based segment register | MSR read/write access |
| AArch64 | TPIDR_EL1/EL2 | Thread pointer register | EL1 or EL2 privilege |
| RISC-V | gp | Global pointer register | Custom convention |
| LoongArch64 | $r21 | General purpose register | Native ISA support |

Architecture Register Abstraction

flowchart TD
subgraph subGraph2["Compile-Time Selection"]
    CFGIF["cfg_if! macro<br/>target_arch conditions"]
    ASM["Inline Assembly<br/>core::arch::asm!"]
end
subgraph subGraph1["Architecture-Specific Implementation"]
    X86["x86_64<br/>IA32_GS_BASE MSR<br/>rdmsr/wrmsr"]
    ARM["AArch64<br/>TPIDR_EL1/EL2<br/>mrs/msr"]
    RISCV["RISC-V<br/>gp register<br/>mv instruction"]
    LOONG["LoongArch64<br/>$r21 register<br/>move instruction"]
end
subgraph subGraph0["Unified API"]
    API["read_percpu_reg()<br/>write_percpu_reg()<br/>init_percpu_reg()"]
end

API --> CFGIF
ARM --> ASM
CFGIF --> ARM
CFGIF --> LOONG
CFGIF --> RISCV
CFGIF --> X86
LOONG --> ASM
RISCV --> ASM
X86 --> ASM

Sources: README.md(L19 - L36)  percpu/src/imp.rs(L88 - L156) 

Runtime Register Management

The runtime API consists of architecture-agnostic functions whose bodies are selected at compile time, through target_arch conditions, to use each platform's native register access instructions. Every architecture implements the same three functions: read_percpu_reg(), write_percpu_reg(), and init_percpu_reg().

Runtime Register Access Functions

flowchart TD
subgraph subGraph4["LoongArch64 Implementation"]
    LA_READ["move {}, $r21"]
    LA_WRITE["move $r21, {}"]
end
subgraph subGraph3["RISC-V Implementation"]
    RV_READ["mv {}, gp"]
    RV_WRITE["mv gp, {}"]
end
subgraph subGraph2["AArch64 Implementation"]
    ARM_READ["mrs TPIDR_EL1/EL2"]
    ARM_WRITE["msr TPIDR_EL1/EL2"]
end
subgraph subGraph1["x86_64 Implementation"]
    X86_READ["rdmsr(IA32_GS_BASE)<br/>or SELF_PTR.read_current_raw()"]
    X86_WRITE["wrmsr(IA32_GS_BASE)<br/>+ SELF_PTR.write_current_raw()"]
end
subgraph subGraph0["Public API"]
    READ["read_percpu_reg()"]
    WRITE["write_percpu_reg()"]
    INIT["init_percpu_reg()"]
end

INIT --> WRITE
READ --> ARM_READ
READ --> LA_READ
READ --> RV_READ
READ --> X86_READ
WRITE --> ARM_WRITE
WRITE --> LA_WRITE
WRITE --> RV_WRITE
WRITE --> X86_WRITE

Sources: percpu/src/imp.rs(L91 - L168) 
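
The selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's actual code: the `hw` module, the hosted fallback, the `AREA_BASE` and `AREA_SIZE` constants, and the restriction of the real instructions to bare-metal (`target_os = "none"`) builds are all hypothetical choices made so that the sketch also compiles and runs on an ordinary development machine. Only the `mv` and `move` instruction strings come from the diagram.

```rust
// Bare-metal RISC-V: the per-CPU base lives in `gp`.
#[cfg(all(target_os = "none", target_arch = "riscv64"))]
mod hw {
    pub fn read() -> usize {
        let base: usize;
        unsafe { core::arch::asm!("mv {}, gp", out(reg) base) };
        base
    }
    pub unsafe fn write(base: usize) {
        unsafe { core::arch::asm!("mv gp, {}", in(reg) base) };
    }
}

// Bare-metal LoongArch64: the per-CPU base lives in `$r21`.
#[cfg(all(target_os = "none", target_arch = "loongarch64"))]
mod hw {
    pub fn read() -> usize {
        let base: usize;
        unsafe { core::arch::asm!("move {}, $r21", out(reg) base) };
        base
    }
    pub unsafe fn write(base: usize) {
        unsafe { core::arch::asm!("move $r21, {}", in(reg) base) };
    }
}

// Hosted fallback (an assumption of this sketch): a static stands in for
// the register so the same API can be exercised in user space.
#[cfg(not(all(
    target_os = "none",
    any(target_arch = "riscv64", target_arch = "loongarch64")
)))]
mod hw {
    use core::sync::atomic::{AtomicUsize, Ordering};
    static FAKE_REG: AtomicUsize = AtomicUsize::new(0);
    pub fn read() -> usize {
        FAKE_REG.load(Ordering::Relaxed)
    }
    pub unsafe fn write(base: usize) {
        FAKE_REG.store(base, Ordering::Relaxed);
    }
}

// Hypothetical layout: one fixed-size area per CPU, starting at AREA_BASE.
pub const AREA_BASE: usize = 0x8000_0000;
pub const AREA_SIZE: usize = 0x1000;

/// Architecture-agnostic read of the per-CPU base register.
pub fn read_percpu_reg() -> usize {
    hw::read()
}

/// Architecture-agnostic write; unsafe because a wrong base corrupts
/// every later per-CPU access on this CPU.
pub unsafe fn write_percpu_reg(base: usize) {
    unsafe { hw::write(base) }
}

/// Points the register at the area belonging to `cpu_id`.
pub fn init_percpu_reg(cpu_id: usize) {
    unsafe { write_percpu_reg(AREA_BASE + cpu_id * AREA_SIZE) }
}
```

Exactly one `hw` module survives compilation for any target, so callers never see a runtime branch on the architecture.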

Compile-Time Code Generation Abstraction

The macro system generates architecture-specific assembly code for accessing per-CPU variables. Each architecture requires different instruction sequences and addressing modes, which are abstracted through the code generation functions in percpu_macros/src/arch.rs.

Code Generation Pipeline by Architecture

flowchart TD
subgraph subGraph5["LoongArch64 Assembly"]
    LA_OFFSET["lu12i.w {0}, %abs_hi20({VAR})<br/>ori {0}, {0}, %abs_lo12({VAR})"]
    LA_PTR["move {}, $r21"]
    LA_READ["lu12i.w {0}, %abs_hi20({VAR})<br/>ori {0}, {0}, %abs_lo12({VAR})<br/>ldx.hu {0}, {0}, $r21"]
    LA_WRITE["lu12i.w {0}, %abs_hi20({VAR})<br/>ori {0}, {0}, %abs_lo12({VAR})<br/>stx.h {1}, {0}, $r21"]
end
subgraph subGraph4["RISC-V Assembly"]
    RV_OFFSET["lui {0}, %hi({VAR})<br/>addi {0}, {0}, %lo({VAR})"]
    RV_PTR["mv {}, gp"]
    RV_READ["lui {0}, %hi({VAR})<br/>add {0}, {0}, gp<br/>lhu {0}, %lo({VAR})({0})"]
    RV_WRITE["lui {0}, %hi({VAR})<br/>add {0}, {0}, gp<br/>sh {1}, %lo({VAR})({0})"]
end
subgraph subGraph3["AArch64 Assembly"]
    ARM_OFFSET["movz {0}, #:abs_g0_nc:{VAR}"]
    ARM_PTR["mrs {}, TPIDR_EL1/EL2"]
    ARM_FALLBACK["*self.current_ptr()"]
end
subgraph subGraph2["x86_64 Assembly"]
    X86_OFFSET["mov {0:e}, offset {VAR}"]
    X86_PTR["mov {0}, gs:[offset __PERCPU_SELF_PTR]<br/>add {0}, offset {VAR}"]
    X86_READ["mov {0:x}, word ptr gs:[offset {VAR}]"]
    X86_WRITE["mov word ptr gs:[offset {VAR}], {0:x}"]
end
subgraph subGraph1["Generation Functions"]
    OFFSET["gen_offset()"]
    CURRENTPTR["gen_current_ptr()"]
    READRAW["gen_read_current_raw()"]
    WRITERAW["gen_write_current_raw()"]
end
subgraph subGraph0["Macro Input"]
    DEFPERCPU["#[def_percpu]<br/>static VAR: T = init;"]
end

CURRENTPTR --> ARM_PTR
CURRENTPTR --> LA_PTR
CURRENTPTR --> RV_PTR
CURRENTPTR --> X86_PTR
DEFPERCPU --> CURRENTPTR
DEFPERCPU --> OFFSET
DEFPERCPU --> READRAW
DEFPERCPU --> WRITERAW
OFFSET --> ARM_OFFSET
OFFSET --> LA_OFFSET
OFFSET --> RV_OFFSET
OFFSET --> X86_OFFSET
READRAW --> ARM_FALLBACK
READRAW --> LA_READ
READRAW --> RV_READ
READRAW --> X86_READ
WRITERAW --> ARM_FALLBACK
WRITERAW --> LA_WRITE
WRITERAW --> RV_WRITE
WRITERAW --> X86_WRITE

Sources: percpu_macros/src/arch.rs(L15 - L263) 
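
To make the "one generation function, four instruction sequences" idea concrete, here is a small sketch of how an offset template could be chosen per architecture. It is only an illustration: the `Arch` enum, `offset_template`, and `render` are hypothetical names, and the plain string substitution stands in for whatever mechanism the macro crate really uses to bind the symbol. The template strings themselves are copied from the diagram above.

```rust
/// Target architectures covered by the diagram (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Arch {
    X86_64,
    AArch64,
    RiscV,
    LoongArch64,
}

/// Returns the assembly template that materializes the offset of `{VAR}`.
pub fn offset_template(arch: Arch) -> &'static str {
    match arch {
        Arch::X86_64 => "mov {0:e}, offset {VAR}",
        Arch::AArch64 => "movz {0}, #:abs_g0_nc:{VAR}",
        Arch::RiscV => "lui {0}, %hi({VAR})\naddi {0}, {0}, %lo({VAR})",
        Arch::LoongArch64 => {
            "lu12i.w {0}, %abs_hi20({VAR})\nori {0}, {0}, %abs_lo12({VAR})"
        }
    }
}

/// Substitutes a concrete symbol name for the `{VAR}` placeholder.
pub fn render(arch: Arch, symbol: &str) -> String {
    offset_template(arch).replace("{VAR}", symbol)
}
```

For example, `render(Arch::RiscV, "__PERCPU_FOO")` yields the two-instruction `lui`/`addi` sequence with the (made-up) symbol name filled in, while the register placeholder `{0}` is left for the assembler macro to bind.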

Feature Flag Configuration

The system uses Cargo features to adapt behavior for different deployment scenarios and platform capabilities:

| Feature | Purpose | Effect on Abstraction |
|---|---|---|
| sp-naive | Single-core systems | Disables per-CPU registers, uses global variables |
| preempt | Preemptible kernels | Adds NoPreemptGuard integration |
| arm-el2 | ARM hypervisors | Uses TPIDR_EL2 instead of TPIDR_EL1 |

The arm-el2 feature specifically affects the AArch64 register selection:

// From percpu_macros/src/arch.rs:55-61
let aarch64_tpidr = if cfg!(feature = "arm-el2") {
    "TPIDR_EL2"
} else {
    "TPIDR_EL1"
};
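
The selected register name is then spliced into the assembly text that the macro emits. The sketch below illustrates that step only; it takes the feature state as a `bool` parameter so it can be run anywhere, and `gen_read_base_asm` and `gen_write_base_asm` are hypothetical helper names rather than functions from the crate.

```rust
/// Picks the AArch64 register name; `arm_el2` stands in for
/// `cfg!(feature = "arm-el2")` so the sketch is testable without features.
pub fn aarch64_tpidr(arm_el2: bool) -> &'static str {
    if arm_el2 {
        "TPIDR_EL2"
    } else {
        "TPIDR_EL1"
    }
}

/// Builds the `mrs` template that reads the per-CPU base.
/// `{{}}` escapes to a literal `{}` placeholder for the asm operand.
pub fn gen_read_base_asm(arm_el2: bool) -> String {
    format!("mrs {{}}, {}", aarch64_tpidr(arm_el2))
}

/// Builds the `msr` template that writes the per-CPU base.
pub fn gen_write_base_asm(arm_el2: bool) -> String {
    format!("msr {}, {{}}", aarch64_tpidr(arm_el2))
}
```

Because the choice happens while the code is generated, the compiled binary contains only one of the two register names and pays no runtime cost for the feature.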

Feature-Based Configuration Flow

flowchart TD
subgraph subGraph2["Implementation Selection"]
    REGSEL["Register Selection"]
    GUARDSEL["Guard Selection"]
    FALLBACK["Fallback Mechanisms"]
end
subgraph subGraph1["Feature Effects"]
    SPNAIVE["sp-naive<br/>→ Global variables<br/>→ No register access"]
    PREEMPT["preempt<br/>→ NoPreemptGuard<br/>→ kernel_guard crate"]
    ARMEL2["arm-el2<br/>→ TPIDR_EL2<br/>→ Hypervisor mode"]
end
subgraph subGraph0["Build Configuration"]
    FEATURES["Cargo.toml<br/>[features]"]
    CFGMACROS["cfg!() macros"]
    CONDITIONAL["Conditional compilation"]
end

ARMEL2 --> REGSEL
CFGMACROS --> CONDITIONAL
CONDITIONAL --> ARMEL2
CONDITIONAL --> PREEMPT
CONDITIONAL --> SPNAIVE
FEATURES --> CFGMACROS
PREEMPT --> GUARDSEL
SPNAIVE --> FALLBACK

Sources: README.md(L69 - L79)  percpu_macros/src/arch.rs(L55 - L61)  percpu/src/imp.rs(L105 - L108) 

Platform-Specific Implementation Details

Each architecture has unique characteristics that the abstraction layer must accommodate:

x86_64 Specifics

  • Uses Model-Specific Register (MSR) IA32_GS_BASE for per-CPU base pointer
  • Requires special handling for Linux userspace via arch_prctl syscall
  • Maintains SELF_PTR variable in per-CPU area for efficient access
  • Supports direct GS-relative addressing in assembly
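
The role of SELF_PTR can be modeled in plain Rust. In the sketch below, which is an illustration under assumptions rather than the crate's layout, a per-CPU area stores its own base address in its first word, so code that can only address memory relative to GS can still recover an absolute pointer. The `Area` struct, its field order, and both offset constants are hypothetical; `gs_base` is an ordinary integer standing in for the value held in IA32_GS_BASE.

```rust
use core::mem::size_of;

/// Hypothetical per-CPU area: a self pointer followed by one variable.
#[repr(C)]
pub struct Area {
    pub self_ptr: usize, // plays the role of __PERCPU_SELF_PTR
    pub value: u16,      // an example per-CPU variable
}

/// Offsets within `Area` (valid because of `#[repr(C)]`).
pub const SELF_PTR_OFFSET: usize = 0;
pub const VALUE_OFFSET: usize = size_of::<usize>();

/// Stores the area's own address inside it and returns that address,
/// which is the value that would be written to IA32_GS_BASE.
pub fn init_area(area: &mut Area) -> usize {
    let ptr: *mut Area = area;
    unsafe { (*ptr).self_ptr = ptr as usize };
    ptr as usize
}

/// Models `mov {0}, gs:[offset __PERCPU_SELF_PTR]` followed by
/// `add {0}, offset VAR`: load the self pointer, then add the offset.
///
/// Safety: `gs_base` must be the address of an initialized `Area`.
pub unsafe fn current_ptr(gs_base: usize) -> *const u16 {
    let self_ptr = unsafe { *((gs_base + SELF_PTR_OFFSET) as *const usize) };
    (self_ptr + VALUE_OFFSET) as *const u16
}
```

Reading the self pointer is an ordinary GS-relative load, which avoids executing `rdmsr` every time an absolute pointer to a per-CPU variable is needed.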

AArch64 Specifics

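  • Uses the software thread ID register (TPIDR_EL1 or TPIDR_EL2) to hold the per-CPU base
  • EL1 for kernel mode, EL2 for hypervisor mode (controlled by arm-el2 feature)
  • Materializes the offset with a single movz using the #:abs_g0_nc: relocation, which covers only the low 16 bits of the symbol offset, so accesses are formed as base plus offset
  • Raw reads and writes fall back to dereferencing current_ptr() instead of using a dedicated assembly sequence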

RISC-V Specifics

  • Repurposes Global Pointer (gp) register for per-CPU base
  • Thread Pointer (tp) remains available for thread-local storage
  • Uses lui/addi instruction pairs for address calculation
  • Supports direct load/store with calculated offsets
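
The lui/addi pairing relies on a small piece of arithmetic: the low 12 bits are sign-extended when added, so the upper 20 bits must be rounded up whenever bit 11 of the offset is set. The following sketch works that out for 32-bit offsets. The function names are illustrative, not taken from the crate or the assembler.

```rust
/// Value produced by `%hi(offset)`: the upper 20 bits, rounded so that
/// adding the sign-extended low part lands back on `offset`.
pub fn hi20(offset: u32) -> u32 {
    offset.wrapping_add(0x800) >> 12
}

/// Value produced by `%lo(offset)`: the low 12 bits, sign-extended.
pub fn lo12(offset: u32) -> i32 {
    ((offset & 0xfff) as i32) << 20 >> 20
}

/// Models `lui rd, %hi(offset)` followed by `addi rd, rd, %lo(offset)`
/// on 32-bit values.
pub fn lui_addi(offset: u32) -> u32 {
    (hi20(offset) << 12).wrapping_add(lo12(offset) as u32)
}
```

For an offset of 0x800, `hi20` returns 1 and `lo12` returns -2048, and 4096 - 2048 gives back 0x800. The read and write sequences in the diagram use the same split but fold the low part into the load or store immediate instead of a separate `addi`.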

LoongArch64 Specifics

  • Uses general-purpose register $r21 by convention
  • Native instruction support with lu12i.w/ori for address formation
  • Indexed load/store instructions for efficient per-CPU access
  • Full 32-bit offset support for large per-CPU areas
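
As a rough model of the LoongArch64 sequence, the sketch below splits an offset into the two immediates used by lu12i.w and ori, recombines them, and then performs a 16-bit indexed load. It is a simulation under assumptions: a byte slice stands in for the memory that `$r21` points to, and all function names are hypothetical. Unlike the RISC-V case, the low 12 bits are zero-extended by `ori`, so no rounding of the high part is needed.

```rust
/// Immediate used by `lu12i.w`: bits 31..12 of the offset.
pub fn abs_hi20(offset: u32) -> u32 {
    offset >> 12
}

/// Immediate used by `ori`: bits 11..0 of the offset, zero-extended.
pub fn abs_lo12(offset: u32) -> u32 {
    offset & 0xfff
}

/// Models `lu12i.w rd, hi` followed by `ori rd, rd, lo`.
pub fn form_offset(offset: u32) -> u32 {
    (abs_hi20(offset) << 12) | abs_lo12(offset)
}

/// Models `ldx.hu rd, rd, $r21`: a zero-extended 16-bit little-endian load
/// at base + index, with `area` standing in for the memory at `$r21`.
pub fn ldx_hu(area: &[u8], offset: u32) -> u16 {
    let i = form_offset(offset) as usize;
    u16::from_le_bytes([area[i], area[i + 1]])
}
```

Because the base register is used directly as the index operand, no separate add instruction is needed between forming the offset and performing the load.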

Sources: percpu/src/imp.rs(L94 - L156)  percpu_macros/src/arch.rs(L21 - L263)  README.md(L28 - L35)