Architecture and Design
This document provides a comprehensive overview of the percpu crate's system architecture, including memory layout strategies, cross-platform abstraction mechanisms, and the compile-time code generation pipeline. It focuses on the core design principles that enable efficient per-CPU data management across multiple architectures while maintaining a unified programming interface.
For implementation-specific details of individual architectures, see Architecture-Specific Code Generation. For basic usage patterns and examples, see Getting Started.
System Architecture Overview
The percpu system is built around a dual-crate architecture that separates compile-time code generation from runtime memory management. This design enables both high-performance per-CPU data access and flexible cross-platform support.
Core Architecture
flowchart TD
subgraph subGraph4["Hardware Registers"]
X86_GS["x86_64: GS_BASE"]
ARM_TPIDR["AArch64: TPIDR_ELx"]
RISCV_GP["RISC-V: gp"]
LOONG_R21["LoongArch: $r21"]
end
subgraph subGraph3["Memory Layout"]
PERCPU_SECTION[".percpu section"]
AREA_0["CPU 0 Data Area"]
AREA_N["CPU N Data Area"]
end
subgraph subGraph2["percpu Runtime Crate"]
INIT_FUNC["init()"]
AREA_BASE["percpu_area_base()"]
REG_FUNCS["read_percpu_reg()/write_percpu_reg()"]
INIT_REG["init_percpu_reg()"]
end
subgraph subGraph1["percpu_macros Crate"]
DEF_PERCPU["def_percpu macro"]
ARCH_GEN["arch::gen_current_ptr()"]
SYMBOL_OFFSET["percpu_symbol_offset!"]
end
subgraph subGraph0["User Code Layer"]
USER_VAR["#[def_percpu] static VAR: T"]
USER_ACCESS["VAR.read_current()"]
USER_INIT["percpu::init()"]
end
ARCH_GEN --> REG_FUNCS
DEF_PERCPU --> PERCPU_SECTION
INIT_FUNC --> AREA_0
INIT_FUNC --> AREA_BASE
INIT_FUNC --> AREA_N
INIT_REG --> ARM_TPIDR
INIT_REG --> LOONG_R21
INIT_REG --> RISCV_GP
INIT_REG --> X86_GS
REG_FUNCS --> ARM_TPIDR
REG_FUNCS --> LOONG_R21
REG_FUNCS --> RISCV_GP
REG_FUNCS --> X86_GS
SYMBOL_OFFSET --> AREA_BASE
USER_ACCESS --> ARCH_GEN
USER_INIT --> INIT_FUNC
USER_VAR --> DEF_PERCPU
Sources: README.md(L9 - L17) percpu/src/imp.rs(L1 - L179) percpu_macros/src/arch.rs(L1 - L264)
Component Responsibilities
| Component | Primary Responsibility | Key Functions |
|---|---|---|
| percpu_macros | Compile-time code generation | def_percpu, gen_current_ptr, gen_read_current_raw |
| percpu runtime | Memory management and register access | init, percpu_area_base, read_percpu_reg |
| Linker integration | Memory layout definition | _percpu_start, _percpu_end, .percpu section |
| Architecture abstraction | Platform-specific register handling | write_percpu_reg, init_percpu_reg |
Sources: percpu/src/imp.rs(L46 - L86) percpu_macros/src/arch.rs(L54 - L88)
Memory Layout and Initialization Architecture
The percpu system implements a template-based memory layout where a single .percpu section in the binary serves as a template that gets replicated for each CPU at runtime.
Memory Organization
flowchart TD
subgraph subGraph5["Initialization Process"]
INIT_START["init()"]
SIZE_CALC["percpu_area_size()"]
NUM_CALC["percpu_area_num()"]
COPY_LOOP["copy_nonoverlapping loop"]
end
subgraph subGraph4["Runtime Memory Areas"]
BASE_CALC["percpu_area_base(cpu_id)"]
subgraph subGraph3["CPU N Area"]
CPUN_BASE["Base: _percpu_start + N * align_up_64(size)"]
CPUN_VAR1["Variable 1 (copy)"]
CPUN_VAR2["Variable 2 (copy)"]
end
subgraph subGraph2["CPU 1 Area"]
CPU1_BASE["Base: _percpu_start + align_up_64(size)"]
CPU1_VAR1["Variable 1 (copy)"]
CPU1_VAR2["Variable 2 (copy)"]
end
subgraph subGraph1["CPU 0 Area"]
CPU0_BASE["Base: _percpu_start"]
CPU0_VAR1["Variable 1"]
CPU0_VAR2["Variable 2"]
end
end
subgraph subGraph0["Binary Layout"]
PERCPU_TEMPLATE[".percpu Section Template"]
LOAD_START["_percpu_load_start"]
LOAD_END["_percpu_load_end"]
end
BASE_CALC --> CPU0_BASE
BASE_CALC --> CPU1_BASE
BASE_CALC --> CPUN_BASE
COPY_LOOP --> CPU1_VAR1
COPY_LOOP --> CPU1_VAR2
COPY_LOOP --> CPUN_VAR1
COPY_LOOP --> CPUN_VAR2
INIT_START --> COPY_LOOP
INIT_START --> NUM_CALC
INIT_START --> SIZE_CALC
PERCPU_TEMPLATE --> CPU0_VAR1
PERCPU_TEMPLATE --> CPU0_VAR2
Sources: percpu/src/imp.rs(L46 - L86) percpu/src/imp.rs(L21 - L44) README.md(L54 - L67)
Initialization Sequence
The initialization process follows a specific sequence to set up per-CPU memory areas:
- Size Calculation: The percpu_area_size() function calculates the template size using linker symbols percpu/src/imp.rs(L25 - L30)
- Area Allocation: percpu_area_num() determines how many CPU areas can fit in the reserved space percpu/src/imp.rs(L21 - L23)
- Template Copying: The init() function copies the template data to each CPU's area percpu/src/imp.rs(L76 - L84)
- Alignment: Each area is aligned to 64-byte boundaries using align_up_64() percpu/src/imp.rs(L5 - L8)
Sources: percpu/src/imp.rs(L46 - L86)
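The sequence above can be modeled in user space. The following is a minimal sketch under stated assumptions, not the crate's implementation: the reserved region is a plain byte slice whose first area already holds the template, and the safe `copy_within` stands in for the `copy_nonoverlapping` loop shown in the diagram. `area_num`, `area_offset`, and `init_areas` are hypothetical names chosen for this illustration; only `align_up_64` mirrors a name from the source.

```rust
/// Round `val` up to the next multiple of 64, the per-CPU area alignment.
const fn align_up_64(val: usize) -> usize {
    const ALIGN: usize = 0x40;
    (val + ALIGN - 1) & !(ALIGN - 1)
}

/// Hypothetical helper: how many areas fit in `reserved` bytes.
fn area_num(reserved: usize, template_size: usize) -> usize {
    let stride = align_up_64(template_size);
    if stride == 0 {
        0 // an empty template yields no areas (avoids dividing by zero)
    } else {
        reserved / stride
    }
}

/// Hypothetical helper: byte offset of CPU `cpu_id`'s area from the region start.
fn area_offset(cpu_id: usize, template_size: usize) -> usize {
    cpu_id * align_up_64(template_size)
}

/// Hypothetical model of `init()`: area 0 is the template, so copy it into
/// areas 1..N. Returns the number of areas in the region.
fn init_areas(region: &mut [u8], template_size: usize) -> usize {
    let num = area_num(region.len(), template_size);
    for cpu_id in 1..num {
        let dst = area_offset(cpu_id, template_size);
        // Safe stand-in for the `copy_nonoverlapping` loop in the real code.
        region.copy_within(0..template_size, dst);
    }
    num
}
```

With a 256-byte region and a 3-byte template, `init_areas` reports four areas at offsets 0, 64, 128, and 192, each starting with a copy of the template bytes.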
Cross-Platform Register Abstraction
The system abstracts different CPU architectures' per-CPU register mechanisms through a unified interface while generating architecture-specific assembly code.
Register Mapping Strategy
flowchart TD
subgraph subGraph3["Unified Interface"]
READ_REG["read_percpu_reg()"]
WRITE_REG["write_percpu_reg()"]
INIT_REG["init_percpu_reg()"]
end
subgraph subGraph2["Access Pattern Generation"]
X86_ASM["mov gs:[offset VAR]"]
ARM_ASM["mrs reg, TPIDR_ELx"]
RISCV_ASM["mv reg, gp + offset"]
LOONG_ASM["move reg, $r21 + offset"]
end
subgraph subGraph1["Register Assignment"]
X86_MAPPING["x86_64 → GS_BASE (MSR)"]
ARM_MAPPING["aarch64 → TPIDR_EL1/EL2"]
RISCV_MAPPING["riscv → gp register"]
LOONG_MAPPING["loongarch64 → $r21"]
end
subgraph subGraph0["Architecture Detection"]
TARGET_ARCH["cfg!(target_arch)"]
FEATURE_FLAGS["cfg!(feature)"]
end
ARM_ASM --> READ_REG
ARM_MAPPING --> ARM_ASM
FEATURE_FLAGS --> ARM_MAPPING
LOONG_ASM --> READ_REG
LOONG_MAPPING --> LOONG_ASM
READ_REG --> WRITE_REG
RISCV_ASM --> READ_REG
RISCV_MAPPING --> RISCV_ASM
TARGET_ARCH --> ARM_MAPPING
TARGET_ARCH --> LOONG_MAPPING
TARGET_ARCH --> RISCV_MAPPING
TARGET_ARCH --> X86_MAPPING
WRITE_REG --> INIT_REG
X86_ASM --> READ_REG
X86_MAPPING --> X86_ASM
Sources: percpu/src/imp.rs(L91 - L168) percpu_macros/src/arch.rs(L54 - L88)
Platform-Specific Implementation Details
| Architecture | Register | Assembly Pattern | Special Considerations |
|---|---|---|---|
| x86_64 | GS_BASE | mov gs:[offset VAR] | MSR access, SELF_PTR indirection |
| AArch64 | TPIDR_EL1/EL2 | mrs reg, TPIDR_ELx | EL1/EL2 mode detection via arm-el2 feature |
| RISC-V | gp | mv reg, gp | Uses gp instead of tp register |
| LoongArch | $r21 | move reg, $r21 | Direct register access |
Sources: README.md(L19 - L36) percpu/src/imp.rs(L94 - L156)
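The selection logic in the table can be illustrated with `cfg!` checks. This is only a sketch: the real crate emits inline assembly for each target, whereas the hypothetical function `percpu_reg_name` below merely returns the name of the register that would be used on the build target. Treating both `riscv32` and `riscv64` as RISC-V targets is an assumption of this example.

```rust
/// Hypothetical helper: name of the register that would hold the per-CPU
/// base address on the target this code is compiled for.
#[allow(unexpected_cfgs)] // the `arm-el2` feature is not declared in this standalone sketch
fn percpu_reg_name() -> &'static str {
    if cfg!(target_arch = "x86_64") {
        "GS_BASE"
    } else if cfg!(target_arch = "aarch64") {
        // The `arm-el2` feature switches from the EL1 to the EL2 register.
        if cfg!(feature = "arm-el2") {
            "TPIDR_EL2"
        } else {
            "TPIDR_EL1"
        }
    } else if cfg!(any(target_arch = "riscv32", target_arch = "riscv64")) {
        "gp"
    } else if cfg!(target_arch = "loongarch64") {
        "$r21"
    } else {
        "unsupported"
    }
}
```

Because `cfg!` is evaluated at compile time, only one branch survives in the compiled binary, which mirrors how a single architecture-specific code path is selected per build.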
Code Generation Pipeline Architecture
The macro expansion system transforms high-level per-CPU variable definitions into optimized, architecture-specific access code through a multi-stage pipeline.
Macro Expansion Workflow
flowchart TD
subgraph subGraph4["Assembly Output"]
X86_OUTPUT["x86: mov gs:[offset VAR]"]
ARM_OUTPUT["arm: mrs + movz"]
RISCV_OUTPUT["riscv: lui + addi"]
LOONG_OUTPUT["loong: lu12i.w + ori"]
end
subgraph subGraph3["Architecture Code Gen"]
GEN_OFFSET["gen_offset()"]
GEN_CURRENT_PTR["gen_current_ptr()"]
GEN_READ_RAW["gen_read_current_raw()"]
GEN_WRITE_RAW["gen_write_current_raw()"]
end
subgraph subGraph2["Method Generation"]
OFFSET_METHOD["offset() -> usize"]
CURRENT_PTR["current_ptr() -> *const T"]
READ_CURRENT["read_current() -> T"]
WRITE_CURRENT["write_current(val: T)"]
REMOTE_ACCESS["remote_ptr(), remote_ref()"]
end
subgraph subGraph1["Symbol Generation"]
INNER_SYMBOL["__PERCPU_VAR"]
WRAPPER_TYPE["VAR_WRAPPER"]
PUBLIC_STATIC["VAR: VAR_WRAPPER"]
end
subgraph subGraph0["Input Processing"]
USER_DEF["#[def_percpu] static VAR: T = init"]
SYN_PARSE["syn::parse_macro_input"]
EXTRACT["Extract: name, type, initializer"]
end
CURRENT_PTR --> GEN_CURRENT_PTR
EXTRACT --> INNER_SYMBOL
EXTRACT --> PUBLIC_STATIC
EXTRACT --> WRAPPER_TYPE
GEN_CURRENT_PTR --> X86_OUTPUT
GEN_OFFSET --> ARM_OUTPUT
GEN_OFFSET --> LOONG_OUTPUT
GEN_OFFSET --> RISCV_OUTPUT
GEN_OFFSET --> X86_OUTPUT
GEN_READ_RAW --> X86_OUTPUT
GEN_WRITE_RAW --> X86_OUTPUT
OFFSET_METHOD --> GEN_OFFSET
READ_CURRENT --> GEN_READ_RAW
SYN_PARSE --> EXTRACT
USER_DEF --> SYN_PARSE
WRAPPER_TYPE --> CURRENT_PTR
WRAPPER_TYPE --> OFFSET_METHOD
WRAPPER_TYPE --> READ_CURRENT
WRAPPER_TYPE --> REMOTE_ACCESS
WRAPPER_TYPE --> WRITE_CURRENT
WRITE_CURRENT --> GEN_WRITE_RAW
Sources: percpu_macros/src/arch.rs(L15 - L50) percpu_macros/src/arch.rs(L90 - L181) percpu_macros/src/arch.rs(L183 - L263)
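The `lui + addi` pair listed for RISC-V loads a 32-bit value in two steps: `lui` sets the upper 20 bits and `addi` adds a sign-extended 12-bit immediate. The sketch below shows the standard arithmetic behind that split (the same rule the assembler's `%hi`/`%lo` operators follow). It is an illustration of the instruction pair, not code taken from the macro crate, and `split_hi_lo` and `rebuild` are hypothetical names.

```rust
/// Split a 32-bit offset into the upper-20-bit part for `lui` and the
/// signed 12-bit immediate for `addi`.
fn split_hi_lo(offset: u32) -> (u32, i32) {
    // Adding 0x800 compensates for `addi` sign-extending its immediate.
    let hi = offset.wrapping_add(0x800) >> 12;
    // Sign-extend the low 12 bits into an i32.
    let lo = ((offset & 0xfff) as i32) << 20 >> 20;
    (hi, lo)
}

/// Recombine the pair the way `lui` followed by `addi` would (32-bit wraparound).
fn rebuild(hi: u32, lo: i32) -> u32 {
    (hi << 12).wrapping_add(lo as u32)
}
```

For example, the offset `0x1800` splits into `hi = 2` and `lo = -2048`, because `0x2000 - 0x800 = 0x1800`.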
Generated Code Structure
For each #[def_percpu] declaration, the macro generates a complete wrapper structure:
// Generated inner symbol (placed in .percpu section)
#[link_section = ".percpu"]
static __PERCPU_VAR: T = init;
// Generated wrapper type with access methods
struct VAR_WRAPPER;
impl VAR_WRAPPER {
    fn offset(&self) -> usize { /* architecture-specific assembly */ }
    fn current_ptr(&self) -> *const T { /* architecture-specific assembly */ }
    fn read_current(&self) -> T { /* optimized direct access */ }
    fn write_current(&self, val: T) { /* optimized direct access */ }
    // ... additional methods
}
// Public interface
static VAR: VAR_WRAPPER = VAR_WRAPPER;
Sources: percpu_macros/src/arch.rs(L54 - L88) percpu_macros/src/arch.rs(L90 - L181)
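To make the access pattern concrete, here is a hand-written user-space model of such a wrapper. It is a sketch under loud assumptions: a thread-local `Cell` stands in for the hardware base register, a `Vec<u8>` stands in for the replicated areas, and the "register" holds an index into that buffer instead of an address. `Areas`, `init_percpu_reg_model`, `read_remote`, and the explicit `areas` parameters are hypothetical and do not appear in the generated code.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the hardware per-CPU base register (GS_BASE, TPIDR_ELx, gp, $r21).
    // It holds an index into `Areas::mem` instead of a real address.
    static PERCPU_REG: Cell<usize> = Cell::new(0);
}

/// Hypothetical model of the replicated per-CPU areas: one buffer, fixed stride.
struct Areas {
    mem: Vec<u8>,
    stride: usize,
}

impl Areas {
    fn new(num_cpus: usize, stride: usize) -> Self {
        Areas { mem: vec![0; num_cpus * stride], stride: stride }
    }
}

/// Model of `init_percpu_reg(cpu_id)`: point the "register" at that CPU's area.
fn init_percpu_reg_model(areas: &Areas, cpu_id: usize) {
    PERCPU_REG.with(|r| r.set(cpu_id * areas.stride));
}

/// Hand-written analogue of a generated wrapper for one `u32` variable.
struct VarWrapper {
    offset: usize, // position of the variable inside every area
}

impl VarWrapper {
    fn offset(&self) -> usize {
        self.offset
    }

    /// Load the variable from the area that starts at `base`.
    fn load(&self, areas: &Areas, base: usize) -> u32 {
        let at = base + self.offset;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&areas.mem[at..at + 4]);
        u32::from_ne_bytes(buf)
    }

    /// Current-CPU read: register value plus the variable's offset.
    fn read_current(&self, areas: &Areas) -> u32 {
        let base = PERCPU_REG.with(|r| r.get());
        self.load(areas, base)
    }

    /// Current-CPU write through the same base-plus-offset computation.
    fn write_current(&self, areas: &mut Areas, val: u32) {
        let at = PERCPU_REG.with(|r| r.get()) + self.offset;
        areas.mem[at..at + 4].copy_from_slice(&val.to_ne_bytes());
    }

    /// Remote read: the base comes from `cpu_id` instead of the register.
    fn read_remote(&self, areas: &Areas, cpu_id: usize) -> u32 {
        self.load(areas, cpu_id * areas.stride)
    }
}
```

Switching the modeled register between CPU 0 and CPU 1 and writing different values through `write_current` leaves each area with its own copy, which `read_remote` can then inspect from the other side.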
Optimization Strategies
The code generation pipeline implements several optimization strategies:
- Direct Assembly Generation: For primitive types, direct assembly instructions bypass pointer indirection percpu_macros/src/arch.rs(L90 - L181)
- Architecture-Specific Instruction Selection: Each platform uses optimal instruction sequences percpu_macros/src/arch.rs(L94 - L181)
- Register Constraint Optimization: Assembly constraints are tailored to each architecture's capabilities percpu_macros/src/arch.rs(L131 - L150)
- Compile-Time Offset Calculation: Variable offsets are resolved at compile time using linker symbols percpu_macros/src/arch.rs(L15 - L50)
Sources: percpu_macros/src/arch.rs(L90 - L263)
Feature Configuration Architecture
The system supports multiple operational modes through Cargo feature flags that modify both compile-time code generation and runtime behavior.
Feature Flag Impact Matrix
| Feature | Code Generation Changes | Runtime Changes | Use Case |
|---|---|---|---|
| sp-naive | Global variables instead of per-CPU | No register usage | Single-core systems |
| preempt | NoPreemptGuard integration | Preemption disable/enable | Preemptible kernels |
| arm-el2 | TPIDR_EL2 instead of TPIDR_EL1 | Hypervisor register access | ARM hypervisors |
Sources: README.md(L69 - L79) percpu_macros/src/arch.rs(L55 - L61)
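The effect of the preempt feature can be pictured as an RAII guard wrapped around each access. The sketch below is a user-space model only: `NoPreemptModel` is a hypothetical stand-in for the real guard type, the thread-local counter stands in for the kernel's preemption state, and `with_current` and `preempt_disabled` are illustrative names.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the kernel's preemption-disable depth.
    static PREEMPT_DISABLE_COUNT: Cell<u32> = Cell::new(0);
    // Stand-in for one per-CPU variable.
    static COUNTER: Cell<u64> = Cell::new(0);
}

/// Hypothetical guard: disables "preemption" on creation, re-enables on drop.
struct NoPreemptModel;

impl NoPreemptModel {
    fn new() -> Self {
        PREEMPT_DISABLE_COUNT.with(|c| c.set(c.get() + 1));
        NoPreemptModel
    }
}

impl Drop for NoPreemptModel {
    fn drop(&mut self) {
        PREEMPT_DISABLE_COUNT.with(|c| c.set(c.get() - 1));
    }
}

/// True while at least one guard is alive on this thread.
fn preempt_disabled() -> bool {
    PREEMPT_DISABLE_COUNT.with(|c| c.get() > 0)
}

/// Hypothetical accessor: run `f` on the variable while the guard is held,
/// so the task cannot migrate between computing the address and using it.
fn with_current<F, R>(f: F) -> R
where
    F: FnOnce(&Cell<u64>) -> R,
{
    let _guard = NoPreemptModel::new(); // dropped when the function returns
    COUNTER.with(|c| f(c))
}
```

The guard is held for exactly the duration of the closure, so the preemption-disabled window stays as short as the access itself.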
This architecture enables the percpu system to maintain high performance across diverse deployment scenarios while providing a consistent programming interface that abstracts away platform-specific complexities.