Memory Layout and Initialization
This document explains the per-CPU memory layout structure, initialization process, and linker script integration in the percpu crate. It covers how per-CPU data areas are organized in memory, the template-based initialization process, and the relationship between linker script symbols and runtime address calculations.
For architecture-specific register management details, see Cross-Platform Abstraction. For low-level memory management internals, see Memory Management Internals.
Memory Layout Structure
The percpu system organizes per-CPU data using a template-based approach where all per-CPU variables are first collected into a single template area, then replicated for each CPU with proper alignment.
Per-CPU Area Organization
flowchart TD
subgraph subGraph2["Runtime Functions"]
AREA_SIZE["percpu_area_size()"]
AREA_NUM["percpu_area_num()"]
AREA_BASE["percpu_area_base(cpu_id)"]
ALIGN_UP["align_up_64()"]
end
subgraph subGraph1["Memory Layout"]
TEMPLATE["Template Area (CPU 0), size: percpu_area_size()"]
CPU1_AREA["CPU 1 Area, 64-byte aligned"]
CPU2_AREA["CPU 2 Area, 64-byte aligned"]
CPUN_AREA["CPU N Area, 64-byte aligned"]
end
subgraph subGraph0["Linker Script Symbols"]
START["_percpu_start"]
END["_percpu_end"]
LOAD_START["_percpu_load_start"]
LOAD_END["_percpu_load_end"]
end
ALIGN_UP --> CPU1_AREA
AREA_BASE --> CPU1_AREA
AREA_NUM --> END
AREA_SIZE --> TEMPLATE
CPU1_AREA --> CPU2_AREA
CPU2_AREA --> CPUN_AREA
END --> CPUN_AREA
LOAD_END --> TEMPLATE
LOAD_START --> TEMPLATE
START --> TEMPLATE
TEMPLATE --> CPU1_AREA
The memory layout uses several key components:
| Component | Purpose | Implementation |
|---|---|---|
| Template Area | Contains initial values for all per-CPU variables | Defined by .percpu section content |
| Per-CPU Areas | Individual copies for each CPU | Created by init() function |
| 64-byte Alignment | Cache line optimization | align_up_64() function |
| Address Calculation | Runtime pointer arithmetic | percpu_area_base() function |
Sources: percpu/src/imp.rs(L5 - L44) percpu/test_percpu.x(L1 - L17) README.md(L54 - L67)
Address Calculation Functions
The system provides several functions for calculating memory layout parameters:
flowchart TD
subgraph Calculations["Calculations"]
CALC1["size = load_end - load_start"]
CALC2["num = (end - start) / align_up_64(size)"]
CALC3["base = start + cpu_id * align_up_64(size)"]
end
subgraph subGraph1["Linker Symbols"]
SYMBOLS["_percpu_start, _percpu_end, _percpu_load_start, _percpu_load_end"]
end
subgraph subGraph0["Core Functions"]
SIZE["percpu_area_size(): returns template size"]
NUM["percpu_area_num(): returns CPU count"]
BASE["percpu_area_base(cpu_id): returns CPU area address"]
ALIGN["align_up_64(val): 64-byte alignment"]
end
ALIGN --> CALC2
ALIGN --> CALC3
BASE --> CALC3
NUM --> CALC2
SIZE --> CALC1
SYMBOLS --> BASE
SYMBOLS --> NUM
SYMBOLS --> SIZE
Sources: percpu/src/imp.rs(L20 - L44)
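The three calculations above can be sketched as plain functions over the four symbol addresses. This is a minimal illustration, not the crate's code: the `Symbols` struct and its field names are hypothetical stand-ins for the linker-provided addresses, and the example values in `main` are made up.

```rust
/// Hypothetical stand-in for the four linker symbols, held as plain addresses.
#[derive(Clone, Copy)]
struct Symbols {
    start: usize,      // _percpu_start
    end: usize,        // _percpu_end
    load_start: usize, // _percpu_load_start
    load_end: usize,   // _percpu_load_end
}

/// Round `val` up to the next multiple of 64.
const fn align_up_64(val: usize) -> usize {
    (val + 0x3f) & !0x3f
}

/// size = load_end - load_start
fn percpu_area_size(s: &Symbols) -> usize {
    s.load_end - s.load_start
}

/// num = (end - start) / align_up_64(size); assumes a non-empty template.
fn percpu_area_num(s: &Symbols) -> usize {
    (s.end - s.start) / align_up_64(percpu_area_size(s))
}

/// base = start + cpu_id * align_up_64(size)
fn percpu_area_base(s: &Symbols, cpu_id: usize) -> usize {
    s.start + cpu_id * align_up_64(percpu_area_size(s))
}

fn main() {
    // Assumed example: a 100-byte template with space reserved for 4 CPUs.
    let s = Symbols { start: 0x1000, end: 0x1200, load_start: 0, load_end: 100 };
    println!("size = {}", percpu_area_size(&s));
    println!("num  = {}", percpu_area_num(&s));
    for cpu_id in 0..percpu_area_num(&s) {
        println!("base({cpu_id}) = {:#x}", percpu_area_base(&s, cpu_id));
    }
}
```

Note that the spacing between areas is the aligned size (128 bytes here), not the raw template size (100 bytes), so the tail of each area is padding.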
Initialization Process
The initialization process occurs in two phases: global area setup via init() and per-CPU register configuration via init_percpu_reg().
Global Initialization Flow
flowchart TD
subgraph subGraph0["init() Function Flow"]
START_INIT["init() called"]
CHECK_INIT["Check IS_INIT atomic flag"]
ALREADY_INIT["Already initialized?"]
RETURN_ZERO["Return 0"]
PLATFORM_CHECK["target_os == linux?"]
ALLOC_LINUX["Allocate memory with std::alloc"]
GET_PARAMS["Get base, size, num parameters"]
SET_BASE["Set PERCPU_AREA_BASE"]
COPY_LOOP["For each CPU 1..num"]
COPY_DATA["copy_nonoverlapping(base, secondary_base, size)"]
RETURN_NUM["Return number of areas"]
end
ALLOC_LINUX --> SET_BASE
ALREADY_INIT --> PLATFORM_CHECK
ALREADY_INIT --> RETURN_ZERO
CHECK_INIT --> ALREADY_INIT
COPY_DATA --> RETURN_NUM
COPY_LOOP --> COPY_DATA
GET_PARAMS --> COPY_LOOP
PLATFORM_CHECK --> ALLOC_LINUX
PLATFORM_CHECK --> GET_PARAMS
SET_BASE --> GET_PARAMS
START_INIT --> CHECK_INIT
The init() function performs these key operations:
- Initialization Guard: Uses the IS_INIT atomic boolean to prevent multiple initialization percpu/src/imp.rs(L58 - L63)
- Platform-Specific Allocation: On Linux, allocates memory dynamically; on bare metal, uses linker-provided memory percpu/src/imp.rs(L65 - L71)
- Template Replication: Copies CPU 0's template data to all other CPU areas percpu/src/imp.rs(L76 - L84)
Sources: percpu/src/imp.rs(L46 - L86)
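The guard-then-replicate flow can be sketched over an ordinary byte buffer. This is an illustration under stated assumptions rather than the crate's implementation: the flag is passed in explicitly instead of living in a static `IS_INIT`, the per-CPU region is a `Vec<u8>` instead of linker-reserved memory, and the safe `copy_within` stands in for `copy_nonoverlapping`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Round `val` up to the next multiple of 64.
const fn align_up_64(val: usize) -> usize {
    (val + 0x3f) & !0x3f
}

/// Copies the template stored in area 0 of `region` into areas 1..num.
/// Returns the number of areas, or 0 if `is_init` was already set.
/// Assumes `template_size > 0`.
fn init(is_init: &AtomicBool, region: &mut [u8], template_size: usize) -> usize {
    // Initialization guard: only the first caller flips the flag.
    if is_init
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        return 0;
    }
    let stride = align_up_64(template_size);
    let num = region.len() / stride;
    // Template replication: area 0 is the source for every other area.
    for cpu_id in 1..num {
        region.copy_within(0..template_size, cpu_id * stride);
    }
    num
}

fn main() {
    let is_init = AtomicBool::new(false);
    let template_size = 100;
    let mut region = vec![0u8; align_up_64(template_size) * 4];
    region[..4].copy_from_slice(&[1, 2, 3, 4]); // pretend initial values
    println!("first call:  {}", init(&is_init, &mut region, template_size));
    println!("second call: {}", init(&is_init, &mut region, template_size));
}
```

Returning 0 on a repeated call lets a caller distinguish "already initialized" from a successful first initialization, which always reports at least one area.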
Per-CPU Register Initialization
flowchart TD
subgraph subGraph1["Architecture-Specific Registers"]
X86_REG["x86_64: GS_BASE via MSR or syscall"]
ARM_REG["aarch64: TPIDR_EL1/EL2 via msr"]
RISCV_REG["riscv: gp register via mv"]
LOONG_REG["loongarch64: $r21 via move"]
end
subgraph init_percpu_reg(cpu_id)["init_percpu_reg(cpu_id)"]
CALC_BASE["percpu_area_base(cpu_id)"]
WRITE_REG["write_percpu_reg(tp)"]
end
CALC_BASE --> WRITE_REG
WRITE_REG --> ARM_REG
WRITE_REG --> LOONG_REG
WRITE_REG --> RISCV_REG
WRITE_REG --> X86_REG
The init_percpu_reg() function configures the architecture-specific register to point to the appropriate per-CPU area base address.
Sources: percpu/src/imp.rs(L158 - L168) percpu/src/imp.rs(L119 - L156)
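The two-step flow (compute the area base, then write it to the per-CPU register) can be modeled in userspace without inline assembly. In this sketch a thread-local `Cell` is a hypothetical stand-in for the hardware register, each thread plays the role of one CPU, and `AREA_START`, `AREA_STRIDE`, and `read_percpu_reg()` are assumptions introduced for the illustration.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    /// Simulated per-CPU register (stands in for GS_BASE, TPIDR_ELx, gp, or $r21).
    static PERCPU_REG: Cell<usize> = Cell::new(0);
}

const AREA_START: usize = 0x1000; // assumed value of _percpu_start
const AREA_STRIDE: usize = 128;   // assumed align_up_64(percpu_area_size())

fn percpu_area_base(cpu_id: usize) -> usize {
    AREA_START + cpu_id * AREA_STRIDE
}

fn write_percpu_reg(tp: usize) {
    PERCPU_REG.with(|reg| reg.set(tp));
}

fn read_percpu_reg() -> usize {
    PERCPU_REG.with(|reg| reg.get())
}

/// Point the current "CPU" at its own per-CPU area.
fn init_percpu_reg(cpu_id: usize) {
    let tp = percpu_area_base(cpu_id);
    write_percpu_reg(tp);
}

fn main() {
    let handles: Vec<_> = (0..3)
        .map(|cpu_id| {
            thread::spawn(move || {
                init_percpu_reg(cpu_id);
                (cpu_id, read_percpu_reg())
            })
        })
        .collect();
    for handle in handles {
        let (cpu_id, tp) = handle.join().unwrap();
        println!("cpu {cpu_id}: register = {tp:#x}");
    }
}
```

Once the register holds the area base, a per-CPU variable can be reached as the register value plus the variable's offset within the template.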
Linker Script Integration
The percpu system requires specific linker script modifications to reserve memory for per-CPU areas and define necessary symbols.
Required Linker Script Structure
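The linker script must define a .percpu section with specific symbols and layout. Because the section is placed at virtual address 0x0, ALIGN(64) inside it evaluates to the template size rounded up to 64 bytes, so the final assignment reserves one aligned area for each of the CPU_NUM CPUs (CPU_NUM is expected to be defined elsewhere in the linker script):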
. = ALIGN(4K);
_percpu_start = .;
_percpu_end = _percpu_start + SIZEOF(.percpu);
.percpu 0x0 (NOLOAD) : AT(_percpu_start) {
_percpu_load_start = .;
*(.percpu .percpu.*)
_percpu_load_end = .;
. = _percpu_load_start + ALIGN(64) * CPU_NUM;
}
. = _percpu_end;
Symbol Relationships
flowchart TD
subgraph subGraph2["Memory Sections"]
TEMPLATE_SEC[".percpu section content: all per-CPU variables"]
RESERVED_SPACE["Reserved space: ALIGN(64) * CPU_NUM"]
end
subgraph subGraph1["Runtime Usage"]
AREA_SIZE_CALC["percpu_area_size() = load_end - load_start"]
AREA_NUM_CALC["percpu_area_num() = (end - start) / align_up_64(size)"]
BASE_CALC["percpu_area_base(cpu_id) = start + cpu_id * align_up_64(size)"]
end
subgraph subGraph0["Linker Script Symbols"]
PERCPU_START["_percpu_start: physical memory start"]
PERCPU_END["_percpu_end: physical memory end"]
LOAD_START["_percpu_load_start: template data start"]
LOAD_END["_percpu_load_end: template data end"]
end
LOAD_END --> AREA_SIZE_CALC
LOAD_START --> AREA_SIZE_CALC
PERCPU_END --> AREA_NUM_CALC
PERCPU_START --> AREA_NUM_CALC
PERCPU_START --> BASE_CALC
RESERVED_SPACE --> PERCPU_END
TEMPLATE_SEC --> LOAD_END
TEMPLATE_SEC --> LOAD_START
Key linker script requirements:
| Symbol | Purpose | Usage in Runtime |
|---|---|---|
| _percpu_start | Base address of all per-CPU areas | percpu_area_base() calculations |
| _percpu_end | End of reserved per-CPU memory | Area count calculations |
| _percpu_load_start | Start of template data | Template size calculations |
| _percpu_load_end | End of template data | Template size calculations |
Sources: percpu/test_percpu.x(L1 - L17) README.md(L54 - L67) percpu/src/imp.rs(L13 - L18)
Platform-Specific Considerations
The initialization process varies based on the target platform:
Bare Metal (target_os = "none")
- Uses linker-provided memory directly via the _percpu_start symbol
- Memory layout is fixed at compile time
- No dynamic allocation required
Linux Userspace (target_os = "linux")
- Dynamically allocates memory using std::alloc::alloc()
- Stores base address in the PERCPU_AREA_BASE static variable
- Uses Once synchronization for thread-safe initialization
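The Linux userspace path can be sketched as a one-time allocation whose address is cached in a static. Several details here are assumptions made for the illustration: `std::sync::OnceLock` stands in for whichever `Once` type the crate uses, the 4 KiB alignment mirrors the `ALIGN(4K)` in the linker script, and the function name `alloc_percpu_region` is hypothetical.

```rust
use std::alloc::{alloc, Layout};
use std::sync::OnceLock;

/// Base address of the dynamically allocated per-CPU region.
static PERCPU_AREA_BASE: OnceLock<usize> = OnceLock::new();

/// Allocates the whole per-CPU region on first call and returns its base.
/// Later calls return the cached base without allocating again.
fn alloc_percpu_region(total_size: usize) -> usize {
    *PERCPU_AREA_BASE.get_or_init(|| {
        // Zero-sized allocations are not allowed by the global allocator API.
        assert!(total_size > 0, "per-CPU region must not be empty");
        let layout = Layout::from_size_align(total_size, 0x1000).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size, as checked above.
        let ptr = unsafe { alloc(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        // The region is intentionally never freed: it lives for the whole program.
        ptr as usize
    })
}

fn main() {
    let base = alloc_percpu_region(128 * 4);
    println!("4 KiB aligned: {}", base % 0x1000 == 0);
    println!("same base on second call: {}", alloc_percpu_region(128 * 4) == base);
}
```

On bare metal none of this is needed, because the linker script has already reserved the region and `_percpu_start` serves directly as the base.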
Memory Alignment Strategy
The system uses 64-byte alignment for performance optimization:
flowchart TD
subgraph Usage["Usage"]
AREA_SPACING["Per-CPU area spacing"]
CACHE_OPT["Cache line optimization"]
end
subgraph subGraph0["Alignment Function"]
INPUT["Input: val (area size)"]
CONST["SIZE_64BIT = 0x40"]
CALC["(val + 0x3f) & !0x3f"]
OUTPUT["Output: 64-byte aligned size"]
end
CALC --> OUTPUT
CONST --> CALC
INPUT --> CALC
OUTPUT --> AREA_SPACING
OUTPUT --> CACHE_OPT
The align_up_64() function ensures each per-CPU area starts on a 64-byte boundary to optimize cache performance and prevent false sharing between CPUs.
Sources: percpu/src/imp.rs(L5 - L8) percpu/src/imp.rs(L36 - L44) percpu/src/imp.rs(L65 - L71)
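The rounding expression from the diagram can be checked on a few boundary values. This is a small sketch of the `(val + 0x3f) & !0x3f` formula; the constant name follows the diagram, and the sample inputs are arbitrary.

```rust
const SIZE_64BIT: usize = 0x40;

/// Round `val` up to the next multiple of 64.
/// Adding 63 pushes any non-aligned value past the next boundary;
/// masking off the low six bits then snaps it back onto that boundary.
const fn align_up_64(val: usize) -> usize {
    (val + (SIZE_64BIT - 1)) & !(SIZE_64BIT - 1)
}

fn main() {
    for val in [0usize, 1, 63, 64, 65, 100, 128] {
        println!("align_up_64({val}) = {}", align_up_64(val));
    }
}
```

Because the area spacing is always a multiple of 64 and the region itself starts on a 4K boundary in the linker script, every per-CPU area base ends up 64-byte aligned.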