Architecture-Specific Code Generation
Relevant source files
This document covers the architecture-specific assembly code generation system used by the percpu_macros crate. The code generation pipeline transforms high-level per-CPU variable access patterns into optimized assembly instructions tailored for each supported CPU architecture.
For information about the macro expansion pipeline and user-facing API, see Code Generation Pipeline. For details about the naive single-CPU implementation, see Naive Implementation.
Overview
The architecture-specific code generation system consists of four main code generation functions that produce inline assembly blocks optimized for each target architecture. These functions are called during macro expansion to generate efficient per-CPU data access patterns.
Sources: percpu_macros/src/arch.rs(L16 - L50) percpu_macros/src/arch.rs(L54 - L88) percpu_macros/src/arch.rs(L94 - L181) percpu_macros/src/arch.rs(L187 - L263)
Offset Calculation Generation
The gen_offset function generates architecture-specific assembly code to calculate the offset of a per-CPU variable within the .percpu section. This offset is used for both local and remote CPU access patterns.
| Architecture | Instruction Pattern | Offset Limit | Register Usage |
|---|---|---|---|
| x86_64 | mov {0:e}, offset VAR | ≤ 0xffff_ffff | 32-bit register |
| AArch64 | movz {0}, #:abs_g0_nc:VAR | ≤ 0xffff | 64-bit register |
| RISC-V | lui {0}, %hi(VAR)+addi {0}, {0}, %lo(VAR) | ≤ 0xffff_ffff | 64-bit register |
| LoongArch64 | lu12i.w {0}, %abs_hi20(VAR)+ori {0}, {0}, %abs_lo12(VAR) | ≤ 0xffff_ffff | 64-bit register |
flowchart TD
subgraph subGraph4["LoongArch Generation"]
LA_INST1["lu12i.w {0}, %abs_hi20({VAR})"]
LA_INST2["ori {0}, {0}, %abs_lo12({VAR})"]
LA_OUT["out(reg) value"]
end
subgraph subGraph3["RISC-V Generation"]
RV_INST1["lui {0}, %hi({VAR})"]
RV_INST2["addi {0}, {0}, %lo({VAR})"]
RV_OUT["out(reg) value"]
end
subgraph subGraph2["AArch64 Generation"]
ARM_INST["movz {0}, #:abs_g0_nc:{VAR}"]
ARM_OUT["out(reg) value"]
end
subgraph subGraph1["x86_64 Generation"]
X86_INST["mov {0:e}, offset {VAR}"]
X86_OUT["out(reg) value"]
end
subgraph subGraph0["Symbol Processing"]
SYMBOL["symbol: &Ident"]
VAR["VAR = sym #symbol"]
end
ARM_INST --> ARM_OUT
LA_INST1 --> LA_INST2
LA_INST2 --> LA_OUT
RV_INST1 --> RV_INST2
RV_INST2 --> RV_OUT
SYMBOL --> VAR
VAR --> ARM_INST
VAR --> LA_INST1
VAR --> RV_INST1
VAR --> X86_INST
X86_INST --> X86_OUT
Sources: percpu_macros/src/arch.rs(L16 - L50)
Current CPU Pointer Generation
The gen_current_ptr function generates code to obtain a pointer to a per-CPU variable on the currently executing CPU. Each architecture uses a different approach based on available per-CPU base pointer registers.
Architecture-Specific Register Usage
The x86_64 architecture uses a special approach where the per-CPU base address is stored in the GS segment at a fixed offset (__PERCPU_SELF_PTR), allowing direct addressing with a single instruction that combines base retrieval and offset addition.
Sources: percpu_macros/src/arch.rs(L54 - L88) percpu_macros/src/arch.rs(L55 - L62)
Optimized Read Operations
The gen_read_current_raw function generates type-specific optimized read operations for primitive integer types. This avoids the overhead of pointer dereferencing for simple data types.
Type-Specific Assembly Generation
| Type | x86_64 Instruction | RISC-V Instruction | LoongArch Instruction |
|---|---|---|---|
| bool,u8 | mov byte ptr gs:[offset VAR] | lbu | ldx.bu |
| u16 | mov word ptr gs:[offset VAR] | lhu | ldx.hu |
| u32 | mov dword ptr gs:[offset VAR] | lwu | ldx.wu |
| u64,usize | mov qword ptr gs:[offset VAR] | ld | ldx.d |
The boolean type receives special handling by reading as u8 and converting the result to boolean through a != 0 comparison.
Sources: percpu_macros/src/arch.rs(L94 - L181) percpu_macros/src/arch.rs(L96 - L102) percpu_macros/src/arch.rs(L114 - L129) percpu_macros/src/arch.rs(L131 - L150)
Optimized Write Operations
The gen_write_current_raw function generates type-specific optimized write operations that directly store values to per-CPU variables without intermediate pointer operations.
Write Instruction Mapping
flowchart TD
subgraph subGraph4["Store Instructions"]
subgraph LoongArch["LoongArch"]
LA_STB["stx.b val, addr, $r21"]
LA_STH["stx.h val, addr, $r21"]
LA_STW["stx.w val, addr, $r21"]
LA_STD["stx.d val, addr, $r21"]
end
subgraph RISC-V["RISC-V"]
RV_SB["sb val, %lo(VAR)(addr)"]
RV_SH["sh val, %lo(VAR)(addr)"]
RV_SW["sw val, %lo(VAR)(addr)"]
RV_SD["sd val, %lo(VAR)(addr)"]
end
subgraph x86_64["x86_64"]
X86_BYTE["mov byte ptr gs:[offset VAR], val"]
X86_WORD["mov word ptr gs:[offset VAR], val"]
X86_DWORD["mov dword ptr gs:[offset VAR], val"]
X86_QWORD["mov qword ptr gs:[offset VAR], val"]
end
end
subgraph subGraph0["Input Processing"]
VAL["val: &Ident"]
TYPE["ty: &Type"]
FIXUP["bool -> u8 conversion"]
end
FIXUP --> LA_STB
FIXUP --> LA_STD
FIXUP --> LA_STH
FIXUP --> LA_STW
FIXUP --> RV_SB
FIXUP --> RV_SD
FIXUP --> RV_SH
FIXUP --> RV_SW
FIXUP --> X86_BYTE
FIXUP --> X86_DWORD
FIXUP --> X86_QWORD
FIXUP --> X86_WORD
TYPE --> FIXUP
VAL --> FIXUP
Sources: percpu_macros/src/arch.rs(L187 - L263) percpu_macros/src/arch.rs(L195 - L211) percpu_macros/src/arch.rs(L214 - L230) percpu_macros/src/arch.rs(L232 - L251)
Platform Compatibility Layer
The code generation system includes a compatibility layer for platforms that don't support inline assembly or per-CPU mechanisms. The macos_unimplemented function wraps generated assembly with conditional compilation directives.
| Platform | Behavior | Fallback |
|---|---|---|
| Non-macOS | Full assembly implementation | N/A |
| macOS | Compile-time unimplemented panic | Pointer-based access |
| Unsupported architectures | Fallback to pointer dereferencing | *self.current_ptr() |
Sources: percpu_macros/src/arch.rs(L4 - L13) percpu_macros/src/arch.rs(L64) percpu_macros/src/arch.rs(L171) percpu_macros/src/arch.rs(L253)
Integration with Macro System
The architecture-specific code generation functions are called from the main def_percpu macro implementation, which determines whether to generate optimized assembly based on the variable type and enabled features.
Sources: percpu_macros/src/lib.rs(L59 - L60) percpu_macros/src/lib.rs(L92) percpu_macros/src/lib.rs(L101 - L105) percpu_macros/src/lib.rs(L147 - L148)