Introduction
This document provides an in-depth analysis of the QNX Neutrino RTOS startup code for the Raspberry Pi 4 board, based on the Broadcom BCM2711 SoC. The startup program is responsible for initializing the hardware, setting up the system page, and transferring control to the QNX kernel.
Key Facts:
- Target Platform: Raspberry Pi 4 Model B, Compute Module 4, Pi 400
- SoC: Broadcom BCM2711 (Quad-core Cortex-A72)
- Architecture: ARMv8-A (AArch64)
- QNX Version: SDP 8.0
Information Sources
This document is based on:
- Direct Code Analysis: Primary source - actual QNX startup code files
- Code Comments: Inline documentation from BlackBerry/QNX engineers
- Raspberry Pi Architecture: Public documentation about RPi boot process and VideoCore GPU
- BCM2711 Documentation: Broadcom peripheral specifications
- ARM Architecture: Cortex-A72 and GICv2 technical reference manuals
Note on GPU/VideoCore Boot Sequence: The detailed boot flow (GPU boots first, loads firmware, then releases ARM cores) is inferred from:
- Mailbox interface code (
mbox.c) showing ARM-to-GPU communication - Code comments referencing “VPU-inserted arguments” and “VideoCore memory”
- Memory management code explicitly handling GPU RAM allocation
- Minimal
_start.Sentry point (hardware already initialized by GPU) - Public Raspberry Pi documentation about bootcode.bin → start.elf → kernel loading
Boot Sequence Overview
Source: This sequence is derived from code analysis and Raspberry Pi architecture documentation.
The boot sequence follows this flow:
GPU Bootloader (VideoCore)
↓
ARM Stub / U-Boot (loads startup)
↓
_start.S (Entry Point)
↓
cstart (Common ARM64 startup)
↓
main() (Board-specific initialization)
↓
├→ init_board() - GPIO/Pinmux
├→ bcm2711_init_raminfo() - Memory Map
├→ init_smp() - Multi-core
├→ init_intrinfo() - Interrupts (GIC)
├→ init_hwinfo() - Hardware Database
├→ init_qtime() - System Timer
├→ init_cacheattr() - Cache Configuration
└→ init_system_private() - Load IFS
↓
QNX Kernel (procnto-smp-instr)
1. Entry Point (_start.S)
File: src/hardware/startup/boards/bcm2711/_start.S
Purpose
The assembly entry point that receives control from the bootloader and preserves critical boot parameters before jumping to the C runtime initialization.
Code Analysis
.text
.align 2
.global _start
_start:
/* NOTE:
Do NOT modify registers X0-X3 before returning on the bootstrap
processor.
These registers may contain information provided by the IPL and
cstart will save them in the boot_regs variable for later perusal
by other portions of startup.
*/
b cstart
Key Points
- Register Preservation: X0-X3 contain boot parameters:
- X0: Device Tree Blob (DTB) address or machine type
- X1: Reserved (typically 0)
- X2: ATAGS or DTB address (legacy)
- X3: Reserved
- Minimal Setup: Unlike many embedded systems, this entry point is intentionally minimal because:
- The GPU’s bootloader has already initialized DRAM (inferred from minimal DRAM setup code)
- ARM stub code has set up basic CPU state (EL1/EL2) (Raspberry Pi boot architecture)
- No need for early MMU or cache setup at this stage (evident from code structure)
- Direct Jump: Transfers control immediately to
cstart, which is part of the common ARM64 startup library.
2. Main Initialization (main.c)
File: src/hardware/startup/boards/bcm2711/main.c
Command Line Processing
The Raspberry Pi 4 uses a unique boot architecture where the VideoCore GPU is the primary bootloader. It can pass command-line arguments through the mailbox interface:
void cpu_tweak_cmdline(struct bootargs_entry *bap, const char *name) {
const static char *next_arg;
if (!next_arg) { // only do this once
char *str = mbox_get_cmdline();
if (str == NULL) {
return;
}
// skip vpu-inserted arguments up to dash or double-space
do {
next_arg = find_next_arg(str, &str, 0);
if (next_arg[0] == '\0') break;
if (next_arg[0] == '-') break;
if ((next_arg[0] == ' ') && (next_arg[1] == ' ')) break;
} while (1);
for (;;) {
if (next_arg[0] == '\0') break;
if ((next_arg[0] == '-') && (next_arg[1] == '-')) break;
bootstrap_arg_adjust(bap, NULL, next_arg);
next_arg = find_next_arg(str, &str, 1);
}
}
}
Analysis:
- The GPU firmware can insert boot parameters in
config.txtorcmdline.txt(Raspberry Pi boot configuration) - Arguments before
--are for the GPU, after--are for the OS (from code comment: “skip vpu-inserted arguments”) - This function filters GPU-specific args before passing to QNX (evident from parsing logic)
Debug Device Configuration
QNX startup supports multiple debug serial ports:
const struct debug_device debug_devices[] = {
{ .name = "miniuart",
.defaults[DEBUG_DEV_CONSOLE] = "fe215000^0.115200", /* uart1 */
.init = init_miniuart,
.put = put_miniuart,
.callouts[DEBUG_DISPLAY_CHAR] = &display_char_miniuart,
.callouts[DEBUG_POLL_KEY] = &poll_key_miniuart,
.callouts[DEBUG_BREAK_DETECT] = &break_detect_miniuart,
},
{ .name = "pl011-3",
.defaults[DEBUG_DEV_CONSOLE] = "fe201600^0.115200.48000000", /* uart 3*/
.init = init_pl011,
.put = put_pl011,
.callouts[DEBUG_DISPLAY_CHAR] = &display_char_pl011,
.callouts[DEBUG_POLL_KEY] = &poll_key_pl011,
.callouts[DEBUG_BREAK_DETECT] = &break_detect_pl011,
},
};
Two UART Options:
- Mini UART (UART1): Simpler, tied to VPU core clock (variable frequency)
- PL011 (UART3): Full-featured ARM PrimeCell UART, fixed 48MHz clock
Main Function Flow
int main(const int argc, char ** const argv)
{
int opt, options = 0;
int wdt_timeout = BCM2711_WDT_USE_DEFAULT;
uint32_t board_rev;
// 1. Add callout array (reboot mechanism)
const struct callout_slot callouts[] = {
{ CALLOUT_SLOT( reboot, _bcm2711 ) },
};
add_callout_array(callouts, sizeof callouts);
// 2. Initialize board (GPIO/pinmux)
init_board();
// 3. Select debug device
select_debug(debug_devices, sizeof(debug_devices));
// 4. Parse command line options
while ((opt = getopt(argc, argv, COMMON_OPTIONS_STRING "W:")) != -1) {
switch (opt) {
case 'W':
options |= BCM2711_WDT_ENABLE;
if (optarg) {
wdt_timeout = (int) strtol(optarg, &optarg, 0);
}
break;
default:
handle_common_option(opt);
break;
}
}
// 5. Enable watchdog if requested
if (options & BCM2711_WDT_ENABLE) {
bcm2711_wdt_enable(wdt_timeout);
}
// 6. Re-initialize debug device if "-D" option specified
select_debug(debug_devices, sizeof(debug_devices));
// 7. Get CPU frequency from VideoCore
cpu_freq = mbox_get_clock_rate(MBOX_CLK_ARM);
cycles_freq = cpu_freq;
timer_freq = 0;
// 8. Setup RAM
bcm2711_init_raminfo();
alloc_ram(shdr->ram_paddr, shdr->ram_size, 1);
// 9. Enable Hypervisor if requested
hypervisor_init(0);
// 10. Initialize SMP
init_smp();
// 11. Debug info (if verbose)
if (debug_flag > 3) {
kprintf("%u CPUs @ %uMHz\n", lsp.syspage.p->num_cpu, cpu_freq / 1000000);
kprintf("Temps %d\n", mbox_get_temperature(0));
// ... more debug output
}
// 12. Initialize MMU if virtual mode
if (shdr->flags1 & STARTUP_HDR_FLAGS1_VIRTUAL) {
init_mmu();
}
// 13. Initialize interrupt system
init_intrinfo();
// 14. Initialize system timer
init_qtime();
// 15. Initialize cache attributes
init_cacheattr();
// 16. Initialize CPU info
init_cpuinfo();
// 17. Initialize hardware database
init_hwinfo();
// 18. Add board identification strings
board_rev = mbox_get_board_revision();
add_typed_string(_CS_MACHINE, get_board_name(board_rev));
add_typed_string(_CS_HW_PROVIDER, get_mfr_name(board_rev));
add_typed_string(_CS_ARCHITECTURE, get_soc_name(board_rev));
add_typed_string(_CS_HW_SERIAL, get_board_serial());
// 19. Load IFS and transfer control
init_system_private();
print_syspage();
return 0;
}
Command Line Options
QNX startup supports standard options plus BCM2711-specific ones:
- Common Options (
COMMON_OPTIONS_STRING):-v: Verbose mode-D: Debug device specification-K: Kernel options-P: Processor options- And many more…
- BCM2711-Specific:
-W[timeout]: Enable watchdog with optional timeout in milliseconds
3. Board Configuration System
Files:
src/hardware/startup/boards/bcm2711/bcm2711_init_board.csrc/hardware/startup/boards/bcm2711/rpi4/rpi4_board_config.h
GPIO/Pinmux Architecture
The BCM2711 has 54 GPIO pins, each capable of 8 different functions:
#define FUNC_IP 0 // Input
#define FUNC_OP 1 // Output
#define FUNC_A0 4 // Alternate Function 0
#define FUNC_A1 5 // Alternate Function 1
#define FUNC_A2 6 // Alternate Function 2
#define FUNC_A3 7 // Alternate Function 3
#define FUNC_A4 3 // Alternate Function 4
#define FUNC_A5 2 // Alternate Function 5
Pin Configuration Structure
typedef struct _bcm2711_pinmux {
uint8_t pin_num; // GPIO pin number (0-53)
uint8_t fsel; // Function select
uint8_t pull; // Pull-up/pull-down/none
uint8_t gpio_lvl; // Output level (if output)
uint8_t is_last; // Array terminator
} bcm2711_pinmux_t;
Board Configuration Example
const bcm2711_config_t rpi_board_config[] = {
{ // UART Configuration
.pinmux = (bcm2711_pinmux_t[])
{
{ .pin_num = 14, .fsel = FUNC_A5, .pull = PULL_UP, .gpio_lvl = 0 }, /* TXD0 */
{ .pin_num = 15, .fsel = FUNC_A5, .pull = PULL_UP, .gpio_lvl = 0 }, /* RXD0 */
{ .pin_num = 4, .fsel = FUNC_A4, .pull = PULL_UP, .gpio_lvl = 0 }, /* TXD3 */
{ .pin_num = 5, .fsel = FUNC_A4, .pull = PULL_UP, .gpio_lvl = 0 }, /* RXD3 */
{ .is_last = 1 }
},
},
{ // SPI Configuration
.pinmux = (bcm2711_pinmux_t[])
{
{ .pin_num = 7, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SPI0_CE1_N */
{ .pin_num = 8, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SPI0_CE0_N */
{ .pin_num = 9, .fsel = FUNC_A0, .pull = PULL_NONE, .gpio_lvl = 0 }, /* SPI0_MISO */
{ .pin_num = 10, .fsel = FUNC_A0, .pull = PULL_NONE, .gpio_lvl = 0 }, /* SPI0_MOSI */
{ .pin_num = 11, .fsel = FUNC_A0, .pull = PULL_NONE, .gpio_lvl = 0 }, /* SPI0_SCLK */
{ .is_last = 1 }
},
},
{ // I2C Configuration
.pinmux = (bcm2711_pinmux_t[])
{
{ .pin_num = 0, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SDA0 */
{ .pin_num = 1, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SCL0 */
{ .pin_num = 2, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SDA1 */
{ .pin_num = 3, .fsel = FUNC_A0, .pull = PULL_UP, .gpio_lvl = 0 }, /* SCL1 */
{ .is_last = 1 }
},
},
};
GPIO Register Functions
static void bcm2711_set_gpio_fsel(const uint32_t gpio, const uint32_t fsel)
{
const uint32_t reg = (gpio / 10) * 4; // GPFSEL0-5 (10 pins per register)
const uint32_t sel = gpio % 10;
uint32_t val;
if ((gpio < GPIO_MIN) || (gpio > GPIO_MAX)) {
return;
}
val = in32(BCM2711_GPIO_BASE + reg);
val &= ~(0x7 << (3 * sel)); // Clear 3-bit field
val |= (fsel & 0x7) << (3 * sel); // Set new function
out32(BCM2711_GPIO_BASE + reg, val);
}
Register Layout: Each GPFSEL register controls 10 pins with 3 bits per pin:
- GPFSEL0 (offset 0x00): GPIO 0-9
- GPFSEL1 (offset 0x04): GPIO 10-19
- GPFSEL2 (offset 0x08): GPIO 20-29
- …and so on
Pull-Up/Pull-Down Configuration (BCM2711-Specific)
The BCM2711 uses a different pull-up/down mechanism than earlier BCM283x chips:
static void bcm2711_gpio_set_pull(const uint32_t gpio, const uint32_t type)
{
const uint32_t reg = GPPUPPDN0 + (gpio >> 4) * 4; // 16 pins per register
const uint32_t shift = (gpio & 0xf) << 1; // 2 bits per pin
uint32_t val;
if ((gpio < GPIO_MIN) || (gpio > GPIO_MAX)) {
return;
}
if ((type < 0) || (type > 2)) {
return;
}
val = in32(BCM2711_GPIO_BASE + reg);
val &= ~(3 << shift); // Clear 2-bit field
val |= (type << shift); // Set new pull type
out32(BCM2711_GPIO_BASE + reg, val);
}
Pull Types:
0: No pull1: Pull-up2: Pull-down
Board Revision Decoding
const char *get_board_name(const uint32_t board_rev)
{
const char * const types[] = {
[0] = "RaspberryPi1A",
[1] = "RaspberryPi1B",
[2] = "RaspberryPi1A+",
[3] = "RaspberryPi1B+",
[4] = "RaspberryPi2B",
[5] = "RaspberryPiAlpha",
[6] = "RaspberryPiCM1",
[8] = "RaspberryPi3B",
[9] = "RaspberryPi0",
[10] = "RaspberryPiCM3",
[12] = "RaspberryPi0W",
[13] = "RaspberryPi3B+",
[14] = "RaspberryPi3A+",
[16] = "RaspberryPiCM3+",
[17] = "RaspberryPi4B", // ← Our target
[18] = "RaspberryPi02W",
[19] = "RaspberryPi400",
[20] = "RaspberryPiCM4",
};
const uint32_t idx = (board_rev >> BOARD_REV_TYPE_SHIFT) & BOARD_REV_TYPE_MASK;
if (idx < NUM_ELTS(types)) {
return types[idx];
}
return "Unknown";
}
Board Revision Format (New-style, 32-bit):
Bit 31-25: Overvoltage, OTP flags
Bit 24: Unused
Bit 23: New-style revision flag (always 1)
Bit 22-20: Memory size (0=256MB, 1=512MB, 2=1GB, 3=2GB, 4=4GB, 5=8GB)
Bit 19-16: Manufacturer (0=Sony UK, 1=Egoman, 2=Embest, 3=Sony Japan, etc.)
Bit 15-12: Processor (0=BCM2835, 1=BCM2836, 2=BCM2837, 3=BCM2711)
Bit 11-4: Board type (17=RPi4B, 19=RPi400, 20=CM4)
Bit 3-0: Revision number
4. Memory Subsystem
File: src/hardware/startup/boards/bcm2711/bcm2711_init_raminfo.c
BCM2711 Memory Architecture
The BCM2711 has a complex memory map that differs from earlier Raspberry Pi models:
┌─────────────────────────────────┐ 0x4_0000_0000 (16GB)
│ Not Used │
├─────────────────────────────────┤ 0x2_0000_0000 (8GB)
│ SDRAM (ARM) - High │ ← For 8GB models only
│ (4GB region) │
├─────────────────────────────────┤ 0x1_0000_0000 (4GB)
│ ARM Local Peripherals │
│ (Local Timer, Mailboxes) │
├─────────────────────────────────┤ 0x0_FF80_0000
│ Main Peripherals │
│ (GPIO, UART, SPI, I2C, etc) │
├─────────────────────────────────┤ 0x0_FC00_0000 (3840MB)
│ SDRAM (ARM) - Low │
│ (before VideoCore split) │
├─────────────────────────────────┤
│ VideoCore Memory (VC RAM) │ ← GPU frame buffer, etc.
│ (split point varies) │
├─────────────────────────────────┤
│ SDRAM (ARM) - Low │ ← Main system RAM
│ │
├─────────────────────────────────┤ 0x0_0000_1000 (4KB)
│ Stub (Reserved, 4KB) │ ← Startup stub
├─────────────────────────────────┤ 0x0_0000_0000
Memory Initialization Code
void bcm2711_init_raminfo(void) {
const uint32_t vcram_size = mbox_get_vc_memory();
const paddr_t vcram_start = GIG(1) - vcram_size;
const uint32_t rev = mbox_get_board_revision();
uint64_t ramsize = 0;
/* Avoid touching the stub in startup */
avoid_ram(BCM2711_DRAM_BASE, BCM2711_DRAM_STUB_SIZE);
/* Avoid touching the VideoCore memory in startup */
avoid_ram(vcram_start, vcram_size);
ramsize = MEG(get_memory_size(rev));
if (ramsize == 0) {
ramsize = MEG(256);
kprintf("Warning: Failed to detect RAM size. Defaulting to %u MB.\n",
ramsize);
}
if (ramsize >= GIG(4)) {
add_ram(BCM2711_DRAM_BASE, GIG(4) - MEG(64)); // periphs are mapped to 0xfc000000
if (ramsize >= GIG(8)) {
add_ram(BCM2711_DRAM_BASE_1, GIG(4)); // for rpi4 8GB target
}
} else {
add_ram(BCM2711_DRAM_BASE, ramsize); // 256M, 512M, 1G, 2G
}
/* Reserve VideoCore memory */
alloc_ram(vcram_start, vcram_size, 0);
as_add_containing(vcram_start, vcram_start + vcram_size - 1, AS_ATTR_RAM, "vcram", "ram");
/* Add a below1G tag for legacy peripherals */
if (ramsize > GIG(1)) {
as_add_containing(BCM2711_DRAM_BASE, vcram_start - 1, AS_ATTR_RAM, "below1G", "ram");
}
if (debug_flag > 3) {
kprintf("vcram start-size: %P-%P, memory size: %l\n", vcram_start, vcram_size, ramsize);
}
}
Key Memory Concepts
- VideoCore Memory Split:
- The GPU (VideoCore VI) shares DRAM with the ARM cores
- Split configured in
config.txt(e.g.,gpu_mem=128) - VideoCore memory is at the top of the first 1GB
- Typical splits: 64MB, 128MB, 256MB
- 4GB Peripherals Hole:
- BCM2711 peripherals are mapped at 0xFC000000-0xFFFFFFFF
- This creates a 64MB hole unavailable for RAM
- For ≥4GB models, RAM is split: [0, 3840MB) and [4GB, 8GB)
- 8GB Support:
- 8GB models use both low (0-4GB) and high (4GB-8GB) regions
- Requires kernel compiled with
PADDR_BITS=40orLPAEsupport - High memory added separately with
add_ram(BCM2711_DRAM_BASE_1, GIG(4))
- Address Space Tags:
- “vcram”: Reserved for GPU
- “below1G”: RAM below 1GB barrier (for DMA-limited devices)
- These tags allow drivers to allocate from specific regions
Example Memory Configurations
1GB Model:
0x00000000 - 0x00000FFF: Stub (avoided)
0x00001000 - 0x3B3FFFFF: ARM RAM (~944MB)
0x3B400000 - 0x3FFFFFFF: VideoCore RAM (76MB, example)
0xFC000000 - 0xFFFFFFFF: Peripherals
4GB Model:
0x00000000 - 0x00000FFF: Stub (avoided)
0x00001000 - 0x3B3FFFFF: ARM RAM (~944MB)
0x3B400000 - 0x3FFFFFFF: VideoCore RAM (76MB)
0xFC000000 - 0xFFFFFFFF: Peripherals
8GB Model:
0x0_00000000 - 0x0_00000FFF: Stub (avoided)
0x0_00001000 - 0x0_3B3FFFFF: ARM RAM (~944MB)
0x0_3B400000 - 0x0_3FFFFFFF: VideoCore RAM (76MB)
0x0_FC000000 - 0x0_FFFFFFFF: Peripherals
0x1_00000000 - 0x1_FFFFFFFF: ARM RAM (4GB high memory)
5. Interrupt Controller Setup
File: src/hardware/startup/boards/bcm2711/init_intrinfo.c
ARM Generic Interrupt Controller (GIC)
The BCM2711 includes an ARM GICv2 controller with the following configuration:
void init_intrinfo(void)
{
/*
* Initialise GIC
*/
#ifndef BCM2711_GIC_V3
#define GICD_PADDR 0xff841000 // Distributor
#define GICC_PADDR 0xff842000 // CPU Interface
gic_v2_init(GICD_PADDR, GICC_PADDR);
#else
#define GIC_GICD_BASE 0x4c0040000 // CPU distributor BASE address
#define GIC_GICR_BASE 0x4c0041000 // CPU redistributor BASE address
#define GIC_GICC_BASE 0x4c0042000 // CPU interface, CBAR register value
gic_v3_init(GIC_GICD_BASE, GIC_GICR_BASE, GIC_GICC_BASE, 0, 1);
#endif
// ... MSI interrupt setup follows
}
GIC Architecture Overview
┌──────────────────────────────────────┐
│ Interrupt Sources │
│ (Peripherals, Timers, etc.) │
└──────────────┬───────────────────────┘
│
↓
┌──────────────────────┐
│ GIC Distributor │ ← Routes interrupts to CPU interfaces
│ (GICD @ 0xff841000)│ Configures priority, edge/level, etc.
└──────────┬───────────┘
│
┌──────┴──────┬──────┬──────┐
↓ ↓ ↓ ↓
┌────────┐ ┌────────┐ ... ┌────────┐
│ CPU IF │ │ CPU IF │ │ CPU IF │
│ Core 0 │ │ Core 1 │ │ Core 3 │
└────────┘ └────────┘ └────────┘
GIC Interrupt Types
- SGI (Software Generated Interrupts): 0-15
- Used for inter-processor communication (IPI)
- Core-to-core signaling for SMP
- PPI (Private Peripheral Interrupts): 16-31
- Per-core private interrupts
- Examples: ARM generic timer, PMU
- SPI (Shared Peripheral Interrupts): 32-1019
- Shared among all cores
- BCM2711 peripherals (UART, GPIO, SPI, etc.)
MSI (Message Signaled Interrupts) for PCIe
The BCM2711 includes a PCIe controller that supports MSI:
static const paddr_t bcm2711_msi_base = BCM2711_PCIE_BASE + 0x4000;
void init_intrinfo(void)
{
const struct startup_intrinfo interrupts[] =
{
/* MSI interrupts (32) starting at 0x200 */
{ .vector_base = 0x200, // MSI vector base (MSI 0x200 - 0x21F)
.num_vectors = 32, // 32 MSI vectors
.cascade_vector = BCM2711_PCIE_MSI, // Parent SPI interrupt
.cpu_intr_base = 0, // CPU-relative base (0-31)
.cpu_intr_stride = 0,
.flags = INTR_FLAG_MSI,
.id = { .genflags = INTR_GENFLAG_NOGLITCH, .size = 0,
.rtn = &interrupt_id_bcm2711_msi },
.eoi = { .genflags = INTR_GENFLAG_LOAD_INTRMASK, .size = 0,
.rtn = &interrupt_eoi_bcm2711_msi },
.mask = &interrupt_mask_bcm2711_msi,
.unmask = &interrupt_unmask_bcm2711_msi,
.config = 0,
.patch_data = (paddr_t*)&bcm2711_msi_base,
},
};
add_interrupt_array(interrupts, sizeof interrupts);
}
MSI Interrupt Callout (Assembly)
File: src/hardware/startup/boards/bcm2711/callout_interrupt_bcm2711_msi.S
CALLOUT_START(interrupt_id_bcm2711_msi, 0, patch_intr)
/* Get the MSI base address (patched) */
mov x7, #0x0000
movk x7, #0x0000, lsl #16
movk x7, #0x0000, lsl #32
movk x7, #0x0000, lsl #48
mov x19, #-1
/*
* Read Interrupt Mask and Status
*/
ldr w0, [x7, #BCM2711_MSI_MASK_STATUS]
ldr w1, [x7, #BCM2711_MSI_STATUS]
/* clear any masked bits from the status register */
bics w1, w1, w0
/* if no more bits set we're done */
cbz w1, 0f
/* calculate the interrupt number to be returned */
rbit w2, w1 // Reverse bits
clz w1, w2 // Count leading zeros
/* if ID not between 0 and 31, there's nothing to do */
cmp w1, #BCM2711_MAX_MSI
bge 0f
mov w19, w1
/* create the bit mask to clear the status register */
mov w2, #1
lsl w1, w2, w19
str w1, [x7, #BCM2711_MSI_MASK_SET]
str w1, [x7, #BCM2711_MSI_INTR_CLR]
0:
CALLOUT_END(interrupt_id_bcm2711_msi)
How it Works:
- Read MSI status register (32 bits, one per MSI vector)
- Mask out any already-masked interrupts
- Find the first set bit (lowest priority-first)
- Return the bit position as the interrupt ID
- Mask and clear that MSI source
BCM2711 Interrupt Map (Selected)
| Vector | Source | Description |
|---|---|---|
| 32 | TIMER0 | ARM Timer (system timer) |
| 33 | TIMER1 | ARM Timer |
| 64 | GPIO0 | GPIO bank 0 (pins 0-27) |
| 65 | GPIO1 | GPIO bank 1 (pins 28-45) |
| 66 | GPIO2 | GPIO bank 2 (pins 46-53) |
| 79 | I2C0 | I2C master 0 |
| 80 | SPI0 | SPI master 0 |
| 85 | I2C1 | I2C master 1 |
| 121 | UART0 | PL011 UART 0 |
| 125 | UART3 | PL011 UART 3 |
| 153 | ETHERNET | GENET Ethernet (queue 0) |
| 154 | ETHERNET | GENET Ethernet (queue 16) |
| 158 | PCIE_MSI | PCIe MSI (cascades to 32 MSI vectors) |
6. Hardware Information Database
File: src/hardware/startup/boards/bcm2711/rpi4/init_hwinfo.c
Purpose
The Hardware Information (hwinfo) database is a QNX-specific feature that stores comprehensive hardware configuration in the system page. Device drivers query this database at runtime instead of using hardcoded addresses.
Structure
void init_hwinfo(void)
{
const unsigned hwi_bus_internal = 0;
/* Add ENET (Ethernet) */
{
unsigned i, hwi_off;
uint8_t mac[6];
hwiattr_enet_t attr = HWIATTR_ENET_T_INITIALIZER;
hwiattr_common_t common_attr = HWIATTR_COMMON_INITIALIZER;
HWIATTR_ENET_SET_NUM_IRQ(&attr, 1);
uint32_t irqs[] = {
BCM2711_ENET_IRQA, // Queue 0 interrupt
BCM2711_ENET_IRQB // Queue 16 interrupt
};
HWIATTR_SET_NUM_IRQ(&common_attr, NUM_ELTS(irqs));
/* Create GENET device */
HWIATTR_ENET_SET_LOCATION(&attr, BCM2711_ENET_BASE,
BCM2711_ENET_SIZE, 0, hwi_find_as(BCM2711_ENET_BASE, 1));
hwi_off = hwidev_add_enet(BCM2711_HWI_GENET, &attr, hwi_bus_internal);
hwitag_add_common(hwi_off, &attr);
ASSERT(hwi_find_unit(hwi_off) == 0);
/* Add IRQ numbers */
for(i = 0; i < NUM_ELTS(irqs); i++) {
hwitag_set_ivec(hwi_off, i, irqs[i]);
}
/* Get MAC address from VideoCore mailbox */
mbox_get_board_mac_address(mac);
hwitag_add_nicaddr(hwi_off, mac, sizeof(mac));
}
/* add the WATCHDOG device */
{
unsigned hwi_off;
hwiattr_timer_t attr = HWIATTR_TIMER_T_INITIALIZER;
const struct hwi_inputclk clksrc_kick = {.clk = 500, .div = 1};
HWIATTR_TIMER_SET_NUM_CLK(&attr, 1);
HWIATTR_TIMER_SET_LOCATION(&attr, BCM2711_PM_BASE, BCM2711_PM_SIZE, 0,
hwi_find_as(BCM2711_PM_BASE, 1));
hwi_off = hwidev_add_timer("wdog", &attr, HWI_NULL_OFF);
ASSERT(hwi_off != HWI_NULL_OFF);
hwitag_set_inputclk(hwi_off, 0, (struct hwi_inputclk *)&clksrc_kick);
hwi_off = hwidev_add("wdt,options", 0, HWI_NULL_OFF);
hwitag_add_regname(hwi_off, "write_width", 32);
hwitag_add_regname(hwi_off, "enable_width", 32);
}
}
HWInfo Database Layout
The hwinfo database is a tree structure:
System Bus (hwi_bus_internal)
├── Ethernet (GENET)
│ ├── Location: 0xFD580000, size 0x10000
│ ├── IRQ[0]: 153 (queue 0)
│ ├── IRQ[1]: 154 (queue 16)
│ └── MAC Address: xx:xx:xx:xx:xx:xx
├── Watchdog Timer
│ ├── Location: 0xFE100000, size 0x1000
│ ├── Clock: 500 Hz
│ └── Options: write_width=32, enable_width=32
└── [Other devices added by common code]
Querying HWInfo at Runtime
Device drivers use the Hardware Abstraction Layer (HAL) to query hwinfo:
// Example: Ethernet driver queries its configuration
#include <hw/hwinfo.h>
void enet_driver_init(void) {
unsigned hwi_off = hwi_find_device("genet", 0); // Find first GENET
if (hwi_off != HWI_NULL_OFF) {
uint64_t base;
unsigned size;
unsigned irq;
uint8_t mac[6];
// Get base address and size
hwitag_find_ivaddr(hwi_off, NULL, &base, &size);
// Get interrupt vector
hwitag_find_ivec(hwi_off, 0, &irq);
// Get MAC address
hwitag_find_nicaddr(hwi_off, mac, sizeof(mac));
kprintf("GENET at 0x%llx, IRQ %d, MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
base, irq, mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
}
7. Callout Mechanism
What Are Callouts?
Callouts are small, position-independent assembly routines embedded in the system page that provide low-level hardware access. They’re used by the kernel and drivers without requiring full context switches.
Benefits:
- Fast execution (no context switch)
- Small footprint
- Can run with minimal kernel state
- Position-independent (can be relocated)
Types of Callouts
- Debug Callouts: Character I/O for kernel debugging
- Reboot Callouts: System restart
- Timer Callouts: System timer access
- Interrupt Callouts: Interrupt controller operations
- Cache Callouts: Cache maintenance
Debug Callouts - Mini UART
File: src/hardware/startup/boards/bcm2711/callout_debug_miniuart.S
/*
* Patch callouts
* The patcher replaces the mov/movk sequence with the actual register address
*/
patch_debug:
sub sp, sp, #16
stp x19, x30, [sp]
add x19, x0, x2 // address of callout routine
// Map registers (convert physical to virtual)
mov x0, #0x100
ldr x1, [x4] // Get physical address from patch_data
bl callout_io_map // Returns virtual address in x0
// Patch the mov/movk instructions with virtual address
CALLOUT_PATCH x19, w6, w7
ldp x19, x30, [sp]
add sp, sp, #16
ret
/*
* Display a character on Mini UART
*/
CALLOUT_START(display_char_miniuart, 0, patch_debug)
mov x7, #0xabcd // These instructions are patched
movk x7, #0xabcd, lsl #16 // to contain the actual
movk x7, #0xabcd, lsl #32 // virtual address of
movk x7, #0xabcd, lsl #48 // the UART registers
// Wait for transmitter ready
0:
ldr w2, [x7, #AUX_MU_LSR_REG] // Read Line Status Register
tbz w2, #5, 0b // Test bit 5 (TX ready)
// Send the character
and w2, w1, #0xff
str w2, [x7, #AUX_MU_IO_REG]
// Wait for transmission complete
1:
ldr w2, [x7, #AUX_MU_LSR_REG]
tbz w2, #6, 1b // Test bit 6 (TX complete)
ret
CALLOUT_END(display_char_miniuart)
Callout Patching Process
The callout patching is a clever technique that allows the same callout code to work at different addresses:
- Startup Phase:
- Callout code is written with placeholder addresses (0xabcd…)
- Physical hardware address is known
- Patching Phase:
callout_io_map()creates a virtual mapping for the hardware- The patcher rewrites the
mov/movkinstructions with the actual virtual address
- Runtime Phase:
- Callout executes with the correct virtual address
- No overhead - direct register access
Reboot Callout
File: src/hardware/startup/boards/bcm2711/callout_reboot_bcm2711.S
#define RSTC 0x1c // Reset Control register
#define WDOG 0x24 // Watchdog register
#define PASSWORD 0x5a000000 // Required password for PM registers
#define RESET_CMD 0x20 // Full reset command
CALLOUT_START(reboot_bcm2711, rw_size_reboot, patch_reboot)
mov w7, #0xabcd // offset of rw data (patched)
add x7, x7, x0 // address of rwdata
ldr x5, [x7] // register base (virtual address)
/*
* Set up the watchdog timer to something really small
* and let it trigger the reboot
*/
mov w1, #PASSWORD
mov w0, #10 // 10 ticks (~150 microseconds)
orr w0, w0, w1
str w0, [x5, #WDOG]
ldr w0, [x5, #RSTC]
bic w0, w0, #0x30 // Clear reset type
orr w0, w0, w1 // Add password
orr w0, w0, #RESET_CMD // Set full reset
str w0, [x5, #RSTC]
0: b 0b // Wait for reboot (never returns)
CALLOUT_END(reboot_bcm2711)
How BCM2711 Reboot Works:
- The PM (Power Management) watchdog is used for reboot
- Set watchdog to very short timeout (10 ticks)
- Configure RSTC for “full reset” mode
- Watchdog times out → hardware reset triggered
- GPU bootloader takes over from scratch
8. SMP (Multi-Core) Support
File: src/hardware/startup/boards/bcm2711/board_smp.c
Raspberry Pi 4 Multi-Core Architecture
The BCM2711 has 4 Cortex-A72 cores:
- All cores start in a powered-off state except core 0
- Core 0 (bootstrap processor) initializes the system
- Secondary cores (1-3) are started by writing to spin table addresses
SMP Initialization
unsigned board_smp_num_cpu(void) {
return 4; // BCM2711 has 4 cores
}
void board_smp_init(struct smp_entry *smp, const unsigned num_cpus)
{
#ifndef BCM2711_GIC_V3
smp->send_ipi = (void *)&sendipi_gic_v2;
#else
smp->send_ipi = (void *)&sendipi_gic_v3_sr;
#endif
/* Need to turn on the IPI interrupt for mailbox 0 on all cores */
uint32_t cpu;
for (cpu = 0; cpu < num_cpus; ++cpu) {
const uint32_t mbic_addr = 0x40000050 + 0x4 * cpu;
uint32_t mbic = in32(mbic_addr);
mbic |= 0x1; // Enable mailbox 0 interrupt
out32(mbic_addr, mbic);
}
}
Starting Secondary Cores
static void wake_aux_cpu(void)
{
// Same assembly works in 32 and 64 bit
__asm__ __volatile__(
"dsb sy\n" // Data Synchronization Barrier
"sev\n" // Send Event (wakes cores from WFE)
: : : "memory"
);
}
int board_smp_start(unsigned cpu, void (*start)(void))
{
typedef void (*start_t)(void);
volatile start_t *cpu_start = (start_t *)(0xd8L + cpu * 8);
*cpu_start = start; // Write start address to spin table
wake_aux_cpu(); // Wake the core
return 1;
}
Spin Table Mechanism
The BCM2711 uses a spin table for SMP startup:
Address Purpose
--------- -------
0x000000D8 CPU 1 start address
0x000000E0 CPU 2 start address
0x000000E8 CPU 3 start address
Boot Sequence:
- GPU firmware places secondary cores in WFE (Wait For Event) loop
- Each core polls its spin table address
- Startup writes function pointer to spin table
- Startup executes SEV (Send Event) instruction
- Secondary core wakes up, reads function pointer, jumps to it
Inter-Processor Interrupts (IPI)
IPIs are used for:
- TLB shootdown (invalidate TLB on other cores)
- Scheduler preemption
- System calls that affect all cores
// IPI mechanism (simplified)
void sendipi_gic_v2(struct syspage_entry *syspage, unsigned cpu, int cmd, void *arg)
{
// GIC uses SGI (Software Generated Interrupt) for IPI
// Write to GICD_SGIR register:
// - Target CPU in bits [16:23]
// - SGI number (0-15) in bits [0:3]
const unsigned sgi_num = 0; // Use SGI 0 for IPI
const uint32_t gicd_sgir = GICD_BASE + 0xF00;
// Send SGI to specific CPU
out32(gicd_sgir, (1 << (16 + cpu)) | sgi_num);
}
Core-Local Mailboxes
The BCM2711 has ARM local peripherals including per-core mailboxes:
0x4000_0000 ARM Local Base
+0x0050 Core 0 Mailbox 0 Interrupt Control
+0x0054 Core 1 Mailbox 0 Interrupt Control
+0x0058 Core 2 Mailbox 0 Interrupt Control
+0x005C Core 3 Mailbox 0 Interrupt Control
+0x0060 Core 0 IRQ Source
+0x0064 Core 1 IRQ Source
...
+0x0080 Core 0 Mailbox 0 Write-Set
+0x0084 Core 0 Mailbox 1 Write-Set
...
These mailboxes provide an alternative IPI mechanism, though QNX primarily uses GIC SGIs.
9. VideoCore Mailbox Interface
Files:
src/hardware/startup/boards/bcm2711/mbox.csrc/hardware/startup/boards/bcm2711/mbox.h
VideoCore Architecture
Source: This boot sequence is documented in Raspberry Pi architecture guides and confirmed by evidence in the code.
The Raspberry Pi 4 has a unique architecture where the GPU (VideoCore VI) boots first:
Power On
↓
VideoCore GPU boots from bootcode.bin on SD card
↓
GPU initializes DRAM
↓
GPU loads start.elf (GPU firmware)
↓
GPU parses config.txt
↓
GPU loads kernel (startup-bcm2711-rpi4)
↓
GPU releases ARM cores from reset
↓
ARM cores start executing at kernel entry point
Code Evidence for GPU-first Boot:
mbox.c: Extensive mailbox communication functions (ARM requests info from GPU)mbox_get_vc_memory(): GPU owns and allocates VideoCore RAMmbox_get_cmdline(): Boot cmdline comes from GPU firmwarembox_get_clock_rate(): CPU frequency set by GPU- Minimal
_start.S: No DRAM initialization (GPU already did it) - Memory reservation for “VideoCore memory” in
bcm2711_init_raminfo.c
Mailbox Protocol
Source: Protocol details from code analysis (mbox.c implementation).
The mailbox is a bidirectional communication channel:
#define MBOX_REG_BASE 0xfe00b880
#define MBOX_BUFFER_ADDR 0x100 // Fixed buffer at 0x100
#define MBOX_BUFFER_SIZE 128
#define MBOX_SEND_CHANNEL 8
// Mailbox registers
#define MBOX0_READ 0x00 // Read from VC
#define MBOX0_STATUS 0x18 // Status flags
#define MBOX1_WRITE 0x20 // Write to VC
#define MBOX1_STATUS 0x38 // Status flags
#define MBOX_STATUS_FULL 0x80000000
#define MBOX_STATUS_EMPTY 0x40000000
Message Structure
struct mbox_msg_header {
uint32_t size; // Total message size (including header)
uint32_t code; // Request: 0, Response: 0x80000000
};
struct mbox_msg_tag {
uint32_t tag; // Property tag ID
uint32_t size; // Buffer size
uint32_t code; // Request: 0, Response: 0x80000000 | value_length
};
// Example message format
struct mbox_msg_get_clock_rate {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint32_t clock_id; // Input: which clock
uint32_t rate; // Output: clock rate in Hz
} body;
uint32_t end; // End tag: 0x00000000
};
Sending a Mailbox Message
static uint32_t mbox_send_message(void) {
uint32_t rdata;
// Wait for mailbox ready (not full)
for (;;) {
rdata = *((volatile uint32_t *) (MBOX_REG_BASE + MBOX0_STATUS));
if (!(rdata & MBOX_STATUS_FULL)) {
break;
}
}
// Send message address | channel number
rdata = (MBOX_BUFFER_ADDR & ~0xf) | MBOX_SEND_CHANNEL;
*((volatile uint32_t *) (MBOX_REG_BASE + MBOX1_WRITE)) = rdata;
// Wait for response
for (;;) {
// Wait for mailbox not empty
rdata = *((volatile uint32_t *) (MBOX_REG_BASE + MBOX0_STATUS));
if (rdata & MBOX_STATUS_EMPTY) {
continue;
}
// Read response
rdata = *((volatile uint32_t *) (MBOX_REG_BASE + MBOX0_READ));
// Make sure it's for our channel
if ((rdata & 0xf) == MBOX_SEND_CHANNEL) {
return rdata & ~0xf;
}
}
return 0;
}
Common Mailbox Functions
Get Board Revision
uint32_t mbox_get_board_revision(void) {
volatile struct mbox_msg_get_board_revision {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint32_t revision;
} body;
uint32_t end;
} *msg = (volatile struct mbox_msg_get_board_revision *) MBOX_BUFFER_ADDR;
MSG_INIT(msg, MBOX_TAG_GET_BOARDREVISION);
mbox_send_message();
return msg->body.revision;
}
Returns: 32-bit revision code, e.g., 0xa03111 for RPi 4B 1GB:
0xa= New-style revision3= BCM2711 SoC1= 512MB RAM11= RPi 4B type1= Revision 1.1
(Format documented in init_board_type.c comments and Raspberry Pi documentation)
Get MAC Address
void mbox_get_board_mac_address(uint8_t* const address) {
volatile struct mbox_msg_get_board_mac_address {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint8_t address[6];
uint8_t _pad[2];
} body;
uint32_t end;
} *msg = (volatile struct mbox_msg_get_board_mac_address *) MBOX_BUFFER_ADDR;
MSG_INIT(msg, MBOX_TAG_GET_MACADDRESS);
mbox_send_message();
memcpy(address, (uint8_t*) msg->body.address, 6);
}
Returns: Ethernet MAC address (burned into BCM2711 OTP)
Get Clock Rate
uint32_t mbox_get_clock_rate(const uint32_t id) {
volatile struct mbox_msg_get_clock_rate {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint32_t clock_id;
uint32_t rate;
} body;
uint32_t end;
} *msg = (volatile struct mbox_msg_get_clock_rate *) MBOX_BUFFER_ADDR;
MSG_INIT(msg, MBOX_TAG_GET_CLOCKRATE);
msg->body.clock_id = id;
mbox_send_message();
return msg->body.rate;
}
Clock IDs:
1: EMMC12: UART3: ARM (CPU frequency)4: CORE (VPU frequency)5: V3D6: H2647: ISP8: SDRAM9: PIXEL10: PWM12: EMMC2
Get Temperature
int32_t mbox_get_temperature(const uint32_t id) {
volatile struct {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint32_t temperature_id;
int32_t millidegrees;
} body;
uint32_t end;
} *msg = (volatile void *) MBOX_BUFFER_ADDR;
MSG_INIT(msg, MBOX_TAG_GET_TEMPERATURE);
msg->body.temperature_id = id;
mbox_send_message();
return msg->body.millidegrees; // Returns temperature in millidegrees C
}
Get GPIO State (Expansion GPIO)
The BCM2711 has “expansion GPIOs” managed by the GPU firmware:
uint32_t mbox_get_expgpio(const uint32_t gpio) {
volatile struct {
struct mbox_msg_header header;
struct mbox_msg_tag tag;
struct {
uint32_t gpio;
uint32_t state;
} body;
uint32_t end;
} *msg = (volatile void *) MBOX_BUFFER_ADDR;
MSG_INIT(msg, MBOX_TAG_GET_GPIO_STATE);
msg->body.gpio = gpio;
msg->body.state = 0;
mbox_send_message();
return msg->body.state;
}
Expansion GPIOs:
128: BT_ON (Bluetooth power)129: WL_ON (WiFi power)130: PWR_LED (Power LED control)131: ACT_LED (Activity LED)132: SD_VDDIO (SD card voltage)133: CAM_SHUTDOWN (Camera shutdown)
Mailbox in Main Startup
The main initialization uses mailbox extensively:
// Get CPU frequency
cpu_freq = mbox_get_clock_rate(MBOX_CLK_ARM);
// Get memory configuration
const uint32_t vcram_size = mbox_get_vc_memory();
// Get board info
board_rev = mbox_get_board_revision();
uint64_t serial = mbox_get_board_serial();
// Debug output
if (debug_flag > 3) {
kprintf("Temps %d\n", mbox_get_temperature(0));
kprintf("Volts");
for (unsigned i = 1; i <= 4; i++) {
kprintf(" %u=%u", i, mbox_get_voltage(i));
}
kprintf("\n");
}
10. Watchdog Timer
File: src/hardware/startup/boards/bcm2711/init_wdt.c
BCM2711 PM Watchdog
The Power Management block includes a watchdog timer that can reset the system:
#define PM_PASSWORD 0x5a000000 // Required for all PM writes
#define PM_WDOG_TIME_SET 0x000fffff // Max watchdog value
#define PM_RSTC_WRCFG_CLR 0xffffffcf // Clear reset config
#define PM_RSTS_HADWRH_SET 0x00000040 // Hardware reset flag
#define PM_RSTC_WRCFG_SET 0x00000030 // Full reset config
#define PM_RSTC_WRCFG_FULL_RESET 0x00000020 // Full reset mode
#define PM_RSTC_STOP 0x00000102 // Stop watchdog
#define PM_WDOG_MS_TO_TICKS 67UL // Conversion factor
#define WDT_MAX_TICKS 0x000fffff // Maximum ticks
#define WDT_DEFAULT_TICKS 0x00010000 // Default timeout
Watchdog Enable
void bcm2711_wdt_enable(const int timeout_value)
{
const uintptr_t base = (uintptr_t) BCM2711_PM_BASE;
uint32_t val;
uint64_t ticks;
/* Disable watch dog first */
out32(base + BCM2711_PM_RSTC, PM_PASSWORD | PM_RSTC_STOP);
// Calculate timeout in ticks
if (timeout_value <= 1000) {
val = WDT_DEFAULT_TICKS; // ~4.2 seconds
} else {
ticks = timeout_value * PM_WDOG_MS_TO_TICKS;
if (ticks > WDT_MAX_TICKS) {
ticks = WDT_MAX_TICKS; // Cap at maximum (~262 seconds)
}
val = (uint32_t) ticks;
}
/* Reload the timer value */
out32(base + BCM2711_PM_WDOG, PM_PASSWORD | val);
/* Enable the watchdog timer in full reset mode */
val = in32(base + BCM2711_PM_RSTC) & PM_RSTC_WRCFG_CLR;
val |= (PM_PASSWORD | PM_RSTC_WRCFG_FULL_RESET);
out32(base + BCM2711_PM_RSTC, val);
}
Watchdog Timing
- Tick Rate: ~16 kHz (one tick ≈ 62.5 microseconds)
- Conversion: 1 millisecond ≈ 67 ticks (actually 67.108…)
- Default: 0x10000 ticks ≈ 4.2 seconds
- Maximum: 0xFFFFF ticks ≈ 262 seconds
Usage
Enable watchdog with custom timeout:
startup-bcm2711-rpi4 -W5000 # 5 second watchdog
The watchdog must be “kicked” periodically by a userspace utility (wdtkick) to prevent system reset:
// Typical wdtkick implementation
while (1) {
out32(BCM2711_PM_BASE + BCM2711_PM_WDOG, PM_PASSWORD | current_value);
sleep(timeout / 2); // Kick before timeout expires
}
Conclusion
The QNX startup code for Raspberry Pi 4 is a sophisticated piece of low-level systems programming that bridges the gap between the VideoCore GPU bootloader and the QNX kernel. Key takeaways:
Architecture Highlights
- Dual-Processor Boot: GPU boots first, initializes hardware, then releases ARM cores
- Clean Abstraction: Board-specific code separated from common ARM64 logic
- Mailbox Communication: Extensive use of GPU mailbox for hardware queries
- Modern ARM: GICv2 interrupts, ARMv8-A AArch64 mode, 4-core SMP
Memory Management
- Flexible RAM configuration (1GB to 8GB)
- VideoCore memory sharing (split configurable)
- High memory support for 8GB models
- Address space tagging for DMA constraints
Hardware Database
- Runtime hardware discovery
- No hardcoded device addresses in drivers
- Supports board variants transparently
- MAC address, serial number from OTP
Performance Optimizations
- Callouts avoid context switches
- Direct hardware access in critical paths
- SMP support for all 4 cores
- MSI interrupts for PCIe
Production Features
- Watchdog timer support
- System reboot mechanism
- Comprehensive debug output
- Multiple UART options
This startup code demonstrates production-quality embedded systems design, balancing flexibility, performance, and maintainability. It serves as an excellent reference for anyone developing low-level code for modern ARM SoCs.
Additional Resources
- QNX Documentation: http://www.qnx.com/developers/docs/
- BCM2711 Documentation: Broadcom BCM2711 ARM Peripherals
- Raspberry Pi Documentation: https://www.raspberrypi.org/documentation/
- ARM Cortex-A72 TRM: ARM Technical Reference Manual
- GICv2 Architecture: ARM Generic Interrupt Controller Architecture Specification
Author Note: This document is based on analysis of QNX SDP 8.0 BSP for Raspberry Pi 4 (BCM2711). Code examples are simplified for clarity. Always refer to the original source code and official documentation for production use.
License: This analysis document follows the same Apache License 2.0 as the analyzed code.
Date: February 7, 2026