There’s something almost magical about the moment you realise that GPIOA->ODR |= (1 << 5) isn’t just moving bits around in memory—it’s physically changing voltage on a wire, photons leaving an LED, the real world responding to your code.
But how does writing to an address actually do something?
The Core Idea
Your microcontroller has an address space. On a 32-bit chip, this runs from 0x00000000 to 0xFFFFFFFF. Some addresses point to RAM, where your variables live. Some point to flash, where your program code lives.
And some addresses don’t point to memory at all. They point to hardware. Read that again if you’re coming from application development (as I am), because this probably sounds strange.
When you read from or write to one of these special addresses, you’re communicating with a peripheral. Writing a value might turn on an LED, change a timer’s speed, or send a byte out of a serial port. Reading a value might tell you whether a button is pressed or whether a transmission has completed.
This is memory-mapped I/O. Hardware peripherals are “mapped” into the memory address space, and you interact with them using ordinary reads and writes.
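To make that concrete, here is a minimal sketch of how an address falls into a region. It assumes the standard Arm Cortex-M (ARMv7-M) memory map, where code memory sits below 0x20000000, SRAM starts at 0x20000000, and peripherals start at 0x40000000. The enum and function names are invented for illustration, and a real chip only populates part of each region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    REGION_CODE,        /* flash and other code memory */
    REGION_SRAM,        /* on-chip RAM */
    REGION_PERIPHERAL,  /* memory-mapped peripheral registers */
    REGION_OTHER        /* external memory, core/system registers, etc. */
} region_t;

/* Classify an address using the ARMv7-M region boundaries (assumed). */
region_t classify_address(uint32_t addr) {
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    return REGION_OTHER;
}

/* Usage: a few sample addresses and where they land. */
void check_region_examples(void) {
    assert(classify_address(0x00000000u) == REGION_CODE);
    assert(classify_address(0x20000000u) == REGION_SRAM);
    assert(classify_address(0x40020014u) == REGION_PERIPHERAL);
}
```

The CPU issues the same load and store instructions in every case. It is the bus, not your code, that routes the access to flash, RAM, or a peripheral.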
A Concrete Example: Turning on an LED
Say you have an LED connected to pin 5 of GPIO port A on an STM32 microcontroller. The GPIO peripheral has several registers, and the datasheet tells you that GPIO port A’s registers start at address 0x40020000. A register is just a small piece of memory inside the peripheral that controls or reflects some aspect of its behaviour. The Output Data Register (ODR), which controls the pin states, is at offset 0x14 from that base.
In C:
#include <stdint.h> // for uint32_t
// The ODR register for GPIOA is at address 0x40020014
// (base 0x40020000 + offset 0x14).
// We cast this address to a pointer, then dereference it so we can
// read and write it like a variable.
#define GPIOA_ODR (*(volatile uint32_t *)0x40020014)
// Turn on the LED by setting bit 5
GPIOA_ODR |= (1 << 5);
// Turn off the LED by clearing bit 5
GPIOA_ODR &= ~(1 << 5);
When you write GPIOA_ODR |= (1 << 5), the compiler generates a read from address 0x40020014, an OR operation, and a write back to that address. That write physically changes the voltage on pin 5, which lights up your LED.
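The same pattern can be wrapped in small helpers. This is a sketch only: the function names are made up, and because the helpers take a pointer they can be tried on a PC against an ordinary variable standing in for the register.

```c
#include <assert.h>
#include <stdint.h>

/* Register address = peripheral base + register offset. */
uintptr_t reg_address(uintptr_t base, uintptr_t offset) {
    assert(offset % 4u == 0u);  /* 32-bit registers sit on 4-byte boundaries */
    return base + offset;
}

/* Read-modify-write: read the register, OR in the mask, write it back. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask) {
    *reg |= mask;
}

/* Read-modify-write: read the register, clear the masked bits, write it back. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask) {
    *reg &= ~mask;
}
```

On the target you would pass a pointer built from the real address, for example (volatile uint32_t *)reg_address(0x40020000, 0x14). On a host, pass the address of a plain uint32_t instead.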
Why “Volatile” Matters
Notice the volatile keyword. This is critical for MMIO, and leaving it out will cause bugs.
The compiler tries to optimise your code. One optimisation is caching values in CPU registers rather than reading from memory repeatedly. Without volatile, the compiler becomes an overly helpful optimiser that’s about to ruin your day.
Consider this loop that waits for a button press:
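// IDR sits at offset 0x10 from the GPIOA base.
#define GPIOA_IDR (*(volatile uint32_t *)0x40020010)
// Wait until bit 0 of the input register goes high
while ((GPIOA_IDR & (1 << 0)) == 0) {
    // Waiting...
}
Now suppose GPIOA_IDR had been declared without volatile. The compiler looks at your loop, sees you reading the same address repeatedly with nothing in between that could change it, and thinks: “Why read this a thousand times? I’ll just cache it.” Reasonable! Except the hardware doesn’t care what the compiler thinks. Your code effectively becomes: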
// BAD: Compiler "optimises" by caching the value
uint32_t cached_value = GPIOA_IDR; // Read once
while ((cached_value & (1 << 0)) == 0) {
    // This loops forever! cached_value never updates,
    // even when the button is pressed.
}
But GPIOA_IDR is a hardware register. Its value changes when someone presses the button. Your code would never see that change because it’s looking at a stale copy.
The volatile keyword tells the compiler: “This value can change outside the program’s control. Read it fresh every single time.” With volatile, the compiler generates an actual memory read on every loop iteration.
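A polling loop like this spins forever if the bit never changes, so a bounded variant is often handy. Below is a minimal sketch; the function name and the max_polls parameter are my own invention, not part of any vendor library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Poll a register until any bit in `mask` is set, giving up after
 * `max_polls` reads. Because `reg` points to volatile data, the compiler
 * must perform a real load on every pass through the loop. */
bool wait_for_bits(const volatile uint32_t *reg, uint32_t mask,
                   uint32_t max_polls) {
    assert(mask != 0u);  /* an empty mask could never match */
    while (max_polls-- > 0u) {
        if ((*reg & mask) != 0u) {
            return true;   /* bit observed */
        }
    }
    return false;          /* timed out */
}
```

On the target you might call wait_for_bits(&GPIOA->IDR, 1u << 0, 100000u) and treat a false result as "no button press yet".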
Structuring Register Access
Writing raw addresses everywhere gets messy. A cleaner approach is to define a struct that matches the peripheral’s register layout:
// This struct must match the hardware register layout exactly.
// Each field corresponds to a register at a specific offset.
// The order matters: the compiler lays out fields sequentially,
// so MODER is at offset 0, OTYPER at offset 4, and so on.
typedef struct {
    volatile uint32_t MODER;   // Offset 0x00: Mode register
    volatile uint32_t OTYPER;  // Offset 0x04: Output type register
    volatile uint32_t OSPEEDR; // Offset 0x08: Output speed register
    volatile uint32_t PUPDR;   // Offset 0x0C: Pull-up/pull-down register
    volatile uint32_t IDR;     // Offset 0x10: Input data register
    volatile uint32_t ODR;     // Offset 0x14: Output data register
    volatile uint32_t BSRR;    // Offset 0x18: Bit set/reset register
    volatile uint32_t LCKR;    // Offset 0x1C: Lock register
    volatile uint32_t AFRL;    // Offset 0x20: Alternate function low
    volatile uint32_t AFRH;    // Offset 0x24: Alternate function high
} GPIO_TypeDef;
// Create pointers to each GPIO port at their base addresses
#define GPIOA ((GPIO_TypeDef *)0x40020000)
#define GPIOB ((GPIO_TypeDef *)0x40020400)
#define GPIOC ((GPIO_TypeDef *)0x40020800)
Now you can write:
GPIOA->ODR |= (1 << 5); // Set pin 5 high
if (GPIOB->IDR & (1 << 0)) {
    // Pin 0 on port B is high
}
This is essentially what vendor-provided header files like stm32f4xx.h give you. The manufacturer has already turned the datasheet into C structures, which, when you think about it, is a pretty wild artefact. Thousands of pages of hardware documentation, distilled into structs and macros.
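Since the struct only works if every field lands at the right offset, it can be worth letting the compiler check the layout. The sketch below repeats the GPIO struct from this article and adds C11 static_assert checks; the checks are my addition, not something the vendor header is known to contain.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
    volatile uint32_t LCKR;
    volatile uint32_t AFRL;
    volatile uint32_t AFRH;
} GPIO_TypeDef;

/* Compilation fails if a field drifts from its documented offset. */
static_assert(offsetof(GPIO_TypeDef, IDR)  == 0x10, "IDR must be at 0x10");
static_assert(offsetof(GPIO_TypeDef, ODR)  == 0x14, "ODR must be at 0x14");
static_assert(offsetof(GPIO_TypeDef, BSRR) == 0x18, "BSRR must be at 0x18");
static_assert(offsetof(GPIO_TypeDef, AFRH) == 0x24, "AFRH must be at 0x24");
static_assert(sizeof(GPIO_TypeDef) == 0x28, "ten 32-bit registers expected");
```

If someone later inserts or deletes a field by mistake, the build breaks instead of the LED silently refusing to light.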
Read-Modify-Write: A Subtle Trap
When you write GPIOA->ODR |= (1 << 5), three things happen: read the current value, OR in your bit, write the result back. This is called read-modify-write, and it can cause problems.
If an interrupt fires between the read and the write, and the interrupt handler modifies the same register, your write will overwrite the handler’s changes.
The read-modify-write trap catches everyone eventually. My first encounter was a motor controller that would occasionally ignore commands, but only when the system was under load. Took me ages to realise an interrupt was racing against my main loop.
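The race is easier to see when the three steps are written out. This is a host-side simulation under stated assumptions: fake_odr is an ordinary variable standing in for the register, and fake_isr is a made-up handler that is called by hand at the worst possible moment.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register: a plain variable, so this runs on a PC. */
static uint32_t fake_odr;

/* Stand-in for an interrupt handler that sets bit 3. */
static void fake_isr(void) {
    fake_odr |= (1u << 3);
}

/* Replays `ODR |= (1 << 5)` as its three separate steps, with the
 * "interrupt" landing between the read and the write. */
uint32_t simulate_lost_update(void) {
    fake_odr = 0u;

    uint32_t tmp = fake_odr;     /* 1. read: tmp is 0x00 */
    fake_isr();                  /*    interrupt fires here */
    assert(fake_odr == 0x08u);   /*    the handler's bit 3 is set */
    tmp |= (1u << 5);            /* 2. modify: tmp is 0x20 */
    fake_odr = tmp;              /* 3. write: stale copy overwrites bit 3 */

    return fake_odr;             /* 0x20: the handler's change is lost */
}
```

The main code did nothing wrong on its own, and neither did the handler. The bug only exists in the interleaving, which is why it tends to show up under load.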
Many peripherals provide registers that avoid this. On STM32, the GPIO has a Bit Set/Reset Register (BSRR) that lets you set or clear individual pins with a single write:
// Set pin 5 high. This is a single write, no read required.
// Only pin 5 is affected; other pins are unchanged.
GPIOA->BSRR = (1 << 5);
// Set pin 5 low. The lower 16 bits of BSRR set pins,
// the upper 16 bits clear pins. Pin 5's clear bit is bit 21 (5 + 16).
GPIOA->BSRR = (1 << 21);
This is faster and avoids race conditions with interrupt handlers. There’s something deeply satisfying when hardware designers give you atomic operations like this. They’ve been bitten by the same bugs and decided to solve them in silicon.
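To show why a single BSRR write leaves other pins alone, here is a small software model of the register's effect on the 16 output bits. The helper names are hypothetical, and the rule that "set" wins when both bits for a pin are written is how I read the STM32F4 reference manual, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* BSRR value that drives `pin` high (lower half-word). */
uint32_t bsrr_set_mask(unsigned pin) {
    assert(pin < 16u);
    return 1u << pin;
}

/* BSRR value that drives `pin` low (upper half-word). */
uint32_t bsrr_reset_mask(unsigned pin) {
    assert(pin < 16u);
    return 1u << (pin + 16u);
}

/* Model of one BSRR write: returns the new output state given the old one.
 * Bits written as 0 change nothing. Assumption: set beats reset. */
uint32_t model_bsrr_write(uint32_t odr, uint32_t bsrr_value) {
    uint32_t set_bits   = bsrr_value & 0xFFFFu;
    uint32_t reset_bits = bsrr_value >> 16;
    return ((odr & ~reset_bits) | set_bits) & 0xFFFFu;
}
```

Because zero bits in the written value are ignored by the hardware, the CPU never has to read the current pin state first, and that is what removes the window for an interrupt to sneak in.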
A Complete Example: Blinking an LED
Here’s a standalone example showing what it takes to blink an LED on PA5 (port A, pin 5) on an STM32F4:
#include <stdint.h>
// Register structures (simplified from the full definitions)
typedef struct {
    volatile uint32_t CR;
    volatile uint32_t PLLCFGR;
    volatile uint32_t CFGR;
    volatile uint32_t CIR;
    volatile uint32_t AHB1RSTR;
    volatile uint32_t AHB2RSTR;
    volatile uint32_t AHB3RSTR;
    volatile uint32_t RESERVED0;
    volatile uint32_t APB1RSTR;
    volatile uint32_t APB2RSTR;
    volatile uint32_t RESERVED1[2];
    volatile uint32_t AHB1ENR; // Offset 0x30: enables clocks to peripherals
} RCC_TypeDef;
typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
} GPIO_TypeDef;
#define RCC ((RCC_TypeDef *)0x40023800)
#define GPIOA ((GPIO_TypeDef *)0x40020000)
// Simple delay loop. The 'volatile' prevents the compiler from
// optimising away the entire loop.
void delay(volatile uint32_t count) {
    while (count--) {
        // Busy-wait
    }
}
int main(void) {
    // Step 1: Enable the clock to GPIO port A.
    // Peripheral clocks are gated off after reset to save power, so
    // GPIOA won't respond until its clock is running.
    // Bit 0 of AHB1ENR controls the GPIOA clock.
    // Note: This is a read-modify-write, and AHB1ENR has no BSRR-style
    // set/clear companion. Safe here since interrupts aren't enabled yet.
    RCC->AHB1ENR |= (1 << 0);
    // Step 2: Configure PA5 as a general-purpose output.
    // The MODER register uses 2 bits per pin:
    //   00 = input
    //   01 = general-purpose output
    //   10 = alternate function
    //   11 = analog
    // Pin 5 is controlled by bits 10 and 11 (pin number * 2).
    GPIOA->MODER &= ~(0x3 << 10); // Clear both bits first
    GPIOA->MODER |= (0x1 << 10);  // Set to output mode
    // Step 3: Blink forever
    while (1) {
        // Set PA5 high (LED on).
        // Writing to the lower 16 bits of BSRR sets pins.
        GPIOA->BSRR = (1 << 5);
        delay(500000);
        // Set PA5 low (LED off).
        // Writing to the upper 16 bits of BSRR clears pins.
        // Bit 21 = bit 5 + 16.
        GPIOA->BSRR = (1 << 21);
        delay(500000);
    }
    return 0; // Never reached
}
Every step here is a write to a memory-mapped register: enabling the clock, configuring the pin mode, toggling the output. It’s all done by writing specific values to specific addresses.
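The MODER arithmetic in step 2 generalises to any pin. As a sketch, the hypothetical helper below computes the new register value as a pure function, which also makes it easy to test on a PC without any hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Returns `moder` with the 2-bit mode field for `pin` replaced by `mode`.
 * Mode encoding as listed above: 0 = input, 1 = output,
 * 2 = alternate function, 3 = analog. */
uint32_t moder_with_mode(uint32_t moder, unsigned pin, uint32_t mode) {
    assert(pin < 16u && mode <= 0x3u);
    unsigned shift = pin * 2u;      /* two bits per pin */
    moder &= ~(0x3u << shift);      /* clear the old field */
    moder |= mode << shift;         /* insert the new one */
    return moder;
}
```

On the target this would be used as GPIOA->MODER = moder_with_mode(GPIOA->MODER, 5, 0x1). That is one read and one write, so the pin never passes through the intermediate "input" state that the separate clear-then-set statements briefly create.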
Why This Matters
Understanding MMIO is foundational. When you use a HAL function like HAL_GPIO_WritePin(), it’s doing exactly this underneath. When you configure a timer, a DMA controller, or an ADC, you’re manipulating registers through MMIO.
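To give a feel for how thin that layer can be, here is a hypothetical pin-write wrapper built on BSRR. It is a sketch of the idea, not the vendor's source: the function name is made up, and the struct is the simplified one from the blink example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
} GPIO_TypeDef;

/* Hypothetical HAL-style helper: drive one pin high or low with a
 * single register write. */
void gpio_write_pin(GPIO_TypeDef *port, unsigned pin, bool high) {
    assert(pin < 16u);
    port->BSRR = high ? (1u << pin) : (1u << (pin + 16u));
}
```

Because the port is passed as a pointer, the same function works with GPIOA on the target or with a GPIO_TypeDef variable in RAM during a host-side test.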
Knowing what’s really happening lets you configure peripherals that your HAL doesn’t support, debug problems when abstractions behave unexpectedly, optimise critical code paths, and understand why details like volatile and register access order matter.
The address space is your interface to the hardware. Every peripheral you’ll ever configure, every sensor you’ll read, it all comes down to reading and writing specific addresses. Memory-mapped I/O isn’t just a technique. It’s the bridge between your code and the physical world.
As always, thanks for joining me on this journey into embedded development!
