Understanding the ‘volatile’ keyword
If you’ve spent any time in embedded development in C, you’ve likely encountered the volatile keyword. And if you’re like me, you’ve been bitten by infuriating bugs after forgetting to use it (“why does my code break when I turn optimisation on?!”).
Despite being one of the most important qualifiers in embedded C, it’s frequently misunderstood. This post will explain what volatile actually does, why it matters in embedded contexts, and just as importantly what it doesn’t do.
The Problem: Compilers Are Too Clever but They Don’t See Everything
Modern C compilers are extraordinarily sophisticated optimisers. They analyse your code, track how variables are used, and make transformations to generate faster, smaller machine code. Most of the time, this is exactly what you want.
Consider this innocent-looking code:
int ready = 0;
void wait_for_ready(void) {
while (ready == 0) {
// Wait...
}
}
A clever compiler might reason: “The variable ready is never modified inside this loop. Therefore, if it’s zero when we enter, it will always be zero.” So it can optimise this to:
void wait_for_ready(void) {
if (ready == 0) {
while (1) { } // Infinite loop
}
}
From the compiler’s perspective within the abstract machine model of C, this transformation is perfectly valid. It doesn’t know, and has no way of knowing, that an interrupt handler running asynchronously might set ready to 1. So while compilers are very sophisticated, they are not omniscient.
This is the fundamental problem volatile solves.
What volatile Actually Means
This is where the volatile qualifier comes in. It tells the compiler: “This variable’s value may change at any time, through means you cannot see or analyse. Do not make any assumptions about its value. Every time the code reads this variable, generate an actual load instruction. Every time the code writes to it, generate an actual store instruction.”
Let’s fix our earlier example:
volatile int ready = 0;
void wait_for_ready(void) {
while (ready == 0) {
// This actually works now
}
}
Now the compiler must generate code that genuinely reads from the memory location of ready on every iteration of the loop. If an interrupt handler sets it to 1, our loop will see the change and exit.
The Three Core Use Cases in Embedded Systems
1. Memory-Mapped Hardware Registers
In embedded systems, hardware peripherals are typically controlled through memory-mapped registers. Reading or writing to a specific memory address communicates with the hardware, not with regular RAM.
// UART status register at address 0x40001000
#define UART_STATUS (*(volatile uint32_t *)0x40001000)
// UART data register at address 0x40001004
#define UART_DATA (*(volatile uint32_t *)0x40001004)
void uart_send(char c) {
// Wait until transmit buffer is empty
while ((UART_STATUS & TX_READY) == 0) {
// The hardware updates UART_STATUS independently
}
UART_DATA = c;
}
Without volatile, the compiler might:
- Read UART_STATUS once and cache the value, never seeing when the hardware clears the busy flag
- Decide that writing to UART_DATA has no observable effect (since it’s never read) and eliminate the write entirely
- Reorder the status check and data write, corrupting the transmission
Hardware registers have another important property: reading them often has side effects. Reading a UART receive register might clear an interrupt flag or advance a FIFO pointer. The compiler cannot be allowed to eliminate or combine these reads.
// Reading this register clears pending interrupts
volatile uint32_t *interrupt_clear = (volatile uint32_t *)0x40002000;
void clear_pending_interrupts(void) {
(void)*interrupt_clear; // The read itself is the operation
}
Without volatile, the compiler would remove this “useless” read.
2. Variables Modified by Interrupt Service Routines
This is perhaps the most common source of volatile-related bugs. Variables shared between your main code and interrupt handlers must be marked volatile:
volatile uint32_t system_ticks = 0;
// Called by timer interrupt
void SysTick_Handler(void) {
system_ticks++;
}
void delay_ms(uint32_t ms) {
uint32_t start = system_ticks;
while ((system_ticks - start) < ms) {
// Busy wait
}
}
The system_ticks variable is modified by code that the compiler cannot see when compiling delay_ms(). The interrupt can fire at any point, completely asynchronously. Without volatile, the compiler has every right to assume system_ticks never changes during the loop.
3. DMA Buffers and Multi-Core Shared Memory
Direct Memory Access (DMA) controllers transfer data between peripherals and memory without CPU involvement. The CPU sets up the transfer and then continues executing other code while the DMA engine works in the background.
volatile uint8_t rx_buffer[256];
volatile int rx_complete = 0;
void start_dma_receive(void) {
// Configure DMA to write to rx_buffer
// When the transfer completes, the DMA interrupt handler sets rx_complete = 1
DMA1->CMAR = (uint32_t)rx_buffer;
DMA1->CCR |= DMA_CCR_EN;
}
void process_received_data(void) {
while (!rx_complete) { }
// Process data in rx_buffer
for (int i = 0; i < 256; i++) {
handle_byte(rx_buffer[i]);
}
}
The CPU didn’t write to rx_buffer, so the compiler might assume it contains whatever was there before (possibly uninitialised garbage). The volatile qualifier forces it to actually read the DMA-filled contents.
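For completeness, here is a sketch of the interrupt handler that would set that flag. The handler name and the flag-clearing register follow the same simplified DMA1-> style as the setup code above; they are assumptions, not a specific vendor’s definitions:
// Hypothetical DMA transfer-complete handler (names are illustrative)
void DMA1_TransferComplete_IRQHandler(void) {
    DMA1->IFCR = DMA_CLEAR_TC_FLAG;   // acknowledge the transfer-complete interrupt
    rx_complete = 1;                  // tell main code that rx_buffer now holds valid data
}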
A Deeper Look at the Optimisations volatile Prevents
Let’s examine the specific optimisations that volatile inhibits:
Register Caching
Normally, compilers keep frequently-accessed variables in CPU registers for speed:
int counter = 0;
void count_events(void) {
while (check_sensor()) {
counter++;
}
}
The compiler might load counter into a register at the start, increment the register in the loop, and write the final value back only when exiting. With volatile counter, every increment involves a load from memory, an increment, and a store back to memory.
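Roughly speaking, the non-volatile version can legally be transformed into something like this (a sketch of the optimisation, not actual compiler output):
void count_events(void) {
    int tmp = counter;        // single load; the value now lives in a register
    while (check_sensor()) {
        tmp++;                // increments never touch memory
    }
    counter = tmp;            // single store on the way out
}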
Dead Store Elimination
If a variable is written multiple times before being read, the compiler may keep only the final write:
int status = 0;
status = 1; // Compiler might remove this
status = 2; // Keep only this
For hardware registers, each write might trigger a different hardware behaviour. volatile ensures all writes occur.
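A concrete case where every write matters is a peripheral that expects a key sequence to unlock it. The register address and key values below are made up purely for illustration:
// Hypothetical watchdog key register (address and keys are illustrative)
#define WDT_KEY (*(volatile uint32_t *)0x40003000)

void watchdog_unlock(void) {
    WDT_KEY = 0x55;   // first key write
    WDT_KEY = 0xAA;   // second key write; both must reach the hardware, in order
}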
Dead Read Elimination
Similarly, if a value is read but never used, the compiler may remove the read:
int dummy = *status_register; // Compiler might remove this
Reading hardware registers often clears flags or advances state machines. volatile prevents this elimination.
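As a practical example, draining a receive FIFO is done purely with reads. This sketch reuses the UART_STATUS and UART_DATA macros from earlier; RX_NOT_EMPTY is an assumed flag name:
void uart_flush_rx(void) {
    while (UART_STATUS & RX_NOT_EMPTY) {   // data still waiting in the FIFO
        (void)UART_DATA;                   // the read itself pops one byte
    }
}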
Loop-Invariant Code Motion
Compilers can move computations that don’t change during a loop to outside the loop:
while (running) {
if (*config_flag) { // Might be hoisted outside the loop
do_something();
}
}
With volatile, the flag must be checked on every iteration.
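Without volatile on the flag, the hoisted form looks roughly like this (a sketch of the transformation, not actual compiler output):
int cached = *config_flag;   // read once, before the loop
while (running) {
    if (cached) {            // the stale copy is reused on every iteration
        do_something();
    }
}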
What volatile Does NOT Provide
This is where many developers get tripped up. The volatile keyword is more limited than people often assume.
No Atomicity
volatile does not make operations atomic. Consider:
volatile int counter = 0;
// In main code
counter++;
// In interrupt handler
counter++;
The counter++ operation typically compiles to three instructions: load, increment, store. If an interrupt fires between the load and store in the main code, you’ll lose an increment.
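Here is one possible interleaving that drops an update; the values are illustrative:
// main:  int tmp = counter;     // reads 5 into a register
// ISR:   counter = counter + 1; // counter becomes 6
// main:  tmp = tmp + 1;         // computes 6 from the stale 5
// main:  counter = tmp;         // writes 6 back; the ISR's increment is lost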
For atomicity, you need:
// Option 1: Disable interrupts
disable_interrupts();
counter++;
enable_interrupts();
// Option 2: Use atomic types (C11)
#include <stdatomic.h>
atomic_int counter = 0;
atomic_fetch_add(&counter, 1);
// Option 3: Use platform-specific atomic intrinsics
__sync_fetch_and_add(&counter, 1);
No Memory Barriers
On modern processors with out-of-order execution and caching hierarchies, volatile doesn’t guarantee other CPUs or cores see your writes when you expect:
volatile int data_ready = 0;
int data[100]; // Non-volatile
void producer(void) {
// Fill the data array
for (int i = 0; i < 100; i++) {
data[i] = compute(i);
}
data_ready = 1; // Signal completion
}
void consumer(void) {
while (!data_ready) { }
// BUG: data[] writes might not be visible yet!
process(data);
}
Even though data_ready is volatile, the writes to data[] might still be sitting in a write buffer or cache, invisible to the consumer. You need explicit memory barriers:
void producer(void) {
for (int i = 0; i < 100; i++) {
data[i] = compute(i);
}
__sync_synchronize(); // Memory barrier
data_ready = 1;
}
On single-core microcontrollers (like many ARM Cortex-M devices), this is less of a concern, because a core always observes its own writes in program order. But on multi-core systems or processors with complex memory systems, memory barriers are essential.
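If your toolchain supports C11, the same producer/consumer pattern can be written with the <stdatomic.h> facilities mentioned earlier, which carry the ordering guarantees with them. This is a sketch under that assumption, reusing the compute() and process() placeholders from above:
#include <stdatomic.h>

static atomic_int data_ready = 0;
static int data[100];

void producer(void) {
    for (int i = 0; i < 100; i++) {
        data[i] = compute(i);
    }
    // Release store: all earlier writes to data[] are made visible
    // before anyone can observe data_ready == 1
    atomic_store_explicit(&data_ready, 1, memory_order_release);
}

void consumer(void) {
    // Acquire load: once we see 1, the writes to data[] are visible too
    while (atomic_load_explicit(&data_ready, memory_order_acquire) == 0) { }
    process(data);
}
In this pattern the explicit memory orderings take over the job of both the volatile qualifier on the flag and the hand-written barrier.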
No Prevention of Compiler Reordering Around Non-Volatile Accesses
The compiler can still reorder non-volatile operations with respect to each other, even if they’re adjacent to volatile accesses:
volatile int flag = 0;
int a = 0, b = 0;
void setup(void) {
a = 1;
b = 2;
flag = 1; // The volatile access itself must be performed...
// ...but the writes to a and b may be reordered, or even moved past it
}
If you need ordering guarantees for the non-volatile variables, you need compiler barriers:
void setup(void) {
a = 1;
b = 2;
__asm__ __volatile__ ("" ::: "memory"); // Compiler barrier
flag = 1;
}
Best Practices
1. Always Use volatile for Hardware Registers
Create a clear convention in your codebase:
// Hardware abstraction layer
typedef volatile uint32_t reg32_t;
typedef struct {
reg32_t CR; // Control register
reg32_t SR; // Status register
reg32_t DR; // Data register
} UART_TypeDef;
#define UART1 ((UART_TypeDef *)0x40011000)
2. Mark All ISR-Shared Variables as volatile
Even if you think you’ve analysed the code and it’s safe, use volatile. Future optimisation levels or compiler versions might break your assumptions:
// ISR communication variables
static volatile bool tx_complete = false;
static volatile uint32_t error_count = 0;
static volatile uint8_t rx_buffer[64];
3. Don’t Overuse volatile
Every volatile access has a performance cost. Don’t mark variables as volatile unless they genuinely can change unexpectedly:
// BAD: Unnecessary volatile
volatile int loop_counter; // Only modified by this function
for (loop_counter = 0; loop_counter < 100; loop_counter++) {
// Performance penalty on every access
}
// GOOD: Copy volatile to local for intensive computation
volatile uint32_t raw_adc_value;
void process_adc(void) {
uint32_t local_value = raw_adc_value; // Single volatile read
// Perform many operations on local_value
// ...
}
4. Remember: volatile is Necessary but Not Sufficient
For correct concurrent code, you typically need volatile plus one or more of the following (a combined sketch follows the list):
- Critical sections (disabled interrupts)
- Atomic operations
- Memory barriers
- Proper mutex/semaphore primitives
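As an illustration, here is a minimal sketch combining volatile with a critical section to take a consistent snapshot of two ISR-updated fields. It assumes CMSIS-style __disable_irq()/__enable_irq() intrinsics are available on the target:
#include <stdint.h>

typedef struct {
    uint32_t count;
    uint32_t last_error;
} rx_stats_t;

static volatile rx_stats_t rx_stats;   // updated from a UART receive ISR

// Take a consistent snapshot of both fields from main code
rx_stats_t read_rx_stats(void) {
    rx_stats_t snapshot;
    __disable_irq();                    // critical section: the ISR cannot interleave
    snapshot.count = rx_stats.count;
    snapshot.last_error = rx_stats.last_error;
    __enable_irq();
    return snapshot;
}
In real code you would usually save and restore the previous interrupt state (for example via PRIMASK on Cortex-M) rather than unconditionally re-enabling interrupts.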
The volatile keyword is a fundamental tool in embedded C programming. It prevents the compiler from optimising away critical accesses to hardware registers, ISR-shared variables, and DMA buffers. However, it’s important to understand its limitations: it provides neither atomicity nor memory ordering guarantees. When working with hardware or interrupt handlers, ask yourself: “Can this variable change in ways the compiler can’t see?” If the answer is yes, you need volatile. Then ask: “Do I also need atomicity or memory barriers?” The answer to that question depends on your specific hardware and use case.
Understanding these nuances is what separates embedded code that “seems to work” from code that’s genuinely correct: code that won’t mysteriously break when you change compiler versions, enable optimisation, or port to a new platform.
As always, thanks for joining me on this journey into embedded development!
