GCC for asm Experts (and C/C++ Intermediates) - Part 4
[ Atariscne.org - News ] GCC for asm Experts (and C/C++ Intermediates) - Part 4
Register Allocation and the Cost Model
The m68k has 15 usable registers: d0-d7 and a0-a6 (a7 is the stack pointer, and sometimes a6 is also reserved as a frame pointer). For a CPU from 1979, this is actually quite generous; the 6502 has three, the Z80 and 8086 have seven or eight. So 15 should be plenty, right?
Not so fast. We have two register files with different capabilities. A pointer must be in an address register to dereference it. Most arithmetic must happen in data registers. For many use cases, we effectively have 7 or 8 registers to choose from, not 15. And if our function calls others, the ABI reserves the caller-clobbered registers, effectively leaving only ten registers: d3-d7/a2-a6 for free use with fastcall. When we run out, the allocator spills to the stack: on 68000, each 16-bit spill costs at least 16 cycles (8 to read, 8 to write back), and 32-bit values adds another 8. In a tight loop, it adds up fast.
When you write assembly, register allocation is intuition. You look at a routine and think: I need this value for the next three instructions, then I can reuse the register for something else. You juggle lifetimes in your head, naturally overlapping short-lived values. Teaching a computer this intuition is the fundamental challenge. The answer is to build a graph of which pseudo-registers are live simultaneously; two pseudos that overlap in lifetime interfere and cannot share a register. Then color the graph with K colors, one per available register, starting with the cheap caller-clobbered regs. When 19 pseudos from put_pixel compete for 15 registers, some will not fit — especially once you account for the data/address split — and those get spilled.
So how does GCC make the best of a difficult situation?