Chapter 10
How can one chip become any circuit?
- The browser
- The network
- Software and languages
- The CPU
- Memory
- Arithmetic
- Logic gates
- The switch
Every circuit so far was built by wiring fixed gates together: the adder of Chapter 3 is a particular arrangement of particular gates, soldered in place. A field-programmable gate array, an FPGA, turns that around. Its hardware is generic, a grid of identical blocks, and a circuit is not wired into it but loaded into it as data. Field-programmable means exactly what it says: the factory ships the same blank chip to everyone, and it is programmed in the field, after manufacture, by whoever loads it. The same chip can be an adder one minute and something else the next, because what it computes is a table you write, not a wiring you fix.
10.1 One block, any gate
The block at the heart of the array is a lookup table, a LUT. It does not know what its inputs mean; it simply uses them to index a small stored table of outputs. A two-input block holds four entries, one per input combination. Real LUTs are larger, usually four to six inputs and so sixteen to sixty-four entries; the two-input block here is the scaled-down version of the same idea. Load the four entries for NAND and the block is a NAND gate; load four different entries and the same block is an XOR gate, as Figure 10.2 shows.
Example 10-1. A configurable logic block
class LUT: """A two-input configurable logic block: a four-entry truth table indexed by its two input bits. Whatever table it holds is the function it computes.""" def __init__(self, table: list) -> None: if len(table) != 4:1 raise ValueError("a two-input LUT needs exactly four entries") self._table = table def __call__(self, a: int, b: int) -> int: return self._table[(a << 1) | b]2
- a two-input block holds exactly four entries, one per input pair
- the output is just the table looked up at the index the two inputs spell, so the loaded table is the whole behaviour
10.2 The table is a memory
Look again at what the block does. It stores four one-bit values and hands back the one its inputs point at. Chapter 4 built exactly this device and called it a RAM: a bank of words behind a decoder, read through a mux by an address. The LUT is that machine, shrunk. Its four entries are four one-bit words; its two inputs, packed side by side, are a two-bit address; and reading the gate’s output is reading the word stored at that address, as Figure 10.3 shows. Nothing inside a LUT computes anything. The answer to every possible input was written down in advance, and the inputs only say where to look.
This is not an analogy; the program makes it literal. It builds a working LUT out of the RAM of Chapter 4, unchanged: load the table into the memory one clocked write at a time, then read with the address the two inputs spell.
Example 10-2. A LUT built from the memory chapter’s RAM
def lut_from_ram(table: list): """A LUT built literally from Chapter 4's memory: load the four entries into a one-bit-wide RAM, then read with the address the two inputs spell. The stored table IS a small memory; nothing else is in the block.""" ram = RAM(1)1 for address, bit in enumerate(table): a1, a0 = address >> 1, address & 1 ram.tick(a1, a0, [bit], HIGH, LOW) # clock LOW, then the rising ram.tick(a1, a0, [bit], HIGH, HIGH) # edge loads the addressed word def read(a: int, b: int) -> int: return ram.read(a, b)[0]2 return read
- the block’s storage is Chapter 4’s RAM, one bit wide
- reading the gate is reading the word at the address the inputs spell, the same index the LUT class uses
Configuring a block, then, is nothing exotic. It is a memory write. That single observation carries the rest of the chapter.
10.3 Blocks, wires, and the bitstream
One block is one gate; a circuit is several, connected. The half adder of Chapter 3 was an XOR for the sum and an AND for the carry: on an array, two blocks loaded differently and fed the same inputs, as in Figure 10.4.
But who decides that both blocks read the chip’s inputs, rather than each other? On a real array the wiring between blocks runs through a mesh of programmable switch points, and each one is the device this book opened with: Chapter 1’s transistor, its gate driven not by a passing signal but by a stored configuration bit. Write a 1 into that bit and the two wires it straddles connect; write a 0 and they do not. So the connections are memory too, and the bitstream, the configuration loaded into the chip, sets both things at once: what each block’s table holds, and which wires reach which inputs. That configurable interconnect is most of what lets one chip become any circuit. The netlist of Chapter 6 is exactly what a real toolchain compiles into this bitstream: the description-as-data of that chapter becomes configuration-as-data here.
The model grows to match. A fabric is a row of blocks plus wiring entries, and one flat bitstream loads all of it: four table entries per block, then, for each block input, one entry naming its source, a chip input or an earlier block’s output. A block may only read blocks before it, so a loaded circuit can never chase its own tail.
Example 10-3. A fabric: blocks plus loadable wiring
class Fabric: """A row of two-input blocks plus configurable wiring, both loaded from one bitstream: four table entries per block, then one wiring entry per block input naming its source. Sources 0..inputs-1 are the chip inputs; inputs+k is block k's output. A block may only read earlier blocks, so every loaded circuit is feed-forward and always settles.""" def __init__(self, blocks: int, inputs: int) -> None: self.blocks = blocks self.inputs = inputs self._luts: list = [] self._wiring: list = [] def load(self, bits: list) -> None: """Configure every table and every wire from one flat bitstream.""" need = self.blocks * 4 + self.blocks * 2 if len(bits) != need: raise ValueError(f"this fabric loads exactly {need} bits, got {len(bits)}") tables, wiring = bits[:self.blocks * 4], bits[self.blocks * 4:]1 for k in range(self.blocks): for source in wiring[k * 2:k * 2 + 2]: if not 0 <= source < self.inputs + self.blocks: raise ValueError(f"wiring source {source} does not exist") if source >= self.inputs + k: raise ValueError(f"block {k} may only read earlier blocks") self._luts = [LUT(tables[k * 4:k * 4 + 4]) for k in range(self.blocks)] self._wiring = [tuple(wiring[k * 2:k * 2 + 2]) for k in range(self.blocks)] def __call__(self, *chip: int) -> tuple: """Run the loaded circuit: blocks settle in order, each reading the chip inputs or earlier outputs, exactly as the wiring bits say.""" if not self._luts: raise ValueError("load a bitstream before running the fabric") signals = list(chip) for lut, (s1, s2) in zip(self._luts, self._wiring): signals.append(lut(signals[s1], signals[s2]))2 return tuple(signals[self.inputs:])
- one flat bitstream splits into the tables and the wiring
- each block reads exactly the sources its wiring entries name, chip inputs or earlier outputs
10.4 One fabric, two circuits
Now the payoff. The fabric never changes; only the bits loaded into it do. Bitstream A loads XOR and AND and wires both blocks to the chip inputs: the half adder again. Bitstream B loads NAND into both blocks and rewires block 1 to read block 0 twice. Feeding a NAND both copies of one signal makes it a NOT, and NOT of NAND is AND. That is Chapter 1’s closing claim, that NAND alone is enough, loaded into hardware rather than argued, and Figure 10.5 shows both configurations side by side.
Example 10-4. Two bitstreams for one fabric
# Both blocks read the chip inputs a and b: the Chapter 3 half adder, loaded. HALF_ADDER = bitstream(XOR, AND) + [0, 1, 0, 1]
# Both blocks hold NAND, and block 1 reads block 0 twice: NOT of NAND is AND, # Chapter 1's "NAND alone is enough", loaded into the wiring rather than argued. AND_FROM_NANDS = bitstream(NAND, NAND) + [0, 1, 2, 2]
10.5 Why load a circuit at all?
Two honest notes about real arrays. First, a real logic block pairs its LUT with the other machine from Chapter 4: a flip-flop on the block’s output, so a fabric can hold state between clock ticks and be a counter or a CPU, not just a formula. This model stops at combinational fabrics; the pairing is the same discipline Chapter 4 already built. Second, configurability is bought with speed. A signal crossing a mesh of switch points travels further and slower than one crossing a wire soldered in place, so an FPGA running a circuit is typically several times slower than the same circuit cast directly in silicon.
That price buys a spectrum. Fixed silicon is the fastest and the most frozen: change one gate and you manufacture a new chip. The processor of Chapter 5 sits at the other end: one fixed circuit that is universal because it runs programs, paying for that universality one fetch-decode-execute cycle at a time. The FPGA sits between them, hardware speed with software’s ability to be reloaded, and that middle seat is its job. Chip designers prototype on it: the processor of Chapter 5 could be loaded into a fabric like this one, tested against real inputs for months, and recast in silicon only once it stops changing. Small production runs stay on it forever, because a reloadable chip beats a factory run you cannot afford.
A circuit loaded as data is the stored-program idea of Chapter 5 arriving at the hardware itself. That is the last of the hardware; the next chapters turn to the software that runs on it.
Exercises
- Find the table that turns one block into XNOR, and check it against the XNOR gate.
- Build a full adder as a fabric of configured blocks. How many blocks does it take?