Chapter 10
How can one chip become any circuit?

  1. The browser
  2. The network
  3. Software and languages
  4. The CPU
  5. Memory
  6. Arithmetic
  7. Logic gates
  8. The switch
Figure 10.1: Where we are in the climb from transistor to browser: still on the processor, the last look at hardware. This chapter asks how one chip can be configured into any circuit at all.

Every circuit so far was built by wiring fixed gates together: the adder of Chapter 3 is a particular arrangement of particular gates, soldered in place. A field-programmable gate array, an FPGA, turns that around. Its hardware is generic, a grid of identical blocks, and a circuit is not wired into it but loaded into it as data. Field-programmable means exactly what it says: the factory ships the same blank chip to everyone, and it is programmed in the field, after manufacture, by whoever loads it. The same chip can be an adder one minute and something else the next, because what it computes is a table you write, not a wiring you fix.

10.1 One block, any gate

The block at the heart of the array is a lookup table, a LUT. It does not know what its inputs mean; it simply uses them to index a small stored table of outputs. A two-input block holds four entries, one per input combination. Real LUTs are larger, usually four to six inputs and so sixteen to sixty-four entries; the two-input block here is the scaled-down version of the same idea. Load the four entries for NAND and the block is a NAND gate; load four different entries and the same block is an XOR gate, as Figure 10.2 shows.

configureconfigure
Figure 10.2: One configurable block. The lookup table indexes a loaded four-entry table by its inputs. The table 1110 makes it a NAND gate; load another table and it is another gate. The bits loaded into the chip are its bitstream.

Example 10-1. A configurable logic block

class LUT:
    """A two-input configurable logic block: a four-entry truth table indexed by
    its two input bits. Whatever table it holds is the function it computes."""

    def __init__(self, table: list) -> None:
        if len(table) != 4:1
            raise ValueError("a two-input LUT needs exactly four entries")
        self._table = table

    def __call__(self, a: int, b: int) -> int:
        return self._table[(a << 1) | b]2

10.2 The table is a memory

Look again at what the block does. It stores four one-bit values and hands back the one its inputs point at. Chapter 4 built exactly this device and called it a RAM: a bank of words behind a decoder, read through a mux by an address. The LUT is that machine, shrunk. Its four entries are four one-bit words; its two inputs, packed side by side, are a two-bit address; and reading the gate’s output is reading the word stored at that address, as Figure 10.3 shows. Nothing inside a LUT computes anything. The answer to every possible input was written down in advance, and the inputs only say where to look.

ram-lutram-lut
Figure 10.3: The block opened up. The stored table is a memory: each entry is a one-bit word, the two inputs are the address, and the output is the addressed bit, read out rather than computed. Here the XOR table is loaded and the inputs spell row 10.

This is not an analogy; the program makes it literal. It builds a working LUT out of the RAM of Chapter 4, unchanged: load the table into the memory one clocked write at a time, then read with the address the two inputs spell.

Example 10-2. A LUT built from the memory chapter’s RAM

def lut_from_ram(table: list):
    """A LUT built literally from Chapter 4's memory: load the four entries
    into a one-bit-wide RAM, then read with the address the two inputs spell.
    The stored table IS a small memory; nothing else is in the block."""
    ram = RAM(1)1
    for address, bit in enumerate(table):
        a1, a0 = address >> 1, address & 1
        ram.tick(a1, a0, [bit], HIGH, LOW)   # clock LOW, then the rising
        ram.tick(a1, a0, [bit], HIGH, HIGH)  # edge loads the addressed word

    def read(a: int, b: int) -> int:
        return ram.read(a, b)[0]2
    return read

Configuring a block, then, is nothing exotic. It is a memory write. That single observation carries the rest of the chapter.

10.3 Blocks, wires, and the bitstream

One block is one gate; a circuit is several, connected. The half adder of Chapter 3 was an XOR for the sum and an AND for the carry: on an array, two blocks loaded differently and fed the same inputs, as in Figure 10.4.

fabricfabric
Figure 10.4: A half adder as a configured fabric. Both inputs feed two blocks; the one loaded as XOR gives the sum, the one loaded as AND gives the carry. Same circuit as Chapter 3, but loaded, not wired.

But who decides that both blocks read the chip’s inputs, rather than each other? On a real array the wiring between blocks runs through a mesh of programmable switch points, and each one is the device this book opened with: Chapter 1’s transistor, its gate driven not by a passing signal but by a stored configuration bit. Write a 1 into that bit and the two wires it straddles connect; write a 0 and they do not. So the connections are memory too, and the bitstream, the configuration loaded into the chip, sets both things at once: what each block’s table holds, and which wires reach which inputs. That configurable interconnect is most of what lets one chip become any circuit. The netlist of Chapter 6 is exactly what a real toolchain compiles into this bitstream: the description-as-data of that chapter becomes configuration-as-data here.

The model grows to match. A fabric is a row of blocks plus wiring entries, and one flat bitstream loads all of it: four table entries per block, then, for each block input, one entry naming its source, a chip input or an earlier block’s output. A block may only read blocks before it, so a loaded circuit can never chase its own tail.

Example 10-3. A fabric: blocks plus loadable wiring

class Fabric:
    """A row of two-input blocks plus configurable wiring, both loaded from one
    bitstream: four table entries per block, then one wiring entry per block
    input naming its source. Sources 0..inputs-1 are the chip inputs; inputs+k
    is block k's output. A block may only read earlier blocks, so every loaded
    circuit is feed-forward and always settles."""

    def __init__(self, blocks: int, inputs: int) -> None:
        self.blocks = blocks
        self.inputs = inputs
        self._luts: list = []
        self._wiring: list = []

    def load(self, bits: list) -> None:
        """Configure every table and every wire from one flat bitstream."""
        need = self.blocks * 4 + self.blocks * 2
        if len(bits) != need:
            raise ValueError(f"this fabric loads exactly {need} bits, got {len(bits)}")
        tables, wiring = bits[:self.blocks * 4], bits[self.blocks * 4:]1
        for k in range(self.blocks):
            for source in wiring[k * 2:k * 2 + 2]:
                if not 0 <= source < self.inputs + self.blocks:
                    raise ValueError(f"wiring source {source} does not exist")
                if source >= self.inputs + k:
                    raise ValueError(f"block {k} may only read earlier blocks")
        self._luts = [LUT(tables[k * 4:k * 4 + 4]) for k in range(self.blocks)]
        self._wiring = [tuple(wiring[k * 2:k * 2 + 2]) for k in range(self.blocks)]

    def __call__(self, *chip: int) -> tuple:
        """Run the loaded circuit: blocks settle in order, each reading the
        chip inputs or earlier outputs, exactly as the wiring bits say."""
        if not self._luts:
            raise ValueError("load a bitstream before running the fabric")
        signals = list(chip)
        for lut, (s1, s2) in zip(self._luts, self._wiring):
            signals.append(lut(signals[s1], signals[s2]))2
        return tuple(signals[self.inputs:])

10.4 One fabric, two circuits

Now the payoff. The fabric never changes; only the bits loaded into it do. Bitstream A loads XOR and AND and wires both blocks to the chip inputs: the half adder again. Bitstream B loads NAND into both blocks and rewires block 1 to read block 0 twice. Feeding a NAND both copies of one signal makes it a NOT, and NOT of NAND is AND. That is Chapter 1’s closing claim, that NAND alone is enough, loaded into hardware rather than argued, and Figure 10.5 shows both configurations side by side.

reloadreload
Figure 10.5: The same fabric loaded twice. Bitstream A wires both blocks to the chip inputs and loads XOR and AND: the half adder. Bitstream B loads NAND twice and rewires block 1 to read block 0, so NOT of NAND makes AND. Nothing physical changed between the two panels.

The interactive version needs the in-browser runtime.

Example 10-4. Two bitstreams for one fabric

# Both blocks read the chip inputs a and b: the Chapter 3 half adder, loaded.
HALF_ADDER = bitstream(XOR, AND) + [0, 1, 0, 1]
# Both blocks hold NAND, and block 1 reads block 0 twice: NOT of NAND is AND,
# Chapter 1's "NAND alone is enough", loaded into the wiring rather than argued.
AND_FROM_NANDS = bitstream(NAND, NAND) + [0, 1, 2, 2]

10.5 Why load a circuit at all?

Two honest notes about real arrays. First, a real logic block pairs its LUT with the other machine from Chapter 4: a flip-flop on the block’s output, so a fabric can hold state between clock ticks and be a counter or a CPU, not just a formula. This model stops at combinational fabrics; the pairing is the same discipline Chapter 4 already built. Second, configurability is bought with speed. A signal crossing a mesh of switch points travels further and slower than one crossing a wire soldered in place, so an FPGA running a circuit is typically several times slower than the same circuit cast directly in silicon.

That price buys a spectrum. Fixed silicon is the fastest and the most frozen: change one gate and you manufacture a new chip. The processor of Chapter 5 sits at the other end: one fixed circuit that is universal because it runs programs, paying for that universality one fetch-decode-execute cycle at a time. The FPGA sits between them, hardware speed with software’s ability to be reloaded, and that middle seat is its job. Chip designers prototype on it: the processor of Chapter 5 could be loaded into a fabric like this one, tested against real inputs for months, and recast in silicon only once it stops changing. Small production runs stay on it forever, because a reloadable chip beats a factory run you cannot afford.

A circuit loaded as data is the stored-program idea of Chapter 5 arriving at the hardware itself. That is the last of the hardware; the next chapters turn to the software that runs on it.

Exercises

Previous
✎ Suggest an edit