How to write a layout engine

Imagine that you need to create a complex UI in the Canvas API. Telling each rectangle where to go is not an option anymore. What do we do?

Dec 30, 2022

When developing web apps, we usually don't need to spend time thinking about how to position our elements on the screen. We have a powerful layout engine doing the job for us. We just specify flex properties, margins and so on, and it's done.

But how does it work underneath? And what would it take to create a layout engine from scratch, for use cases where we don't have one? Like in the Canvas API, for positioning drawn shapes? Or in WebGL, where we are left completely on our own with positioning triangle vertices?

In this episode: the layout engine, or how to avoid going crazy from telling each individual rectangle where to sit.

What am I actually solving here

Many drawing APIs, the web Canvas API for example, let us issue drawing commands such as ctx.fillRect(x, y, width, height). With WebGL it would involve much more code, but it would eventually also come down to drawing two triangles on the screen that together form a rectangle.

The problem is that we cannot keep calculating the x and y coordinates by hand when our UI gets bigger and more complex.
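
To make the pain concrete, here is a minimal sketch of hand-positioned drawing, written in TypeScript rather than the Zig used later in this article so that it can sit next to Canvas code. The RectSink interface, the drawRowsByHand function and all the numbers are assumptions made up for illustration; a real CanvasRenderingContext2D would fit the interface because it has a matching fillRect method.

```typescript
// Minimal drawing surface (hypothetical). The Canvas 2D context fits it
// structurally, because it exposes fillRect(x, y, width, height).
interface RectSink {
  fillRect(x: number, y: number, width: number, height: number): void;
}

// Draws a vertical list of rows where every coordinate is computed by hand.
function drawRowsByHand(ctx: RectSink, rowCount: number): void {
  // Hypothetical values; each one leaks into every coordinate below.
  const padding = 20;
  const rowWidth = 300;
  const rowHeight = 40;
  const gap = 10;

  for (let i = 0; i < rowCount; i++) {
    const x = padding;
    const y = padding + i * (rowHeight + gap);
    ctx.fillRect(x, y, rowWidth, rowHeight);
  }
}
```

This works for one flat list, but every nested container, alignment rule or resize adds another formula like the one for y, which is exactly the bookkeeping a layout engine is supposed to take over.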

This is where we need a layout engine.

A good example of a layout engine is, as mentioned above, what the browser is doing with HTML. We provide content and it manages to present it in a way that things end up not overlapping. Even more, we have very fine-grained control over it – using CSS, flexbox and so on.

The concept

Layout engines can have various levels of complexity. CSS is an example of one of the more complex ones, with multiple modes of operation (regular, float, flex, grid) that can be used interchangeably.

I like reinventing the wheel, but when I do it, I try to not pick the hardest way.

Inspiration came from Figma. It features very powerful, yet concise, auto layout capabilities which make designing UIs much easier. And, what was probably the biggest reason, there is only one way to do each thing (well, almost; but still, the complexity reduction compared to CSS is massive).

Figma UI

Parameters

Each frame (a rectangle) can be specified using either fixed coordinates (x, y, width, height) or it can use auto layout – a mode in which dimensions are calculated automatically.

For the auto layout, there are the following properties available:

  • Direction – either ROW or COLUMN.
  • Vertical and horizontal resizing:
    • FIXED – specified by setting width or height.
    • HUG_CONTENT – the frame takes just enough space to display all of its children (aligned in a row or a column).
    • FILL_CONTAINER – a frame will take as much space as is available in the parent.
  • Spacing mode – either packed or space between (as in CSS flexbox).
  • Alignment – [TOP, CENTER, BOTTOM] × [LEFT, CENTER, RIGHT], for example TOP_LEFT, CENTER, BOTTOM_RIGHT etc. Similar effect to a combination of justify-content and align-items, but gracefully handled in one prop.
  • Padding: how much space is added before looking for a place to put children.
  • Gap: how much space should be left between children aligned in a column/row.
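
Because alignment is a single property that combines two axes, a layout pass has to split it back into a vertical and a horizontal part at some point. The following TypeScript sketch shows one possible way to do that; the type names and the splitAlignment helper are assumptions for illustration, not part of the engine described here.

```typescript
// The nine combined values from the list above.
type Alignment =
  | "TOP_LEFT" | "TOP_CENTER" | "TOP_RIGHT"
  | "CENTER_LEFT" | "CENTER" | "CENTER_RIGHT"
  | "BOTTOM_LEFT" | "BOTTOM_CENTER" | "BOTTOM_RIGHT";

type Vertical = "TOP" | "CENTER" | "BOTTOM";
type Horizontal = "LEFT" | "CENTER" | "RIGHT";

// Splits the combined value into its two axes (hypothetical helper).
function splitAlignment(a: Alignment): { vertical: Vertical; horizontal: Horizontal } {
  // Plain CENTER is the only value without an underscore.
  if (a === "CENTER") return { vertical: "CENTER", horizontal: "CENTER" };
  const [vertical, horizontal] = a.split("_") as [Vertical, Horizontal];
  return { vertical, horizontal };
}

console.log(splitAlignment("BOTTOM_RIGHT")); // { vertical: 'BOTTOM', horizontal: 'RIGHT' }
```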

This covers everything that we might need and, if you look one more time at the screenshot above, nearly everything related to layout in Figma.

Implementation

To solve a layout problem, we first need a way for users to specify any components.

I went with an API design in the style of immediate mode GUIs. The usual reference is this video by Casey Muratori, which introduces the concept.

The basic idea is that rendering is fast, so we can just prepare what we need to render in every frame and then send it to the GPU.

The most important concept here is that we don't, unlike in usual UI APIs, declare a button and give it a function pointer that is called when the user clicks.

Here, the button is a function, like this:

if (doButton("Click me")) {
  print("Looks like I was clicked!");
}

It is declared as part of the runtime execution in the loop, not managed externally. It can be interpreted as if it's a whole React component tree recreated every frame.
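
A minimal TypeScript sketch of what such a button function might look like on the inside. The Ui class, the MouseState shape and the explicit rectangle argument are all assumptions for illustration; in the engine described below, the rectangle would come from the layout pass instead of being passed in.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Input snapshot for the current frame (hypothetical shape).
interface MouseState { x: number; y: number; clicked: boolean }

class Ui {
  // Everything declared this frame; a real backend would draw it.
  readonly drawList: { rect: Rect; label: string }[] = [];
  private mouse: MouseState = { x: 0, y: 0, clicked: false };

  // Call once at the start of every frame with fresh input.
  beginFrame(mouse: MouseState): void {
    this.mouse = mouse;
    this.drawList.length = 0; // nothing is retained between frames
  }

  // Declares the button and reports whether it was clicked this frame.
  doButton(label: string, rect: Rect): boolean {
    this.drawList.push({ rect, label });
    const inside =
      this.mouse.x >= rect.x && this.mouse.x < rect.x + rect.width &&
      this.mouse.y >= rect.y && this.mouse.y < rect.y + rect.height;
    return inside && this.mouse.clicked;
  }
}

// One iteration of the rendering loop, with a click at (15, 15).
const ui = new Ui();
ui.beginFrame({ x: 15, y: 15, clicked: true });
if (ui.doButton("Click me", { x: 10, y: 10, width: 80, height: 24 })) {
  console.log("Looks like I was clicked!");
}
```

The point of the design is that no button object survives between frames: the call both queues the drawing and answers the input question.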

Speaking of trees: to build any kind of tree, I need a way to declare when a frame starts and when it ends.

Similarly, some way to render text is necessary.

All together:

const layout: Layout = Layout.init();

const background = Frame{ ... };
const avatar = Frame{ ... };

while (true) { // The rendering loop.
  layout.frame(background);

  layout.frame(avatar);
  layout.end();

  layout.text("John Doe");

  if (layout.button("Follow")) {
    // Mark user as followed.
  }

  layout.end();

  // Send everything to GPU and get ready to draw from scratch next time.
  layout.draw();
}

This gives us an API to specify components, provide them with configs (direction, spacing...) and leaves all the spacing to the layout algorithm. Perfect.

What happens here is that we create a tree of frames: layout.frame() opens a new child of the current frame, everything declared before the matching layout.end() becomes a child of that frame, and layout.end() returns to the parent, so that the next layout.frame() call creates a sibling.

This way a tree of nodes is generated.
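
One way to get this behavior is to keep a pointer to the currently open frame. The TypeScript sketch below assumes a simplified node with only a name (no config, no rendering), and its class and field names are hypothetical.

```typescript
interface LayoutNode {
  name: string;
  parent: LayoutNode | null;
  children: LayoutNode[];
}

class Layout {
  readonly root: LayoutNode = { name: "root", parent: null, children: [] };
  // The frame that new nodes are currently appended to.
  private current: LayoutNode = this.root;

  // Opens a new child of the current frame and descends into it.
  frame(name: string): void {
    const node: LayoutNode = { name, parent: this.current, children: [] };
    this.current.children.push(node);
    this.current = node;
  }

  // Text is a leaf: it is appended, but never becomes the current frame.
  text(content: string): void {
    this.current.children.push({ name: `text:${content}`, parent: this.current, children: [] });
  }

  // Closes the current frame and returns to its parent.
  end(): void {
    if (this.current.parent === null) {
      throw new Error("end() called without a matching frame()");
    }
    this.current = this.current.parent;
  }
}

const layout = new Layout();
layout.frame("background");
layout.frame("avatar");
layout.end();
layout.text("John Doe");
layout.end();
// background now has two children: avatar and the text node.
console.log(layout.root.children[0].children.map((n) => n.name));
```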

Calculating the layout

From the previously described steps we get a tree of nodes which define what we will see on the screen. Now we need to figure out how to place those elements.

The two trickiest issues will be how to resolve HUG_CONTENT and FILL_CONTAINER. For one we need to know all possible children, while for the other we must know what all parent containers look like.

I thought about this for a long time, to eventually come up with a solution that is literally described in the sentence above. We need to know all parents, so what do we do? We traverse the tree top-down, level order. At each step we will know all parents of a given node. Next, we need to know all the children. What do we do? We go bottom-up, also level order. We can base each frame on results from children, no issue.
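
Both orders can come from the same breadth-first walk: level order puts every parent before its children, and reading the same list backwards puts every child before its parent. A small TypeScript sketch, with a hypothetical TreeNode type:

```typescript
interface TreeNode { name: string; children: TreeNode[] }

// Top-down, level order: every parent appears before its children.
function levelOrder(root: TreeNode): TreeNode[] {
  const order: TreeNode[] = [root];
  // The loop also visits nodes appended during iteration,
  // so the array doubles as the breadth-first queue.
  for (const node of order) {
    order.push(...node.children);
  }
  return order;
}

// Bottom-up: the reversed level order has every child before its parent.
function bottomUp(root: TreeNode): TreeNode[] {
  return levelOrder(root).reverse();
}

const tree: TreeNode = {
  name: "a",
  children: [
    { name: "b", children: [{ name: "d", children: [] }] },
    { name: "c", children: [] },
  ],
};
console.log(levelOrder(tree).map((n) => n.name).join(" ")); // a b c d
console.log(bottomUp(tree).map((n) => n.name).join(" "));   // d c b a
```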

Pseudocode

  1. Traverse tree top-down, generate the tree of quads which we will fill with absolute coordinates.

  2. Traverse tree bottom-up, solving HUG_CONTENT. For each node in the tree:

    • If the node is set to HUG_CONTENT horizontally:

      • For each of its children:

        • If direction of the node (not the child) is row:
          • increase width of the node by width of the child.
        • Else:
          • Set width of the node to maximum(current_width, child_width)
      • Then increase the width by the sum of left and right padding of the node and if direction was row, the value of the gap multiplied by number of children - 1.

    • If node is set to vertically HUG_CONTENT, do the same, but replacing row with column and width with height. Don't forget about padding and gap.

  3. Finally, the third and last pass which will resolve FILL_CONTAINER and all positions. Again, for each node in the tree:

    • First calculate available width and height for expanding. For each child of the node:
      • If the node has direction ROW and the child's horizontal resizing is NOT set to FILL_CONTAINER, decrease available_width by the width of the child.
      • Similar for column, vertical and height.
      • If the node has direction ROW and the child's horizontal resizing is set to FILL_CONTAINER, then increase the counter of children sharing width (sharing_width) by one.
      • Similar for column, vertical and height.
    • To avoid division by zero, sharing_width = maximum(sharing_width, 1) (and same for height).
    • Decrease available_width and height by padding and gaps (multiplied again by the number of children - 1).
    • For each child:
      • If the child has horizontal resizing mode set to FILL_CONTAINER, set its width to available_width / sharing_width.
      • Same for vertical resizing and height.
    • If there was at least one child filling width, set available_width to 0 as there is no more remaining. Same for height.
    • Calculate alignment:
      • Create a variable x, for child's X coordinate. Depending on which of the 9 combinations of TOP_LEFT etc. we are dealing with, we will anyway increase the x in each case by adding the left padding of parent. For TOP_CENTER, CENTER, BOTTOM_CENTER we add also available_width / 2, and for TOP_RIGHT, CENTER_RIGHT and BOTTOM_RIGHT we add full available_width.
      • Similarly we need a variable y, with respectively changed additions.
    • Time to solve spacing mode.
      • If spacing mode is PACKED, for each child:
        • If direction is ROW, set the value of its x to previously calculated x coordinate. Increase that x by width of the child and the value of gap of the node.
        • If direction wasn't ROW, increase child's x by alignment x, and if the parent node is right aligned, subtract the width of the child from its x.
        • Exactly opposite for COLUMN and the else branch.
      • If spacing mode is SPACE_BETWEEN, there will be a horizontal_gap = available_width / (children_count - 1) and similarly vertical_gap (with a single child the divisor is zero, so treat the gap as 0 in that case). For each child of the node:
        • Set x value of the child to alignment x. Same for y.
        • If direction of the parent is ROW, increase the alignment x by width of this child and the horizontal gap.
        • Similar for COLUMN.
    • Solving vertical and horizontal centering – again for each child:
      • If the node's direction is ROW and it is vertically centered (alignment is CENTER_LEFT, CENTER or CENTER_RIGHT), then decrease the y value of the child by half of the child's height.
      • Analogous situation for COLUMN and width.
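
Step 2 of the list above can be sketched in TypeScript as follows. This is a simplified illustration under a few assumptions: the Frame shape and the makeFrame helper are hypothetical, the bottom-up order is obtained with post-order recursion instead of a reversed level-order traversal, and children set to FILL_CONTAINER still have size 0 at this point, because they are only resolved in step 3.

```typescript
type Direction = "ROW" | "COLUMN";
type Resizing = "FIXED" | "HUG_CONTENT" | "FILL_CONTAINER";
interface Padding { left: number; right: number; top: number; bottom: number }

interface Frame {
  direction: Direction;
  horizontal: Resizing;
  vertical: Resizing;
  width: number;
  height: number;
  padding: Padding;
  gap: number;
  children: Frame[];
}

// Hypothetical helper that fills in defaults, like default struct values.
function makeFrame(overrides: Partial<Frame>): Frame {
  return {
    direction: "ROW", horizontal: "FIXED", vertical: "FIXED",
    width: 0, height: 0,
    padding: { left: 0, right: 0, top: 0, bottom: 0 },
    gap: 0, children: [],
    ...overrides,
  };
}

// Resolves one frame, assuming its children already have their sizes.
function hugContent(node: Frame): void {
  const gaps = node.gap * Math.max(node.children.length - 1, 0);

  if (node.horizontal === "HUG_CONTENT") {
    let width = 0;
    for (const child of node.children) {
      // Along a row widths add up; across a column the widest child wins.
      width = node.direction === "ROW" ? width + child.width : Math.max(width, child.width);
    }
    node.width = width + node.padding.left + node.padding.right +
      (node.direction === "ROW" ? gaps : 0);
  }

  if (node.vertical === "HUG_CONTENT") {
    let height = 0;
    for (const child of node.children) {
      height = node.direction === "COLUMN" ? height + child.height : Math.max(height, child.height);
    }
    node.height = height + node.padding.top + node.padding.bottom +
      (node.direction === "COLUMN" ? gaps : 0);
  }
}

// Bottom-up pass: children are resolved before their parent.
function resolveHugContent(root: Frame): void {
  for (const child of root.children) resolveHugContent(child);
  hugContent(root);
}

// A column with padding 20 and gap 20 around two text-sized children.
const column = makeFrame({
  direction: "COLUMN", horizontal: "HUG_CONTENT", vertical: "HUG_CONTENT",
  padding: { left: 20, right: 20, top: 20, bottom: 20 }, gap: 20,
  children: [makeFrame({ width: 100, height: 12 }), makeFrame({ width: 80, height: 12 })],
});
resolveHugContent(column);
console.log(column.width, column.height); // 140 64
```

The width is the widest child plus horizontal padding (100 + 40), and the height is the sum of the children plus vertical padding and one gap (24 + 40 + 20).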

This is terribly complicated, but I don't see any other way around this. It's just a lot of calculations, a lot of parameters to take into account. I am also sorry but I won't provide a fully working code example this time – it relies on so many other things that the layout algorithm becomes a small part.
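
Short of a full implementation, the core of the third pass can still be shown for one axis. The TypeScript sketch below handles only a single ROW with PACKED spacing: it distributes width among FILL_CONTAINER children and then assigns x coordinates. The Row and Item shapes are hypothetical simplifications, and the vertical axis, SPACE_BETWEEN and centering corrections are left out.

```typescript
type Resizing = "FIXED" | "HUG_CONTENT" | "FILL_CONTAINER";

interface Item { resizing: Resizing; width: number; x: number }

interface Row {
  x: number; // absolute position of the row itself
  width: number;
  paddingLeft: number;
  paddingRight: number;
  gap: number;
  align: "LEFT" | "CENTER" | "RIGHT";
  children: Item[];
}

function layoutRow(row: Row): void {
  // Space left after padding and the gaps between children.
  const gaps = row.gap * Math.max(row.children.length - 1, 0);
  let available = row.width - row.paddingLeft - row.paddingRight - gaps;

  // Children with a known width consume space; the others share the rest.
  let sharing = 0;
  for (const child of row.children) {
    if (child.resizing === "FILL_CONTAINER") sharing += 1;
    else available -= child.width;
  }

  for (const child of row.children) {
    // sharing is at least 1 inside this branch, so the division is safe.
    if (child.resizing === "FILL_CONTAINER") child.width = available / sharing;
  }
  // Filling children used up everything, so nothing is left for alignment.
  if (sharing > 0) available = 0;

  // Alignment decides where the packed run of children starts.
  let x = row.x + row.paddingLeft;
  if (row.align === "CENTER") x += available / 2;
  if (row.align === "RIGHT") x += available;

  // PACKED spacing: children follow each other, separated by the gap.
  for (const child of row.children) {
    child.x = x;
    x += child.width + row.gap;
  }
}

const row: Row = {
  x: 0, width: 400, paddingLeft: 20, paddingRight: 20, gap: 10, align: "LEFT",
  children: [
    { resizing: "FIXED", width: 100, x: 0 },
    { resizing: "FILL_CONTAINER", width: 0, x: 0 },
    { resizing: "FIXED", width: 50, x: 0 },
  ],
};
layoutRow(row);
console.log(row.children.map((c) => [c.x, c.width])); // [[20, 100], [130, 190], [330, 50]]
```

In the usage example 400 - 40 of padding - 20 of gaps - 150 of fixed children leaves 190 for the single filling child, and the last child ends at 380, exactly at the right padding.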

But the result is definitely quite impressive. Using this, we can achieve probably any layout that we could implement in CSS, and we can definitely implement any layout designed in Figma.

Example

Time for an example of this algorithm in action. It is based on what I described in the previous article – Using Zig for writing OpenGL in browsers. There are some pieces of the story I will leave out for now – mainly how to prepare the rendering pipeline, draw all the rectangles and how to render text.

For now, proof that the algorithm works:

(this will render in a way that makes sense on a screen at least 600px wide, probably will be crazy overlapping on mobile).

This is how I defined the frames. You can see how this approach, used with reasonable default values in structs, leads to a quite acceptable amount of code. Of course, it is almost 50 LOC, but this is not something that would be much shorter in CSS.

const container = Frame{
    .width = width,
    .height = height,
    .direction = Direction.ROW,
    .spacing = Spacing.SPACE_BETWEEN,
    .background = orange,
};

const left = Frame{
    .direction = Direction.COLUMN,
    .width = 300,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};

const right = Frame{
    .direction = Direction.COLUMN,
    .gap = 10,
    .padding = Padding.all(20),
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};

const left_top = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.HUG_CONTENT,
    .padding = Padding.all(20),
    .gap = 20,
    .background = blue,
};

const left_center = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.FILL_CONTAINER,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .padding = Padding.all(20),
    .gap = 10,
    .alignment = Alignment.BOTTOM_RIGHT,
    .background = green,
};

const left_bottom = Frame{
    .direction = Direction.COLUMN,
    .width = 130,
    .height = 60,
    .background = yellow,
};

And the types:

pub const Spacing = enum {
    NONE,
    SPACE_BETWEEN,
};

pub const Resizing = enum {
    FIXED,
    HUG_CONTENT,
    FILL_CONTAINER,
};

pub const Direction = enum {
    NONE,
    ROW,
    COLUMN,
};

pub const Alignment = enum {
    NONE,

    TOP_LEFT,
    TOP_CENTER,
    TOP_RIGHT,

    CENTER_LEFT,
    CENTER,
    CENTER_RIGHT,

    BOTTOM_LEFT,
    BOTTOM_CENTER,
    BOTTOM_RIGHT,
};

pub const Padding = struct {
    left: f32 = 0,
    right: f32 = 0,
    top: f32 = 0,
    bottom: f32 = 0,

    pub fn all(i: f32) Padding {
        return Padding{ .left = i, .right = i, .top = i, .bottom = i };
    }
};

pub const Frame = struct {
    // Fixed dimensions
    x: f32 = 0,
    y: f32 = 0,
    width: f32 = 0,
    height: f32 = 0,

    // Auto layout
    direction: Direction = Direction.NONE,
    padding: Padding = Padding{},
    horizontal_resizing: Resizing = Resizing.FIXED,
    vertical_resizing: Resizing = Resizing.FIXED,
    spacing: Spacing = Spacing.NONE,
    alignment: Alignment = Alignment.NONE,
    gap: f32 = 0,

    // Styling
    background: Vec4 = colors.TRANSPARENT,
};

This is how I call the rendering tree. Scopes created by the use of { and } are artificial. They only help by adding extra indentation but don't have any semantic meaning.

try layout.frame(container, "container");
{
    try layout.frame(left, "left");
    {
        try layout.frame(left_top, "left_top");
        try layout.text("This is HUG_CONTENT.", colors.BLACK, 12);
        try layout.text("Another line.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_center, "left_center");
        try layout.text("FILL_CONTAINER,", colors.BLACK, 12);
        try layout.text("Aligned to BOTTOM_RIGHT.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_bottom, "left_bottom");
        try layout.text("Fixed size.", colors.BLACK, 12);
        layout.end();
    }
    layout.end();

    try layout.frame(right, "right");
    {
        try layout.text("Those two pink", colors.BLACK, 12);
        try layout.text("ones are set apart", colors.BLACK, 12);
        try layout.text("with SPACE_BETWEEN.", colors.BLACK, 12);
    }
    layout.end();
}
layout.end();
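
To see what SPACE_BETWEEN does to the two pink columns, the x coordinates can be worked out by hand. The TypeScript sketch below follows the SPACE_BETWEEN rule from the pseudocode for a ROW without padding; the spaceBetween function is a hypothetical helper, and both the container width of 600 and the hugged width of 160 for the right column are made-up numbers, since the real values depend on the canvas and on text measurement.

```typescript
// Returns the x coordinate of each child in a ROW with SPACE_BETWEEN.
function spaceBetween(containerWidth: number, childWidths: number[]): number[] {
  const total = childWidths.reduce((sum, w) => sum + w, 0);
  const available = containerWidth - total;
  // With a single child there is nothing to spread, so the gap is 0.
  const gap = childWidths.length > 1 ? available / (childWidths.length - 1) : 0;

  const xs: number[] = [];
  let x = 0;
  for (const w of childWidths) {
    xs.push(x);
    x += w + gap;
  }
  return xs;
}

// left is 300 wide; right is assumed to hug its content at 160.
console.log(spaceBetween(600, [300, 160])); // [0, 440]
```

The leftover 140 pixels become the single gap between the columns, so the right one starts at 440 and ends exactly at the container's edge.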

Future work

This covers making a layout engine that allows us to express basically anything we might end up creating in Figma designs, which roughly translates to anything we might want to display as part of the user interface.

However, there is one problem here that might be tricky to spot at first – writing actual code is a bit different than just declaring layout, and sometimes we might find ourselves limited by the code structure.

Let's take a select component as an example. We need to render a dropdown that shows up precisely below the select button and on top of the following elements of the UI. With the current code, we would have to both move the dropdown rendering to the bottom of the parent container and calculate its x, y coordinates manually. Not impossible, but it can be done better.

Conclusions

This is just a small piece of a bigger puzzle. The layout algorithm is one thing, but we need some kind of rendering backend to get things onto the screen. As briefly mentioned above, we definitely need a way to render text. There won't be much use for a static GUI – but how do we handle user input? How do we model state, keyboard focus and so on?

© Tomasz Czajęcki 2018 – 2022. All Rights Reserved.