How to write a layout engine

Imagine that you need to create a complex UI in the Canvas API. Telling each rectangle where to go is not an option anymore. What do we do?

Dec 29, 2022

When developing web apps, we usually don't need to spend time thinking about how to position our elements on the screen. We have a powerful layout engine doing the job for us. We just specify a flex layout, margins and so on, and it's done.

But how does it work underneath? And what would it take to create a layout engine from scratch, for use cases where we don't have one? For example in the Canvas API, for positioning drawn shapes? Or in WebGL, where we are left completely on our own with positioning triangle vertices?

In this episode: the layout engine, or how to avoid going crazy from telling each individual rectangle where to sit.

What am I actually solving here

Many drawing APIs, for example the web Canvas API, let us issue drawing commands such as ctx.fillRect(x, y, width, height). With WebGL it takes much more code, but it eventually also comes down to drawing two triangles on the screen that together form a rectangle.

The problem is that we cannot keep calculating the x and y coordinates by hand once our UI gets bigger and more complex.

This is where we need a layout engine.

A good example of a layout engine is, as mentioned above, what the browser is doing with HTML. We provide content and it manages to present it in a way that things end up not overlapping. Even more, we have very fine-grained control over it – using CSS, flexbox and so on.

Essentially, what we have here is a function that takes a tree of layout components with dynamic properties and returns a tree of rectangles in screen space coordinates.
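
In Zig terms, that function could have roughly the shape below. This is only a sketch under simplifying assumptions: `Node`, `Rect` and `solveFixed` are hypothetical names, only fixed positions are handled (no auto layout yet), and the result is written as a flat depth-first list rather than a tree.

```zig
const std = @import("std");

/// Input: a layout component, positioned relative to its parent.
/// (Hypothetical type, for illustration only.)
pub const Node = struct {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
    children: []const Node,
};

/// Output: a rectangle in screen space coordinates.
pub const Rect = struct { x: f32, y: f32, width: f32, height: f32 };

/// Walks the tree depth-first and writes one screen-space Rect per node
/// into `out`, starting at index `count`. Returns the new count.
/// `out` must be large enough to hold every node.
pub fn solveFixed(node: Node, origin_x: f32, origin_y: f32, out: []Rect, count: usize) usize {
    const x = origin_x + node.x;
    const y = origin_y + node.y;
    out[count] = Rect{ .x = x, .y = y, .width = node.width, .height = node.height };
    var next = count + 1;
    for (node.children) |child| {
        next = solveFixed(child, x, y, out, next);
    }
    return next;
}

test "child offsets are added to the parent's position" {
    const leaf = Node{ .x = 5, .y = 5, .width = 10, .height = 10, .children = &[_]Node{} };
    const root = Node{ .x = 100, .y = 50, .width = 200, .height = 100, .children = &[_]Node{leaf} };
    var out: [2]Rect = undefined;
    const written = solveFixed(root, 0, 0, &out, 0);
    try std.testing.expectEqual(@as(usize, 2), written);
    try std.testing.expectEqual(@as(f32, 105), out[1].x);
}
```

Everything that follows is about replacing the hand-written x and y in `Node` with properties from which they can be computed.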

The concept

Layout engines can have various levels of complexity. CSS is an example of one of the more complex ones, with multiple modes of operation (regular, float, flex, grid) that can be freely combined.

I like reinventing the wheel, but when I do it, I try to not pick the hardest way.

Inspiration came from Figma. It features a very powerful, yet concise, auto layout concept which makes designing UIs much easier. And, what was probably the biggest reason, there is mostly only one way to do each thing.

Figma UI

Parameters

Each frame (a rectangle) can be specified using either fixed coordinates (x, y, width, height) or it can use auto layout – a mode in which dimensions are calculated automatically.

For the auto layout, the following properties are available:

  • Direction – either ROW or COLUMN.
  • Vertical and horizontal resizing:
    • FIXED – specified by setting width or height.
    • HUG_CONTENT – the frame takes a size that is just enough to display all the children (aligned in a row or a column).
    • FILL_CONTAINER – the frame takes as much space as is available in the parent.
  • Spacing mode – either packed or space between (as in CSS flexbox).
  • Alignment – [TOP, CENTER, BOTTOM] × [LEFT, CENTER, RIGHT], for example TOP_LEFT, CENTER, BOTTOM_RIGHT etc. Similar in effect to a combination of justify-content and align-items, but gracefully handled in one prop.
  • Padding: how much space is added before looking for a place to put children.
  • Gap: how much space should be left between children aligned in a column/row.

This covers pretty much everything that we might need (almost, but I will get to that at the end), and, if you look one more time at the screenshot above, pretty much everything related to layout in Figma.

Implementation

To solve the layout problem, we first need a way for users to specify any components.

I went with API design in the style of immediate mode GUIs. People usually refer to this video by Casey Muratori which introduces the concept.

The basic idea is that rendering things on modern computers is fast (and the concept itself is 17 years old, which makes it even more relevant now) and we can afford to render everything from scratch (declare, prepare and send to GPU) every frame.

The most important concept here is that, unlike in retained-mode (deferred) UI APIs, we don't declare a button and give it a function pointer that is called when the user clicks.

Here, the button is a function, like this:

if (doButton("Click me")) {
  print("Looks like I was clicked!");
}

It is declared as part of the runtime execution in the loop, not managed externally. It can be interpreted as if it's a whole React component tree recreated every frame.
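
To make the idea concrete, here is a minimal sketch of what such a button function might do internally. It is an assumption-heavy illustration, not the API used later in this post: the `Input` and `Rect` types are hypothetical, the button takes an explicit rectangle instead of a label, and drawing is left out entirely.

```zig
const std = @import("std");

/// Per-frame input snapshot (hypothetical; a real app would fill it from
/// its windowing or browser event layer).
pub const Input = struct {
    mouse_x: f32,
    mouse_y: f32,
    /// True only during the frame in which the mouse button was released.
    mouse_released: bool,
};

pub const Rect = struct { x: f32, y: f32, width: f32, height: f32 };

fn contains(rect: Rect, px: f32, py: f32) bool {
    return px >= rect.x and px < rect.x + rect.width and
        py >= rect.y and py < rect.y + rect.height;
}

/// Immediate-mode button: called every frame, it reports a click through
/// its return value instead of invoking a stored callback.
pub fn doButton(input: Input, rect: Rect) bool {
    return input.mouse_released and contains(rect, input.mouse_x, input.mouse_y);
}

test "a release inside the rectangle counts as a click" {
    const rect = Rect{ .x = 10, .y = 10, .width = 100, .height = 30 };
    const inside = Input{ .mouse_x = 50, .mouse_y = 20, .mouse_released = true };
    const outside = Input{ .mouse_x = 5, .mouse_y = 20, .mouse_released = true };
    try std.testing.expect(doButton(inside, rect));
    try std.testing.expect(!doButton(outside, rect));
}
```

Because the function runs again on every frame, there is no button object to keep in sync with application state; the call site is the whole declaration.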

Speaking of trees: to build any kind of tree, I need a way to declare where a frame starts and where it ends. Similarly, some function to render text is necessary.

All together, here is an example of how the API might look (in Zig, but it easily translates to other languages):

var layout = Layout.init(); // `var`, because every call below mutates it.

const background = Frame{ ... };
const avatar = Frame{ ... };

while (true) { // The rendering loop.
  layout.frame(background);

  layout.frame(avatar);
  layout.end();

  layout.text("John Doe");

  if (layout.button("Follow")) {
    // Mark user as followed.
  }

  layout.end();

  // Send everything to GPU and get ready to draw from scratch next time.
  layout.draw();
}

This gives us an API to specify a tree of UI components and provide them with layout parameters (direction, resize mode...), and it leaves all the element positioning to the layout algorithm. Perfect.

What happens here is that we build a tree of frames: layout.frame() opens a new child of the current frame, everything declared before the matching layout.end() becomes a child of that frame, and layout.end() returns to the parent, so that the next layout.frame() call creates a sibling. This way a tree of nodes is generated.
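
The bookkeeping behind frame() and end() can be sketched as below. This is a simplified illustration rather than the actual implementation: `Builder`, `Node` and `MAX_NODES` are hypothetical names, nodes carry no layout properties, and there is no bounds checking.

```zig
const std = @import("std");

pub const MAX_NODES = 64;

pub const Node = struct {
    /// Index of the parent node, or null for a root.
    parent: ?usize,
    child_count: usize = 0,
};

/// Records the tree implied by a sequence of frame() and end() calls.
/// No bounds checking: more than MAX_NODES frames would index out of range.
pub const Builder = struct {
    nodes: [MAX_NODES]Node = undefined,
    count: usize = 0,
    /// The frame that is currently open; new frames become its children.
    current: ?usize = null,

    pub fn frame(self: *Builder) void {
        const index = self.count;
        self.nodes[index] = Node{ .parent = self.current };
        if (self.current) |parent| {
            self.nodes[parent].child_count += 1;
        }
        self.count += 1;
        self.current = index;
    }

    pub fn end(self: *Builder) void {
        // Step back to the parent, so the next frame() creates a sibling.
        if (self.current) |index| {
            self.current = self.nodes[index].parent;
        }
    }
};

test "frame() nests, end() returns to the parent" {
    var builder = Builder{};
    builder.frame(); // 0: container
    builder.frame(); // 1: left, child of 0
    builder.end();
    builder.frame(); // 2: right, sibling of 1
    builder.end();
    builder.end();
    try std.testing.expect(builder.nodes[2].parent.? == 0);
    try std.testing.expectEqual(@as(usize, 2), builder.nodes[0].child_count);
}
```

Storing a parent index per node is enough to reconstruct the tree later; resetting `count` and `current` at the start of each frame would give the "from scratch every frame" behaviour.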

Types

Here are the previously explained parameters in the form of Zig types.

One thing to note here is that, in order to save on keystrokes, there's no explicit choice between fixed and auto layout mode. This is a questionable decision and I am not sure if I would stick to it going forward, but for now I am happy with this trade-off.

In my implementation I have added a validation function that checks at runtime that all constraints are met:

  • Properties such as spacing, alignment or gap are allowed only in auto layout mode.
  • HUG_CONTENT is only allowed in auto layout.
  • FILL_CONTAINER is only allowed if the parent has auto layout mode on.
  • If a node has HUG_CONTENT set, setting width or height is not possible.
  • Similarly, setting size is not allowed with FILL_CONTAINER.
  • If the parent has the HUG_CONTENT property, a child cannot use FILL_CONTAINER.
  • Setting position (x, y) is not allowed in auto layout.
  • If spacing mode is SPACE_BETWEEN, gap is not allowed.
  • FILL_CONTAINER is not allowed when the parent has the SPACE_BETWEEN property.

The code:

pub const Spacing = enum {
    NONE,
    SPACE_BETWEEN,
};

pub const Resizing = enum {
    FIXED,
    HUG_CONTENT,
    FILL_CONTAINER,
};

pub const Direction = enum {
    NONE,
    ROW,
    COLUMN,
};

pub const Alignment = enum {
    NONE,

    TOP_LEFT,
    TOP_CENTER,
    TOP_RIGHT,

    CENTER_LEFT,
    CENTER,
    CENTER_RIGHT,

    BOTTOM_LEFT,
    BOTTOM_CENTER,
    BOTTOM_RIGHT,
};

pub const Padding = struct {
    left: f32 = 0,
    right: f32 = 0,
    top: f32 = 0,
    bottom: f32 = 0,

    pub fn all(i: f32) Padding {
        return Padding{ .left = i, .right = i, .top = i, .bottom = i };
    }
};

pub const Frame = struct {
    // Fixed dimensions
    x: f32 = 0,
    y: f32 = 0,
    width: f32 = 0,
    height: f32 = 0,

    // Auto layout
    direction: Direction = Direction.NONE,
    padding: Padding = Padding{},
    horizontal_resizing: Resizing = Resizing.FIXED,
    vertical_resizing: Resizing = Resizing.FIXED,
    spacing: Spacing = Spacing.NONE,
    alignment: Alignment = Alignment.NONE,
    gap: f32 = 0,

    // Styling
    background: Vec4 = colors.TRANSPARENT,
};
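
With the types in place, a few of the validation rules listed earlier could be checked roughly as follows. This is a sketch, not the validation function from the actual implementation. It repeats a trimmed-down copy of the types so that it compiles on its own, looks at the horizontal axis only, and rests on three assumptions: a frame is in auto layout mode when its direction is not NONE, the position rule applies to children of an auto layout parent, and the SPACE_BETWEEN rule applies along the parent's direction only. The error names are made up for illustration.

```zig
const std = @import("std");

pub const Spacing = enum { NONE, SPACE_BETWEEN };
pub const Resizing = enum { FIXED, HUG_CONTENT, FILL_CONTAINER };
pub const Direction = enum { NONE, ROW, COLUMN };

/// Trimmed-down Frame with only the fields the checks below look at.
pub const Frame = struct {
    x: f32 = 0,
    y: f32 = 0,
    width: f32 = 0,
    direction: Direction = .NONE,
    horizontal_resizing: Resizing = .FIXED,
    spacing: Spacing = .NONE,
    gap: f32 = 0,
};

pub const ValidationError = error{
    AutoLayoutPropertyInFixedMode,
    HugContentRequiresAutoLayout,
    FillContainerRequiresAutoLayoutParent,
    SizeNotAllowedWithHugOrFill,
    FillContainerInsideHugContent,
    PositionNotAllowedInAutoLayout,
    GapNotAllowedWithSpaceBetween,
    FillContainerInsideSpaceBetween,
};

// Assumption: auto layout is switched on by choosing a direction.
fn isAuto(frame: Frame) bool {
    return frame.direction != .NONE;
}

/// Checks a subset of the rules for one frame and its (optional) parent.
pub fn validate(frame: Frame, parent: ?Frame) ValidationError!void {
    if (!isAuto(frame) and (frame.spacing != .NONE or frame.gap != 0))
        return error.AutoLayoutPropertyInFixedMode;
    if (frame.horizontal_resizing == .HUG_CONTENT and !isAuto(frame))
        return error.HugContentRequiresAutoLayout;
    if (frame.horizontal_resizing != .FIXED and frame.width != 0)
        return error.SizeNotAllowedWithHugOrFill;
    if (frame.spacing == .SPACE_BETWEEN and frame.gap != 0)
        return error.GapNotAllowedWithSpaceBetween;
    if (parent) |p| {
        if (isAuto(p) and (frame.x != 0 or frame.y != 0))
            return error.PositionNotAllowedInAutoLayout;
        if (frame.horizontal_resizing == .FILL_CONTAINER) {
            if (!isAuto(p)) return error.FillContainerRequiresAutoLayoutParent;
            if (p.horizontal_resizing == .HUG_CONTENT) return error.FillContainerInsideHugContent;
            if (p.spacing == .SPACE_BETWEEN and p.direction == .ROW)
                return error.FillContainerInsideSpaceBetween;
        }
    }
}

test "FILL_CONTAINER is rejected inside a hugging parent" {
    const parent = Frame{ .direction = .ROW, .horizontal_resizing = .HUG_CONTENT };
    const child = Frame{ .horizontal_resizing = .FILL_CONTAINER };
    try validate(parent, null);
    try std.testing.expectError(error.FillContainerInsideHugContent, validate(child, parent));
}
```

Returning a distinct error per rule makes it easy to tell the user which constraint a frame violates; the vertical axis would repeat the same checks with height and vertical_resizing.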

Calculating the layout

From the previously described steps we get a tree of nodes which define what we will see on the screen. Now we need to figure out how to place those elements.

The two trickiest issues are how to resolve HUG_CONTENT and FILL_CONTAINER. Why? For the first one we need to know all the children so that we can "hug" them, while for the other one we must know what all the parent containers look like in order to fill all available space.

I thought about this for a long time, and eventually came up with a solution that is literally described by the sentence above. We need to know all parents, so what do we do? We traverse the tree top-down, level order. At each step we will know all parents of a given node. Next, we need to know all the children. What do we do? We go bottom-up, also level order. We can base each frame on results from children, no issue.
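
Both traversals can share one array of node indices in level order: walking it forwards gives the top-down pass, walking it backwards gives the bottom-up pass. The sketch below shows one way to build that array. The `Node` type (children stored as indices into a shared slice), `levelOrder` and `MAX_NODES` are assumptions made for this illustration.

```zig
const std = @import("std");

pub const MAX_NODES = 64;

/// Hypothetical node: children are indices into one shared `nodes` slice.
pub const Node = struct {
    children: []const usize,
};

/// Fills `order` with node indices in level order, starting at `root`,
/// and returns how many were written. The tree must not contain more
/// than MAX_NODES nodes.
pub fn levelOrder(nodes: []const Node, root: usize, order: *[MAX_NODES]usize) usize {
    order[0] = root;
    var count: usize = 1;
    var head: usize = 0; // `order` doubles as the breadth-first queue
    while (head < count) : (head += 1) {
        for (nodes[order[head]].children) |child| {
            order[count] = child;
            count += 1;
        }
    }
    return count;
}

test "parents come before children, level by level" {
    // 0 has children 1 and 3; 1 has child 2.
    const nodes = [_]Node{
        Node{ .children = &[_]usize{ 1, 3 } },
        Node{ .children = &[_]usize{2} },
        Node{ .children = &[_]usize{} },
        Node{ .children = &[_]usize{} },
    };
    var order: [MAX_NODES]usize = undefined;
    const count = levelOrder(&nodes, 0, &order);
    try std.testing.expectEqual(@as(usize, 4), count);
    try std.testing.expectEqual(@as(usize, 3), order[2]); // second level ends here
    try std.testing.expectEqual(@as(usize, 2), order[3]); // grandchild comes last
}
```

Iterating `order[0..count]` from the front guarantees every parent has been processed before its children; iterating from the back guarantees the opposite.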

Pseudocode

It will be the ugliest part of text I ever wrote on this blog, but I think that there is no good way around this. I am using pseudocode to cut down on the lines of code, hopefully without skipping anything very important for understanding.

  1. Traverse tree top-down, generate the tree of quads which we will later fill with absolute coordinates.

  2. Traverse tree bottom-up, solving HUG_CONTENT. For each node in the tree:

    • If the node is set to HUG_CONTENT horizontally:

      • For each of its children:

        • If direction of the node (not the child) is ROW:
          • increase width of the node by width of the child.
        • Else:
          • Set width of the node to maximum(current_width, child_width)
      • Then increase the width of the node by the sum of its left and right padding and, if the direction is ROW, by the value of the gap multiplied by (number of children - 1).

    • If the node is set to HUG_CONTENT vertically, do the same as for horizontal, but replace ROW with COLUMN and width with height. Padding and gap are applied in the same way.

  3. Finally, the third and last pass which will resolve FILL_CONTAINER and all positions. Again, for each node in the tree:

    • First calculate available width and height for expanding. For each child of the node:
      • If the node has direction ROW and the child's horizontal resizing is NOT set to FILL_CONTAINER, decrease available_width by the width of the child.
      • Similar for COLUMN, vertical resizing and height.
      • If the node has direction ROW and the child's horizontal resizing is set to FILL_CONTAINER, increase the counter of children sharing width (sharing_width) by one.
      • Similar for COLUMN, vertical resizing and height.
    • To avoid division by zero, sharing_width = maximum(sharing_width, 1) (and same for height).
    • Decrease available_width and available_height by the padding and the gaps (the gap multiplied, again, by the number of children - 1).
    • For each child:
      • If the child has horizontal resizing mode set to FILL_CONTAINER, set its width to available_width / sharing_width.
      • Same for vertical resizing and height.
    • If there was at least one child filling width, set available_width to 0 as there is no more remaining. Same for height.
    • Calculate alignment:
      • Create a variable x for the child's X coordinate. Whichever of the 9 alignments (TOP_LEFT etc.) we are dealing with, x is always increased by the left padding of the parent. For TOP_CENTER, CENTER and BOTTOM_CENTER we also add available_width / 2, and for TOP_RIGHT, CENTER_RIGHT and BOTTOM_RIGHT we add the full available_width.
      • Similarly we need a variable y, with respectively changed additions.
    • Time to solve spacing mode.
      • If spacing mode is packed (Spacing.NONE in the types above), for each child:
        • If direction is ROW, set the child's x to the previously calculated x coordinate, then increase that x by the width of the child and the gap of the node.
        • If direction isn't ROW, increase the child's x by alignment x, and if the parent node is right aligned, subtract the width of the child from its x.
        • Exactly the opposite for y: COLUMN takes the role of ROW, height the role of width and bottom alignment the role of right alignment.
      • If spacing mode is SPACE_BETWEEN, there will be a horizontal_gap = available_width / (children_count - 1) and similarly a vertical_gap. For each child of the node:
        • Set x value of the child to alignment x. Same for y.
        • If direction of the parent is ROW, increase the alignment x by width of this child and the horizontal gap.
        • Similar for COLUMN.
    • Solving vertical and horizontal centering – again for each child:
      • If the node's direction is ROW and it is vertically centered (alignment is CENTER_LEFT, CENTER or CENTER_RIGHT), then decrease the y value of the child by half of the child's height.
      • Analogous situation for COLUMN and width.
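
The horizontal half of step 2 can be written down compactly. The sketch below assumes the widths of the children have already been resolved (which the bottom-up order guarantees) and passes the relevant properties as plain parameters; `hugWidth` and `maxF32` are hypothetical helper names.

```zig
const std = @import("std");

pub const Direction = enum { ROW, COLUMN };

fn maxF32(a: f32, b: f32) f32 {
    return if (a > b) a else b;
}

/// Width of a node whose horizontal resizing is HUG_CONTENT.
pub fn hugWidth(
    direction: Direction,
    child_widths: []const f32,
    padding_left: f32,
    padding_right: f32,
    gap: f32,
) f32 {
    var width: f32 = 0;
    var seen_child = false;
    for (child_widths) |child_width| {
        if (direction == .ROW) {
            // Children sit side by side: widths add up, with one gap
            // between each pair of neighbours (children - 1 gaps in total).
            if (seen_child) width += gap;
            width += child_width;
        } else {
            // Children are stacked: the widest one decides.
            width = maxF32(width, child_width);
        }
        seen_child = true;
    }
    return width + padding_left + padding_right;
}

test "ROW sums widths and gaps, COLUMN takes the widest child" {
    const widths = [_]f32{ 100, 50, 30 };
    // ROW: 100 + 50 + 30 + 2 gaps of 10 + 40 padding = 240
    try std.testing.expectEqual(@as(f32, 240), hugWidth(.ROW, &widths, 20, 20, 10));
    // COLUMN: widest child 100 + 40 padding = 140
    try std.testing.expectEqual(@as(f32, 140), hugWidth(.COLUMN, &widths, 20, 20, 10));
}
```

The vertical case is the same function with COLUMN and ROW swapped and heights in place of widths.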

This is terribly complicated, but I don't see any other way around this. It's just a lot of calculations, a lot of parameters to take into account. But the result is certainly worth it. Using this, we can achieve probably any layout that we could implement in CSS, and definitely we can implement any layout designed in Figma.
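
To make one slice of the third pass concrete, here is a sketch of how a ROW parent might share its free width among FILL_CONTAINER children and then derive the starting x from the alignment. It covers only the horizontal axis of a packed row. `Child`, `fillRow`, `HorizontalAlign` and `startX` are hypothetical names, and instead of clamping the sharing counter to 1 the sketch returns early when no child fills.

```zig
const std = @import("std");

pub const Resizing = enum { FIXED, HUG_CONTENT, FILL_CONTAINER };

pub const Child = struct {
    resizing: Resizing = .FIXED, // horizontal resizing of the child
    width: f32 = 0,
};

/// Distributes the free width of a ROW parent among its FILL_CONTAINER
/// children and returns the width left over for alignment.
pub fn fillRow(parent_width: f32, padding_left: f32, padding_right: f32, gap: f32, children: []Child) f32 {
    var available = parent_width - padding_left - padding_right;
    var sharing: f32 = 0;
    var seen_child = false;
    for (children) |child| {
        if (seen_child) available -= gap; // one gap between neighbours
        seen_child = true;
        if (child.resizing == .FILL_CONTAINER) {
            sharing += 1;
        } else {
            available -= child.width;
        }
    }
    if (sharing == 0) return available;
    for (children) |*child| {
        if (child.resizing == .FILL_CONTAINER) child.width = available / sharing;
    }
    return 0; // the filling children consumed everything
}

pub const HorizontalAlign = enum { LEFT, CENTER, RIGHT };

/// X coordinate, relative to the parent, where the first child starts.
pub fn startX(alignment: HorizontalAlign, padding_left: f32, leftover: f32) f32 {
    return switch (alignment) {
        .LEFT => padding_left,
        .CENTER => padding_left + leftover / 2,
        .RIGHT => padding_left + leftover,
    };
}

test "two fillers split what the fixed child leaves" {
    var kids = [_]Child{
        Child{ .width = 100 },
        Child{ .resizing = .FILL_CONTAINER },
        Child{ .resizing = .FILL_CONTAINER },
    };
    // 400 - 40 padding - 2 gaps of 10 - 100 fixed = 240, shared by two.
    const leftover = fillRow(400, 20, 20, 10, &kids);
    try std.testing.expectEqual(@as(f32, 0), leftover);
    try std.testing.expectEqual(@as(f32, 120), kids[1].width);
}
```

When no child fills, the leftover width is what the alignment step distributes: all of it before the children for RIGHT, half of it for CENTER, none for LEFT.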

Result

Time for an example of this algorithm in action. It is based on what I described in the previous article – Using Zig for writing OpenGL in browsers. There are some pieces of the story I will leave out for now – mainly how to prepare the rendering pipeline, draw all the rectangles and how to render text.

For now, proof that the algorithm works:

(this will render in a way that makes sense on a screen at least 600px wide, probably will be crazy overlapping on mobile).

This is how I defined the frames. You can see how this approach, used with reasonable default values in structs, leads to a quite acceptable amount of code. Of course, it is almost 50 LOC, but it would not be much shorter in CSS.

const container = Frame{
    .width = width,
    .height = height,
    .direction = Direction.ROW,
    .spacing = Spacing.SPACE_BETWEEN,
    .background = orange,
};

const left = Frame{
    .direction = Direction.COLUMN,
    .width = 300,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};

const right = Frame{
    .direction = Direction.COLUMN,
    .gap = 10,
    .padding = Padding.all(20),
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};

const left_top = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.HUG_CONTENT,
    .padding = Padding.all(20),
    .gap = 20,
    .background = blue,
};

const left_center = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.FILL_CONTAINER,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .padding = Padding.all(20),
    .gap = 10,
    .alignment = Alignment.BOTTOM_RIGHT,
    .background = green,
};

const left_bottom = Frame{
    .direction = Direction.COLUMN,
    .width = 130,
    .height = 60,
    .background = yellow,
};

This is how I declare the tree in the rendering code. The scopes created with { and } are artificial: they only add extra indentation and don't carry any additional meaning.

try layout.frame(container, "container");
{
    try layout.frame(left, "left");
    {
        try layout.frame(left_top, "left_top");
        try layout.text("This is HUG_CONTENT.", colors.BLACK, 12);
        try layout.text("Another line.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_center, "left_center");
        try layout.text("FILL_CONTAINER,", colors.BLACK, 12);
        try layout.text("Aligned to BOTTOM_RIGHT.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_bottom, "left_bottom");
        try layout.text("Fixed size.", colors.BLACK, 12);
        layout.end();
    }
    layout.end();

    try layout.frame(right, "right");
    {
        try layout.text("Those two pink", colors.BLACK, 12);
        try layout.text("ones are set apart", colors.BLACK, 12);
        try layout.text("with SPACE_BETWEEN.", colors.BLACK, 12);
    }
    layout.end();
}
layout.end();

Future work

This covers making a layout engine that allows us to express basically anything we might end up creating in Figma designs, which roughly translates to anything we might want to display as part of the user interface.

z-index

However, there is one problem here that might be tricky to spot at first – writing actual code is a bit different than just declaring layout, and sometimes we might find ourselves limited by the code structure.

Let's take a select component as an example. We need to render a dropdown that shows up precisely below the select button and on top of the UI elements that follow it. With the current code, we would have to both move the dropdown rendering to the bottom of the parent container and calculate its x, y coordinates manually. Not impossible, but it can be done better.

Conclusions

This is just a small piece of a bigger puzzle. The layout algorithm is one thing, but we need some kind of rendering backend to get things onto the screen. As briefly mentioned above, we definitely need a way to render text. And a static GUI is not of much use, so how do we handle user input? How do we model state, keyboard focus and so on?

© Tomasz Czajęcki 2018 – 2022. All Rights Reserved.