Imagine that you need to create a complex UI in the Canvas API. Telling each rectangle where to go is not an option anymore. What do we do?
When developing web apps, we usually don't need to spend time thinking about how to position our elements on the screen. We have a powerful layout engine doing the job for us. We just specify a flex layout, margins and so on, and it's done.
But how does it work underneath? And what would it take to create a layout engine from scratch, for use cases where we don't have one? Like in the Canvas API, for positioning drawn shapes? Or in WebGL, where we are left completely on our own with positioning triangle vertices?
In this episode: the layout engine, or how to avoid going crazy from telling each individual rectangle where to sit.
Many drawing APIs, for example the web Canvas API, let us issue drawing commands such as ctx.fillRect(x, y, width, height). With WebGL, it would involve much more code, but it would eventually also come down to drawing two triangles on the screen that together form a rectangle.
The problem is that we cannot keep calculating the x and y coordinates by hand as our UI gets bigger and more complex.
This is where we need a layout engine.
A good example of a layout engine is, as mentioned above, what the browser does with HTML. We provide content and it presents it in a way that things don't end up overlapping. Even more, we have very fine-grained control over it, using CSS, flexbox and so on.
Essentially, what we have here is a function that takes a tree of layout components with dynamic properties and returns a tree of rectangles in screen space coordinates.
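To make that signature concrete, here is about the smallest layout function one could write: it takes the sizes of a few children and returns their rectangles placed side by side in a row, so nobody has to compute x by hand. It is a minimal sketch; layoutRow, Rect and the flat list of sizes (instead of a real tree) are hypothetical simplifications, not part of the engine described in this post.

```zig
const std = @import("std");

/// A rectangle in screen-space coordinates.
const Rect = struct { x: f32, y: f32, width: f32, height: f32 };

/// Hypothetical helper for illustration, not the article's engine:
/// places children of the given sizes ([width, height]) next to each other
/// in a row, starting at (origin_x, origin_y) and separated by `gap`.
/// Writes one Rect per child into `out`.
fn layoutRow(origin_x: f32, origin_y: f32, gap: f32, sizes: []const [2]f32, out: []Rect) void {
    std.debug.assert(out.len >= sizes.len);
    var x = origin_x;
    for (sizes, 0..) |size, i| {
        out[i] = .{ .x = x, .y = origin_y, .width = size[0], .height = size[1] };
        x += size[0] + gap; // the next child starts after this one and the gap
    }
}
```

A real layout engine does the same kind of work recursively for every node of the tree, with many more properties deciding the sizes and offsets.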
Layout engines can have various levels of complexity. CSS is an example of one of the more complex ones, with multiple modes of operation (regular, float, flex, grid) that can be used interchangeably.
I like reinventing the wheel, but when I do, I try not to pick the hardest way.
Inspiration came from Figma. It features a very powerful, yet concise, auto layout concept which makes designing UIs much easier. And, probably the biggest reason, there is mostly only one way to do each thing.
Each frame (a rectangle) can be specified either with fixed coordinates (x, y, width, height) or with auto layout, a mode in which dimensions are calculated automatically.
For auto layout, the following properties are available:
- Direction: children are laid out in a ROW or a COLUMN.
- Resizing, set separately for the horizontal and the vertical axis:
  - FIXED: the size is specified by setting width or height.
  - HUG_CONTENT: the frame takes a size just big enough to display all of its children (aligned in a row or a column).
  - FILL_CONTAINER: the frame takes as much space as is available in the parent.
- Alignment: one of [TOP, CENTER, BOTTOM] × [LEFT, CENTER, RIGHT], for example TOP_LEFT, CENTER, BOTTOM_RIGHT etc. It has a similar effect to a combination of justify-content and align-items, but is handled gracefully in one property.

This pretty much covers everything that we might need (almost, but I will get to that at the end), and if you look at the screenshot above once more, it covers pretty much everything related to layout in Figma.
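One way to see why a single alignment property is enough: each of the nine values splits into a horizontal and a vertical component, and each component turns into an offset inside the free space on its axis. The sketch below assumes that NONE behaves like TOP_LEFT; the helper names (AxisAlign, horizontal, vertical, offset) are hypothetical.

```zig
const std = @import("std");

/// Same values as the Alignment enum defined later in the article.
const Alignment = enum { NONE, TOP_LEFT, TOP_CENTER, TOP_RIGHT, CENTER_LEFT, CENTER, CENTER_RIGHT, BOTTOM_LEFT, BOTTOM_CENTER, BOTTOM_RIGHT };

/// Position along a single axis (hypothetical helper type).
const AxisAlign = enum { start, center, end };

/// Horizontal component (the "justify" part). Assumption: NONE acts like LEFT.
fn horizontal(a: Alignment) AxisAlign {
    return switch (a) {
        .NONE, .TOP_LEFT, .CENTER_LEFT, .BOTTOM_LEFT => .start,
        .TOP_CENTER, .CENTER, .BOTTOM_CENTER => .center,
        .TOP_RIGHT, .CENTER_RIGHT, .BOTTOM_RIGHT => .end,
    };
}

/// Vertical component (the "align" part). Assumption: NONE acts like TOP.
fn vertical(a: Alignment) AxisAlign {
    return switch (a) {
        .NONE, .TOP_LEFT, .TOP_CENTER, .TOP_RIGHT => .start,
        .CENTER_LEFT, .CENTER, .CENTER_RIGHT => .center,
        .BOTTOM_LEFT, .BOTTOM_CENTER, .BOTTOM_RIGHT => .end,
    };
}

/// Offset of the content inside a box, given the free space left on that axis.
fn offset(a: AxisAlign, free_space: f32) f32 {
    return switch (a) {
        .start => 0,
        .center => free_space / 2,
        .end => free_space,
    };
}
```

For example, BOTTOM_RIGHT becomes (end, end), so content is pushed by the full free space on both axes.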
To solve the layout problem, we first need a way for users to declare their components.
I went with an API design in the style of immediate mode GUIs. People usually refer to this video by Casey Muratori, which introduces the concept.
The basic idea is that rendering on modern computers is fast (and the concept itself is 17 years old, which makes it even more relevant now), so we can afford to render everything from scratch (declare, prepare and send to the GPU) every frame.
The most important difference is that, unlike in deferred rendering UI APIs, we don't declare a button and give it a function pointer that is called when the user clicks.
Here, the button is a function, like this:
```
if (doButton("Click me")) {
    print("Looks like I was clicked!");
}
```
It is declared as part of the loop's execution at runtime, not managed externally. You can think of it as a whole React component tree being recreated every frame.
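A minimal sketch of what doButton could look like, assuming a hypothetical Input struct that the platform layer fills in every frame and a rectangle passed in explicitly (in a full system that rectangle would come from the layout engine). Nothing is stored between frames: the function only answers whether the button was clicked in this frame.

```zig
const std = @import("std");

/// Hypothetical per-frame input snapshot, filled in by the platform layer.
const Input = struct {
    mouse_x: f32,
    mouse_y: f32,
    mouse_released: bool, // true only in the frame in which the button was released
};

const Rect = struct { x: f32, y: f32, width: f32, height: f32 };

fn contains(r: Rect, px: f32, py: f32) bool {
    return px >= r.x and px < r.x + r.width and py >= r.y and py < r.y + r.height;
}

/// Immediate-mode button (hypothetical signature): called every frame and
/// returns true in the frame it was clicked. No callback is registered anywhere.
fn doButton(input: Input, rect: Rect, label: []const u8) bool {
    _ = label; // a real version would also queue the label and background for drawing
    return input.mouse_released and contains(rect, input.mouse_x, input.mouse_y);
}
```

In the render loop this reads exactly like the pseudocode above: if (doButton(input, rect, "Click me")) { ... }.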
Speaking of trees, to build any kind of tree I need a way to declare where a frame starts and where it ends. Similarly, a function to render text is necessary.
Putting it all together, here is an example of how the API might look (in Zig, but it translates easily to other languages):
```zig
var layout: Layout = Layout.init(); // `var`: frame() and end() change the layout's state
const background = Frame{ ... };
const avatar = Frame{ ... };

while (true) {
    // The rendering loop.
    layout.frame(background); // opens `background`
    layout.frame(avatar); // child of `background`
    layout.end(); // closes `avatar`
    layout.text("John Doe"); // sibling of `avatar`
    if (layout.button("Follow")) {
        // Mark user as followed.
    }
    layout.end(); // closes `background`

    // Send everything to GPU and get ready to draw from scratch next time.
    layout.draw();
}
```
This gives us an API to specify a UI component tree and provide its components with layout parameters (direction, resizing mode...), while leaving all element positioning to the layout algorithm. Perfect.
What happens here is that we create a tree of frames: layout.frame() opens a new node as a child of the currently open one, everything declared until the matching layout.end() goes inside that node, and layout.end() returns to the parent, so the next layout.frame() call creates a sibling. This way a tree of nodes is generated.
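The frame()/end() bookkeeping boils down to a stack of currently open nodes. A minimal sketch, assuming a fixed capacity and storing only each node's parent index; TreeBuilder and its fields are hypothetical names, and text would be added as a leaf node in the same way.

```zig
const std = @import("std");

const max_nodes = 256; // assumed fixed capacity for this sketch
const max_depth = 32;

/// Hypothetical tree builder: frame() opens a node as a child of the
/// currently open node, end() closes it and returns to the parent.
const TreeBuilder = struct {
    parent: [max_nodes]?usize = undefined, // parent index of every node (null = root)
    count: usize = 0,
    stack: [max_depth]usize = undefined, // indices of the currently open nodes
    depth: usize = 0,

    fn frame(self: *TreeBuilder) usize {
        const index = self.count;
        self.parent[index] = if (self.depth > 0) self.stack[self.depth - 1] else null;
        self.count += 1;
        self.stack[self.depth] = index; // the new node becomes the open one
        self.depth += 1;
        return index;
    }

    fn end(self: *TreeBuilder) void {
        std.debug.assert(self.depth > 0); // end() without a matching frame()
        self.depth -= 1;
    }
};
```

Because the builder is rebuilt every frame, there is no need to diff or update an existing tree; it is simply reset and filled again.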
Previously explained parameters in the form of Zig types.
One thing to note here is that, in order to save on keystrokes, there's no explicit choice between fixed and auto layout mode. This is a questionable decision and I am not sure I will stick with it going forward, but for now I am happy with this trade-off.
In my implementation I have added a validation function that checks at runtime that all constraints are met:
- spacing, alignment or gap are allowed only in auto layout mode.
- HUG_CONTENT is only allowed in auto layout.
- FILL_CONTAINER is only allowed if the parent has auto layout mode on.
- With HUG_CONTENT set, setting width or height is not possible.
- If a parent has the HUG_CONTENT property, a child cannot use FILL_CONTAINER.
- Setting a position (x, y) is not allowed in auto layout.
- With SPACE_BETWEEN, gap is not allowed.
- FILL_CONTAINER is not allowed when the parent has the SPACE_BETWEEN property.

The code:
```zig
pub const Spacing = enum { NONE, SPACE_BETWEEN };

pub const Resizing = enum { FIXED, HUG_CONTENT, FILL_CONTAINER };

pub const Direction = enum { NONE, ROW, COLUMN };

pub const Alignment = enum {
    NONE,
    TOP_LEFT,
    TOP_CENTER,
    TOP_RIGHT,
    CENTER_LEFT,
    CENTER,
    CENTER_RIGHT,
    BOTTOM_LEFT,
    BOTTOM_CENTER,
    BOTTOM_RIGHT,
};

pub const Padding = struct {
    left: f32 = 0,
    right: f32 = 0,
    top: f32 = 0,
    bottom: f32 = 0,

    pub fn all(i: f32) Padding {
        return Padding{ .left = i, .right = i, .top = i, .bottom = i };
    }
};

pub const Frame = struct {
    /// Fixed dimensions
    x: f32 = 0,
    y: f32 = 0,
    width: f32 = 0,
    height: f32 = 0,

    // Auto layout
    direction: Direction = Direction.NONE,
    padding: Padding = Padding{},
    horizontal_resizing: Resizing = Resizing.FIXED,
    vertical_resizing: Resizing = Resizing.FIXED,
    spacing: Spacing = Spacing.NONE,
    alignment: Alignment = Alignment.NONE,
    gap: f32 = 0,

    // Styling
    background: Vec4 = colors.TRANSPARENT,
};
```
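A sketch of how some of the validation rules could be checked, using trimmed-down copies of these types. It assumes that auto layout mode means direction is not NONE (the post does not say how the mode is detected), checks the HUG_CONTENT/FILL_CONTAINER parent rule per axis, and leaves out the x/y and SPACE_BETWEEN/FILL_CONTAINER rules; validate and LayoutError are hypothetical names.

```zig
const std = @import("std");

// Trimmed-down copies of the types above (Alignment shortened, styling omitted).
const Direction = enum { NONE, ROW, COLUMN };
const Resizing = enum { FIXED, HUG_CONTENT, FILL_CONTAINER };
const Spacing = enum { NONE, SPACE_BETWEEN };
const Alignment = enum { NONE, TOP_LEFT, CENTER, BOTTOM_RIGHT };

const Frame = struct {
    width: f32 = 0,
    height: f32 = 0,
    direction: Direction = .NONE,
    horizontal_resizing: Resizing = .FIXED,
    vertical_resizing: Resizing = .FIXED,
    spacing: Spacing = .NONE,
    alignment: Alignment = .NONE,
    gap: f32 = 0,
};

const LayoutError = error{
    AutoLayoutPropertyWithoutAutoLayout,
    HugWithoutAutoLayout,
    FillWithoutAutoLayoutParent,
    FillInsideHug,
    HugWithFixedSize,
    GapWithSpaceBetween,
};

/// Assumption: a frame is in auto layout mode when it has a direction.
fn isAuto(f: Frame) bool {
    return f.direction != .NONE;
}

fn validate(f: Frame, parent: ?Frame) LayoutError!void {
    const hug_h = f.horizontal_resizing == .HUG_CONTENT;
    const hug_v = f.vertical_resizing == .HUG_CONTENT;
    const fill_h = f.horizontal_resizing == .FILL_CONTAINER;
    const fill_v = f.vertical_resizing == .FILL_CONTAINER;

    if (!isAuto(f)) {
        if (f.spacing != .NONE or f.alignment != .NONE or f.gap != 0)
            return error.AutoLayoutPropertyWithoutAutoLayout;
        if (hug_h or hug_v) return error.HugWithoutAutoLayout;
    }
    if (fill_h or fill_v) {
        const p = parent orelse return error.FillWithoutAutoLayoutParent;
        if (!isAuto(p)) return error.FillWithoutAutoLayoutParent;
        // A hugging parent needs its children's sizes, while a filling child
        // needs its parent's size: checked per axis here (an assumption).
        if ((fill_h and p.horizontal_resizing == .HUG_CONTENT) or
            (fill_v and p.vertical_resizing == .HUG_CONTENT))
            return error.FillInsideHug;
    }
    if ((hug_h and f.width != 0) or (hug_v and f.height != 0))
        return error.HugWithFixedSize;
    if (f.spacing == .SPACE_BETWEEN and f.gap != 0) return error.GapWithSpaceBetween;
}
```

Returning an error union keeps the check cheap to call on every node while the tree is being built, and try makes an invalid declaration fail loudly during development.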
From the previously described steps we get a tree of nodes which defines what we will see on the screen. Now we need to figure out how to place those elements.
The two trickiest issues are how to resolve HUG_CONTENT and FILL_CONTAINER. Why? For the first one we need to know all of a node's children so that we can "hug" them, while for the other one we must know what all the parent containers look like in order to fill all available space.
I thought about this for a long time and eventually came up with a solution that is literally described by the sentence above. We need to know all the parents, so what do we do? We traverse the tree top-down, in level order. At each step we know all the parents of a given node. Next, we need to know all the children. What do we do? We go bottom-up, also in level order. That way each frame can be based on the results already computed for its children, no issue.
This will be the ugliest piece of text I have ever written on this blog, but I don't think there is a good way around it. I am using pseudocode to cut down on the lines of code, hopefully without skipping anything very important for understanding.
First, traverse the tree top-down and generate the tree of quads, which we will later fill with absolute coordinates.
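A single breadth-first traversal can provide both orders. Level order lists every parent before its children, so walking the resulting array forwards is top-down, and walking it backwards is bottom-up (every child comes before its parent). This is one possible way to get both passes from one traversal, not necessarily how the engine described here does it; Node and levelOrder are hypothetical, and the output array doubles as the queue.

```zig
const std = @import("std");

/// Minimal tree representation for this sketch: each node lists its children by index.
const Node = struct { children: []const usize };

/// Writes node indices in level order (breadth-first) starting at `root`.
/// `order` must have room for every node; it also serves as the BFS queue.
/// Forwards, `order` visits parents before children (top-down);
/// backwards, it visits children before parents (bottom-up).
/// Returns the number of nodes written.
fn levelOrder(nodes: []const Node, root: usize, order: []usize) usize {
    order[0] = root;
    var head: usize = 0; // next node whose children get enqueued
    var tail: usize = 1; // one past the last written index
    while (head < tail) : (head += 1) {
        for (nodes[order[head]].children) |child| {
            order[tail] = child;
            tail += 1;
        }
    }
    return tail;
}
```

The top-down pass is then a plain for loop over order[0..n], and the bottom-up pass walks the same slice from index n - 1 down to 0.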
Next, traverse the tree bottom-up, solving HUG_CONTENT. For each node in the tree:
- If the node is set to HUG_CONTENT horizontally:
  - For each of its children: if the direction of the node (not the child) is ROW, add the child's width to the node's width; otherwise set the node's width to maximum(current_width, child_width).
  - Then increase the width of the node by the sum of its left and right padding and, if the direction was ROW, by the value of gap multiplied by the number of children - 1.
- If the node is set to HUG_CONTENT vertically, do the same as for horizontal, but replacing ROW with COLUMN and width with height. padding (top and bottom) and gap are applied in the same way.
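The horizontal and vertical cases differ only in which direction stacks the children and which padding applies, so they can share one function. A minimal sketch of the steps above; hugSize is a hypothetical name, and the caller is assumed to pass the children's already resolved sizes (which the bottom-up order guarantees).

```zig
const std = @import("std");

/// HUG_CONTENT size of a node along one axis, following the steps above.
/// `stacked` is true when children are placed one after another on this axis
/// (ROW for width, COLUMN for height): their sizes add up and gaps are counted.
/// Otherwise the largest child decides. Padding is added in both cases.
fn hugSize(stacked: bool, child_sizes: []const f32, padding_start: f32, padding_end: f32, gap: f32) f32 {
    var size: f32 = 0;
    for (child_sizes) |child_size| {
        if (stacked) {
            size += child_size;
        } else {
            size = @max(size, child_size);
        }
    }
    size += padding_start + padding_end;
    if (stacked and child_sizes.len > 1) {
        size += gap * @as(f32, @floatFromInt(child_sizes.len - 1));
    }
    return size;
}
```

For a node this would be called as hugSize(direction == .ROW, child_widths, padding.left, padding.right, gap) for the width and hugSize(direction == .COLUMN, child_heights, padding.top, padding.bottom, gap) for the height.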
Finally, the third and last pass resolves FILL_CONTAINER and all positions. Again, for each node in the tree:
- Start with available_width and available_height equal to the node's width and height.
- For each child: if the direction of the node is ROW and the child's horizontal resizing is NOT set to FILL_CONTAINER, decrease available_width by the width of the child. If it is set to FILL_CONTAINER, increase the counter of nodes sharing the width (sharing_width) by one. The same goes for height.
- Set sharing_width = maximum(sharing_width, 1) (and the same for height), so that we never divide by zero.
- Decrease available_width and available_height by padding and gaps (the gap multiplied again by the number of children - 1).
- For each child set to FILL_CONTAINER, set its width to available_width / sharing_width. Same for height.
- If there was at least one FILL_CONTAINER child, set available_width to 0, as there is no more space remaining. Same for height.
- Calculate the alignment x, the starting X coordinate for the children. Whichever of the 9 combinations (TOP_LEFT etc.) we are dealing with, we increase x by the left padding of the parent. For TOP_CENTER, CENTER and BOTTOM_CENTER we also add available_width / 2, and for TOP_RIGHT, CENTER_RIGHT and BOTTOM_RIGHT we add the full available_width.
- Calculate the alignment y in the same way, with the corresponding additions (top padding, then available_height / 2 for the CENTER_* values and the full available_height for the BOTTOM_* values).
- If the children are packed (spacing is NONE, that is, not SPACE_BETWEEN), for each child:
  - If the direction is ROW, set its x to the previously calculated x coordinate, then increase that x by the width of the child and the gap of the node.
  - Otherwise (not ROW), increase the child's x by the alignment x, and if the parent node is right aligned, subtract the width of the child from its x.
  - Do the same for COLUMN (with y and height), including the else branch.
- If spacing is SPACE_BETWEEN, there will be a horizontal_gap = available_width / (children_count - 1) and similarly a vertical_gap. For each child of the node:
  - Set the child's x to the alignment x. Same for y.
  - If the direction is ROW, increase the alignment x by the width of this child and the horizontal gap.
  - Do the same for COLUMN, with y, height and the vertical gap.
- If the direction is ROW and the node is vertically centered (alignment is CENTER_LEFT, CENTER or CENTER_RIGHT), decrease the y of each child by half of the child's height.
- Do the same for COLUMN with x and width.

This is terribly complicated, but I don't see any other way around it. It's just a lot of calculations and a lot of parameters to take into account. But the result is certainly worth it: with it, we can probably achieve any layout that we could implement in CSS, and we can definitely implement any layout designed in Figma.
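To make part of the third pass more tangible, here is a sketch of just the horizontal axis of a packed ROW node: FILL_CONTAINER children share the free space, then all children are placed one after another starting at the aligned x. It covers only a slice of the steps above (no COLUMN, no SPACE_BETWEEN, no cross-axis centering), checks sharing > 0 instead of using maximum(sharing_width, 1), and all names (Child, HAlign, layoutRowMainAxis) are hypothetical.

```zig
const std = @import("std");

const HAlign = enum { left, center, right };

/// One child along the main axis of a ROW. `fill` marks FILL_CONTAINER;
/// otherwise `width` is already known (FIXED, or resolved by the HUG pass).
const Child = struct { fill: bool = false, width: f32 = 0, x: f32 = 0 };

/// Horizontal axis of a packed ROW node: share the free space among
/// FILL_CONTAINER children, then place all children left to right.
fn layoutRowMainAxis(
    node_x: f32,
    node_width: f32,
    padding_left: f32,
    padding_right: f32,
    gap: f32,
    alignment: HAlign,
    children: []Child,
) void {
    if (children.len == 0) return;
    var available = node_width;
    var sharing: f32 = 0;
    for (children) |c| {
        if (c.fill) {
            sharing += 1;
        } else {
            available -= c.width;
        }
    }
    const gaps = gap * @as(f32, @floatFromInt(children.len - 1));
    available -= padding_left + padding_right + gaps;
    if (sharing > 0) {
        for (children) |*c| {
            if (c.fill) c.width = available / sharing;
        }
        available = 0; // the FILL_CONTAINER children consumed all remaining space
    }
    // Alignment x: left padding plus 0, half or all of the free space.
    const offset: f32 = switch (alignment) {
        .left => 0,
        .center => available / 2,
        .right => available,
    };
    var x = node_x + padding_left + offset;
    for (children) |*c| {
        c.x = x;
        x += c.width + gap;
    }
}
```

The remaining steps (COLUMN, SPACE_BETWEEN and centering on the cross axis) follow the same pattern with different inputs.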
Time for an example of this algorithm in action. It is based on what I described in the previous article, Using Zig for writing OpenGL in browsers. There are some pieces of the story I will leave out for now, mainly how to prepare the rendering pipeline, draw all the rectangles and render text.
For now, proof that the algorithm works:
(This renders sensibly on a screen at least 600px wide; on mobile it will probably overlap wildly.)
This is how I defined the frames. You can see how this approach, combined with reasonable default values in structs, leads to a quite acceptable amount of code. Granted, it is almost 50 lines, but this would not be much shorter in CSS.
```zig
const container = Frame{
    .width = width,
    .height = height,
    .direction = Direction.ROW,
    .spacing = Spacing.SPACE_BETWEEN,
    .background = orange,
};
const left = Frame{
    .direction = Direction.COLUMN,
    .width = 300,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};
const right = Frame{
    .direction = Direction.COLUMN,
    .gap = 10,
    .padding = Padding.all(20),
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .background = pink,
};
const left_top = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.HUG_CONTENT,
    .vertical_resizing = Resizing.HUG_CONTENT,
    .padding = Padding.all(20),
    .gap = 20,
    .background = blue,
};
const left_center = Frame{
    .direction = Direction.COLUMN,
    .horizontal_resizing = Resizing.FILL_CONTAINER,
    .vertical_resizing = Resizing.FILL_CONTAINER,
    .padding = Padding.all(20),
    .gap = 10,
    .alignment = Alignment.BOTTOM_RIGHT,
    .background = green,
};
const left_bottom = Frame{
    .direction = Direction.COLUMN,
    .width = 130,
    .height = 60,
    .background = yellow,
};
```
This is how I declare the rendering tree. The scopes created with { and } are artificial: they only add extra indentation and carry no additional meaning.
```zig
try layout.frame(container, "container");
{
    try layout.frame(left, "left");
    {
        try layout.frame(left_top, "left_top");
        try layout.text("This is HUG_CONTENT.", colors.BLACK, 12);
        try layout.text("Another line.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_center, "left_center");
        try layout.text("FILL_CONTAINER,", colors.BLACK, 12);
        try layout.text("Aligned to BOTTOM_RIGHT.", colors.BLACK, 12);
        layout.end();

        try layout.frame(left_bottom, "left_bottom");
        try layout.text("Fixed size.", colors.BLACK, 12);
        layout.end();
    }
    layout.end();

    try layout.frame(right, "right");
    {
        try layout.text("Those two pink", colors.BLACK, 12);
        try layout.text("ones are set apart", colors.BLACK, 12);
        try layout.text("with SPACE_BETWEEN.", colors.BLACK, 12);
    }
    layout.end();
}
layout.end();
```
This covers making a layout engine that allows us to express basically anything we might end up creating in Figma designs, which roughly translates to anything we might want to display as part of the user interface.
However, there is one problem here that might be tricky to spot at first: writing actual code is a bit different from just declaring a layout, and sometimes we might find ourselves limited by the code structure.
Let's take a select component as an example. We need to render a dropdown that shows up precisely below the select button and on top of the following elements of the UI. With the current code, we would have to both move the dropdown rendering to the bottom of the parent container and calculate its x, y coordinates manually. Not impossible, but it can be done better.
This is just a small piece of a bigger puzzle. The layout algorithm is one thing, but we need some kind of rendering backend to get things onto the screen. As briefly mentioned above, we definitely need a way to render text. And a static GUI won't be of much use, so how do we handle user input? How do we model state, keyboard focus and so on?