Unicode Text Rendering in Zig with FreeType and HarfBuzz

October 18, 2024
Final result. Devanagari, Hiragana, Cyrillic, Arabic, Emoji.

A year ago I researched rendering text from scratch. Building on a bunch of mini-projects from the past, I put together a TTF font parser and an SDF renderer. This time, for a change, I decided to try the industry-standard tools and get text to render the same way that many operating systems and browsers do.

This article is a summary of my journey. I go over terms that were useful to know, challenges I faced, and things I learned. Most of the problems here apply to any language or platform, but in case it makes a difference: I am using Zig, WebGPU, FreeType, and HarfBuzz.

Glossary

This article uses a lot of terms, and as is often the case with terms, they are not very descriptive by themselves yet mean something very precise. Here are the ones I will be using:

  • TTF – TrueType Font. A font file format that is widely used and supported.
  • OTF – OpenType Font. An extension of TTF format with additional features.
  • Face – A font file. Part of a font family. Example: Roboto Regular, part of Roboto font family.
  • SDF – Signed Distance Field. A technique to render vector shapes to a bitmap, often used for resolution-independent text rendering.
  • FreeType – A library to parse TTF and OTF files and render the glyphs to a bitmap.
  • HarfBuzz – A text shaping engine.
  • Shaping – The process of converting Unicode text into a list of glyphs with their positions. Note that it is only concerned with a single line; line breaking is a separate problem, related to segmentation.
  • Unicode – A standard that assigns a unique number to every character in every language.
  • Atlas – A texture that contains multiple smaller textures. In this case, it will contain glyphs.
  • Ligature – A glyph that is created by combining two or more characters.
  • Contextual substitution – A feature that allows replacing one glyph with another depending on the context.
  • Script – A writing system. For example, Latin, Arabic, Cyrillic, Devanagari.
  • Glyph – A visual representation of a character.
  • Ideograph – A character that represents an idea or a concept. For example, Chinese characters.
  • Grapheme – The smallest unit of writing in a language. It can be a letter, a syllable or a character. Since UTF-8 is a variable-length encoding, a single grapheme can span multiple bytes.
  • Cluster – A group of graphemes that should be treated as a single unit. For example, in English, a cluster is a single letter. In Arabic, a cluster is a letter and its diacritics.

    For example, two individual letters are often two separate graphemes. When two letters form a ligature, however, they combine into a single glyph. They are then part of the same cluster and are treated as a unit by the shaping engine — even though the two original, underlying letters remain separate graphemes.

  • Segmentation – A process of breaking a text into smaller parts. There is sentence segmentation – breaking text into sentences, word segmentation – breaking text into words, and grapheme segmentation – breaking text into characters.
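To make the encoding terms above concrete, here is a small self-contained Zig sketch showing that byte length and codepoint count are two different numbers (the sample string is my own illustration; note that even the codepoint count is not the grapheme count, since a combining accent would be a separate codepoint):

```zig
const std = @import("std");

// "é" is a single grapheme but occupies two bytes in UTF-8,
// so byte length and codepoint count diverge immediately.
pub fn countCodepoints(text: []const u8) !usize {
    var count: usize = 0;
    var it = (try std.unicode.Utf8View.init(text)).iterator();
    while (it.nextCodepoint()) |_| count += 1;
    return count;
}

pub fn main() !void {
    const text = "héllo";
    std.debug.print("{d} bytes, {d} codepoints\n", .{ text.len, try countCodepoints(text) });
}
```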

Challenges

What things about text rendering are not trivial to replicate yourself?

  • Parsing TTF. You can, but only to some extent. I wrote a whole article about building my own TTF parser: TTF File Parsing. The thing is, this is a complex format that keeps growing over time, and while the ‘old’ subset is relatively easy to implement, there are dozens of newer features meant to support the full Unicode range that are quite complex. And in practice: knowing I can swap in any font, instead of the few my custom parser was fine-tuned to read, is a big plus.
  • Rasterizing glyphs. First, you need to parse the vector data and render glyphs built from Bézier curves – handling simple shapes, shapes with holes and so on. Then you need to rasterize them to a bitmap. Next comes hinting, which is entirely optional but was created for a reason – without it your text will be mostly unreadable at small sizes.
  • Text shaping. It's very easy to render the Latin alphabet using just the information parsed by FreeType. But with any more complex script it becomes a nightmare, and it makes you ask yourself whether this is really how you want to spend several weeks or months of your productivity. For example, today I learned that the Devanagari script used in Hindi uses a lot of ligatures and contextual forms. And unlike in the English alphabet, where they serve mostly decorative purposes, in Hindi they are essential for displaying text properly. If I had continued down the route of implementing everything from scratch, I would probably still be working on supporting the full GPOS and GSUB tables in my font parser.
  • Segmenting. Again, this is something that is easy to do yourself for English – just break the text at spaces. But that only works for some scripts. In Chinese, Japanese and Thai there are no spaces, and you need a different method to detect word boundaries.
  • Iterating. Believe it or not, even iterating over a UTF-8 string is a challenge. Since UTF-8 text is stored as an array of u8, only basic Latin fits in a single byte and everything else spans multiple bytes. This means the length of the array is not equal to the number of characters in the string. It is not very hard to write your own function that properly iterates over a UTF-8 string, but Zig happens to support this in the standard library, so I will just use that.
    var utf8 = (try std.unicode.Utf8View.init(value)).iterator();
    while (utf8.nextCodepointSlice()) |slice| {
        const codepoint = try std.unicode.utf8Decode(slice);
        std.debug.print("{u}", .{codepoint});
    }
    
  • Emoji. I will go into more detail further down the article, but let's just mention that there are several different ways to store emoji, including whole SVG files, custom colored outline formats and even entire bitmaps.
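As a concrete illustration of the segmenting bullet above, the naive space-splitting approach is a one-liner with Zig's standard library – which is exactly why it is tempting, and exactly why it silently fails for Chinese, Japanese or Thai:

```zig
const std = @import("std");

// Naive word segmentation: split on ASCII spaces. Fine for English,
// useless for scripts that do not separate words with spaces.
pub fn main() void {
    var words = std.mem.tokenizeScalar(u8, "breaking text at spaces", ' ');
    while (words.next()) |word| {
        std.debug.print("{s}\n", .{word});
    }
}
```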

What libraries did I use?

FreeType can parse TTF files and render fonts to bitmaps. Its key strength is that it will support almost anything you throw at it. As a bonus, it has quite good hinting, which, while arguably ugly, helps a ton with the readability of small text.

HarfBuzz, on the other hand, is a text shaping engine. It takes Unicode text and returns glyph positions. It supports dozens of complex languages and scripts, ligatures and contextual forms, and is the de facto industry standard.

PlutoSVG is a library that can render SVG files to a bitmap. The reason I chose it is that it's a C library, which made Zig bindings very easy. As a nice bonus, it includes hooks for FreeType, which saved me some trial and error.

stb_rect_pack is a single-header library for packing a texture atlas – combining many small rectangles into one larger rectangle.

zgpu is a Zig wrapper for WebGPU that uses Dawn. It is part of the larger zig-gamedev library collection, but I only needed the WebGPU part.

Some technical details

I am using Zig 0.13.0-dev.351+64ef45eb0 which I download and manage using zigup. You can see the final result of techniques described in this blogpost in the repository: zig-text-rendering.

Note on using Zig vs C

Recently I built another, similar project (using WebGPU to render things) in C and I have some observations. So where does Zig win for me?

Compilation time for dummies

I tried to do everything in my (limited) power to keep compilation times fast – I avoided templates, stayed with basic C-like C++ syntax, kept implementations outside of header files, and tried to avoid unnecessary header includes. But that doesn’t help with the elephant in the room – Dawn – which pushes compile times to around 15–30 s. In Zig it is usually closer to 5 s. It’s worth noting that this is not a fair comparison: I am using zgpu, which is not simply Dawn compiled with Zig, and I am pretty sure there are plenty of ways to make Dawn leaner. So don't take it as a strong argument – the problems are most likely mitigable in C as well; it's just that in this particular setup Zig wins.

Memory safety

I remember from my first attempts at using Zig, sometime in 2022, that std.testing.allocator would throw an error if a test leaked memory. In mid-2024 I am delighted to report that in debug builds the regular std.heap.GeneralPurposeAllocator now does the same! This has already helped me find leaks in parts I wouldn’t normally be eager to cover with tests.
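For reference, the pattern that gives you this for free looks like the following – a minimal sketch against the Zig 0.13 API:

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // In debug builds deinit() reports every leaked allocation,
    // including the stack trace of where it was allocated.
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = try allocator.alloc(u8, 128);
    defer allocator.free(buf); // remove this line to see a leak report
}
```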

Elegant syntax

That’s purely subjective and I am pretty sure many people won’t share this feeling, but I love Zig’s syntax. defer is brilliant. I learned to like the while (i < 10) : (i += 1) loop syntax.

And a new highlight for me - multisequence for loops:

for (positions, sizes, 0..) |position, size, id| {
  ...
}

This helps me keep the code shorter and cleaner when iterating over multiple arrays.

Overall strictness

When porting my code from C to Zig I found a bit of dead code. This is not a fair comparison since I haven’t tried to port this Zig code any further, but I am somewhat confident that Zig’s compiler being strict about const vs var declarations and forcing _ = ... for unused variables has a positive impact on my code.

What wasn't so bad about C?

Having said all that, I am actually quite surprised how manageable my experience with C was. CMake is still a terrible mess with incomprehensible syntax but gets the job done – to my surprise, nowadays it’s even possible to download dependencies automatically from GitHub:

Include(FetchContent)
FetchContent_Declare(
  Catch2
  GIT_REPOSITORY https://github.com/catchorg/Catch2.git
  GIT_TAG        v3.4.0
)
FetchContent_MakeAvailable(Catch2)

Which brings it on par with Zig’s relatively new package manager capabilities.

Since I already mentioned Catch2 – it’s a really nice testing framework that was relatively easy to set up and worked very reliably ever since.

Unexpected JavaScript win

Still, in both languages I can only dream of what I have with NPM, where all dependencies live in one directory that I can rebuild with one command – unlike the current mix of libraries vendored into the repository, libraries that are downloaded, and libraries that live in my repository but download their own dependencies. NPM packages also use semantic versioning; here the best I can do is pin commit hashes, which is often tricky because some dependencies rely on a stale repository with a Zig wrapper for a C library. Not only did it take me significantly more time to set everything up, but I also don’t see myself upgrading any version any time soon – unlike with NPM, where I can try an update by changing one number and trivially revert if it breaks anything.

This of course has a flip side – the ease of including new libraries in JavaScript definitely leads people to overuse them. C projects, on the other hand, are known for being very conservative about dependencies and for rewriting many things from scratch. While that might hurt productivity a tiny bit, it is without a doubt a win for security and performance on the end user's side.

Architecture

Back to the project and Zig. To get things to appear on the screen, I used WebGPU via the zig-gamedev project – the Dawn (Google’s WebGPU implementation) bindings, GLFW and a math library. For FreeType and HarfBuzz I used mach-freetype, which provides Zig bindings for both libraries.

The concept of the renderer is similar to what I did previously with my UI libraries like Red Otter and, as I found out later, is unsurprisingly roughly the same as what Dear ImGui uses.

In short, when the app starts, a font atlas texture is generated: FreeType rasterizes the given character ranges, and the resulting bitmaps are stored in one texture using a simple packing algorithm. I initially followed the Packing Lightmaps article for this but later switched to stb_rect_pack, as it worked better for a large number of similarly sized glyphs.
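For the curious, a hedged sketch of what the stb_rect_pack call can look like from Zig via @cImport – the atlas size and node count here are illustrative assumptions, not values from my renderer:

```zig
const c = @cImport(@cInclude("stb_rect_pack.h"));

/// Pack glyph rectangles into a 1024x1024 atlas. On success each
/// rect's .x/.y fields hold its position inside the atlas.
fn packAtlas(rects: []c.stbrp_rect) bool {
    var context: c.stbrp_context = undefined;
    // stb recommends at least `width` nodes for best packing quality.
    var nodes: [1024]c.stbrp_node = undefined;
    c.stbrp_init_target(&context, 1024, 1024, &nodes, nodes.len);
    // Returns 1 when every rectangle was packed successfully.
    return c.stbrp_pack_rects(&context, rects.ptr, @intCast(rects.len)) == 1;
}
```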

Every frame issues text rendering commands like renderer.text("Hello world!", 200, 100); and at the end of the frame renderer.draw(); is called. The renderer constructs a buffer of vertices and UV coordinates and issues a single draw call to the GPU.
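Under the hood, each glyph becomes one quad – two triangles, four floats per vertex (clip-space position plus UV). A minimal sketch of the append step; the function name and argument layout are my own illustration, not the repository's exact code:

```zig
const std = @import("std");

// Append one glyph quad as two triangles. x0/y0..x1/y1 are the
// clip-space corners, s0/t0..s1/t1 the glyph's rectangle in the atlas.
fn appendQuad(
    vertices: *std.ArrayList(f32),
    x0: f32, y0: f32, x1: f32, y1: f32,
    s0: f32, t0: f32, s1: f32, t1: f32,
) !void {
    try vertices.appendSlice(&[_]f32{
        x0, y0, s0, t0,
        x1, y0, s1, t0,
        x1, y1, s1, t1,
        x0, y0, s0, t0,
        x1, y1, s1, t1,
        x0, y1, s0, t1,
    });
}
```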

This is the vertex buffer layout:

const vertex_attributes = [_]wgpu.VertexAttribute{
    .{ .format = .float32x2, .offset = 0, .shader_location = 0 },
    .{ .format = .float32x2, .offset = 2 * @sizeOf(f32), .shader_location = 1 },
};
const vertex_buffers = [_]wgpu.VertexBufferLayout{.{
    .array_stride = 4 * @sizeOf(f32),
    .attribute_count = vertex_attributes.len,
    .attributes = &vertex_attributes,
}};

Vertex shader

The vertex shader is very straightforward and doesn't need to do any transformations – I do the clip-space calculations before sending the vertices to the GPU.

struct VertexIn {
    @location(0) position: vec2f,
    @location(1) uv: vec2f,
};

struct VertexOut {
    @builtin(position) position: vec4f,
    @location(1) uv: vec2f,
};

@vertex fn main(in: VertexIn) -> VertexOut {
    var out: VertexOut;
    out.position = vec4f(in.position, 0.0, 1.0);
    out.uv = in.uv;
    return out;
}

Fragment shader

The fragment shader just reads color information from the texture.

struct VertexOut {
    @builtin(position) position: vec4f,
    @location(1) uv: vec2f,
};

@group(0) @binding(0) var t: texture_2d<f32>;
@group(0) @binding(1) var s: sampler;

@fragment fn main(in: VertexOut) -> @location(0) vec4f {
    return textureSample(t, s, in.uv);
}

Basic setup

In the simplest variant, we need to initialize the FreeType library, load a font face, create a HarfBuzz face and font, and set their sizes.

const font_binary = @embedFile("./assets/NotoSans-Regular.ttf");

const font_size = 13;
const ft_lib = try ft.Library.init();
const ft_face = try ft_lib.createFaceMemory(font_binary, 0);
const hb_face = hb.Face.fromFreetypeFace(ft_face);
const hb_font = hb.Font.init(hb_face);

try ft_face.setPixelSizes(0, font_size);
hb_font.setScale(font_size * 64, font_size * 64);

You can check this function for more complex implementation details.

Note on loading fonts: initially I tried to restrict the atlas to Unicode ranges for the alphabets I was interested in, to minimize glyph duplication and focus on the ones I knew were needed. This quickly became a problem – contextual substitutions use glyphs that have no Unicode representation at all. So I ended up loading the whole font file – all glyphs – into the font atlas.
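In practice, loading “all glyphs” means iterating glyph indices rather than codepoints – every face knows its glyph count, and substitution glyphs are reachable only by index. A hedged sketch; the method names follow mach-freetype and may differ in other bindings:

```zig
// Rasterize every glyph in the face, not just the ones that have a
// Unicode codepoint. Keyed by glyph index, since substitution glyphs
// have no codepoint at all.
var glyph_index: u32 = 0;
while (glyph_index < ft_face.numGlyphs()) : (glyph_index += 1) {
    try ft_face.loadGlyph(glyph_index, .{ .render = true });
    const bitmap = ft_face.glyph().bitmap();
    // ...copy bitmap rows into the atlas, keyed by glyph_index...
    _ = bitmap;
}
```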

This has one drawback – when gluing together multiple fonts, as with Noto Sans, there will be a lot of duplicate glyphs, since each (or nearly each) font covers the Latin alphabet. Some optimization could be done, but I am not sure what the best way is to compare glyphs that have no Unicode code points.

Shaping text

As I mentioned, basic text shaping is possible using only FreeType. Here is how it looks:

pub fn shape(allocator: Allocator, glyphs: GlyphMap, value: []const u8) ![]GlyphShape {
    var shapes = std.ArrayList(GlyphShape).init(allocator);
    var cursor_x: i32 = 0;
    // Iterating bytes, so this handles only single-byte (ASCII) characters.
    for (value) |c| {
        const glyph = glyphs.get(c) orelse continue;
        try shapes.append(GlyphShape{
            .x = cursor_x + glyph.bearing_x,
            .y = -glyph.bearing_y,
            .glyph = glyph,
        });
        cursor_x += glyph.advance_x;
    }
    return shapes.toOwnedSlice();
}

One common problem I noticed in most examples I found online is that people ignore the bitmap_top and bitmap_left offsets of FreeType bitmaps. Most of the time they are 0 or 1 px and may be invisible at larger font sizes, but for smaller text they produce very noticeable spacing issues. In the example above I store those offsets as bearing_x and bearing_y. I am not sure that is semantically correct, but I saw the name somewhere and picked it up.

This works nicely but has the limitations mentioned previously: it doesn't work for scripts that require ligatures or contextual substitutions, like Devanagari or Arabic.

Using HarfBuzz

HarfBuzz operates on a buffer. It takes a Unicode string and returns a list of glyphs. That list can have a different length than the input string – for example, a ligature merges two letters into a single glyph.

pub fn shape(allocator: Allocator, hb_font: hb.Font, glyphs: GlyphMap, value: []const u8) ![]GlyphShape {
    var shapes = std.ArrayList(GlyphShape).init(allocator);
    var buffer = hb.Buffer.init() orelse return error.OutOfMemory;
    defer buffer.deinit();

    buffer.guessSegmentProps();
    buffer.addUTF8(value, 0, null);

    hb_font.shape(buffer, null);

    const infos = buffer.getGlyphInfos();
    const positions = buffer.getGlyphPositions() orelse return error.OutOfMemory;

    var cursor_x: i32 = 0;
    for (positions, infos) |pos, info| {
        const glyph = glyphs.get(info.codepoint) orelse continue;
        try shapes.append(.{
            .x = cursor_x + glyph.bearing_x,
            .y = -glyph.bearing_y,
            .glyph = glyph,
        });
        cursor_x += pos.x_advance >> 6;
    }

    return shapes.toOwnedSlice();
}

Multiple fonts and scripts

One OTF font file is limited to 65,536 glyphs. That is a lot, but not enough to cover all possible Unicode characters, so supporting multiple scripts unavoidably means supporting multiple font files. It is important to note here that HarfBuzz's shape() function works per font face. To solve this, I keep a list of fonts and a function that maps character ranges to the HarfBuzz script best suited for rendering them. Another function takes the input string and returns a list of ranges that can be shaped with the same script.

I really like the Zig syntax for switch and number ranges.

fn codepointToScript(codepoint: u64) hb.Script {
    return switch (codepoint) {
        0x0020...0x007F => hb.Script.latin,
        0x00A0...0x00FF => hb.Script.latin,
        0x0100...0x017F => hb.Script.latin,
        0x0180...0x024F => hb.Script.latin,
        0x0900...0x097F => hb.Script.devanagari,
        0x0600...0x06FF => hb.Script.arabic,
        0x3041...0x3096 => hb.Script.hiragana,
        0x30A0...0x30FF => hb.Script.katakana,
        else => hb.Script.common,
    };
}

This is how the function that splits the string into ranges looks:

/// Split the input string into list of ranges with the same script.
fn getRanges(allocator: Allocator, value: []const u8) ![]Range {
    var ranges = std.ArrayList(Range).init(allocator);
    var utf8 = try std.unicode.Utf8View.init(value);
    var iterator = utf8.iterator();

    var current_range: ?Range = null;
    var byte_index: usize = 0;

    while (iterator.nextCodepointSlice()) |slice| {
        const codepoint = try std.unicode.utf8Decode(slice);
        const script = codepointToScript(codepoint);

        if (current_range) |*range| {
            if (range.script == script) {
                range.end = byte_index + slice.len - 1;
            } else {
                try ranges.append(range.*);
                current_range = Range{
                    .script = script,
                    .start = byte_index,
                    .end = byte_index + slice.len - 1,
                };
            }
        } else {
            current_range = Range{
                .script = script,
                .start = byte_index,
                .end = byte_index + slice.len - 1,
            };
        }
        byte_index += slice.len;
    }

    if (current_range) |range| {
        try ranges.append(range);
    }

    return ranges.toOwnedSlice();
}

And the improved shaping function:

// shape() is now a method (note the self parameter) so it can reach self.fonts.
pub fn shape(self: *Renderer, allocator: Allocator, value: []const u8) ![]GlyphShape {
    const ranges = try getRanges(allocator, value);
    defer allocator.free(ranges);

    var shapes = std.ArrayList(GlyphShape).init(allocator);
    var cursor_x: i32 = 0;

    for (ranges) |range| {
        var buffer = hb.Buffer.init() orelse return error.OutOfMemory;
        defer buffer.deinit();

        buffer.setDirection(scriptToDirection(range.script));
        buffer.setScript(range.script);

        buffer.addUTF8(value[range.start .. range.end + 1], 0, null);

        const fontId = scriptToFont(range.script) orelse {
            std.debug.print("No font for script {d}\n", .{@intFromEnum(range.script)});
            continue;
        };

        self.fonts[fontId].hb_font.shape(buffer, null);

        const infos = buffer.getGlyphInfos();
        const positions = buffer.getGlyphPositions() orelse return error.OutOfMemory;

        for (positions, infos) |pos, info| {
            // After shaping, info.codepoint is a glyph index,
            // not a Unicode codepoint.
            const glyph = self.fonts[fontId].glyphs.get(info.codepoint) orelse {
                std.debug.print("No glyph for {d}\n", .{info.codepoint});
                continue;
            };

            try shapes.append(GlyphShape{
                .x = cursor_x + (pos.x_offset >> 6) + glyph.bearing_x,
                .y = (pos.y_offset >> 6) - glyph.bearing_y,
                .glyph = glyph,
            });
            cursor_x += pos.x_advance >> 6;
        }
    }
    return shapes.toOwnedSlice();
}

Retina displays

Since the renderer calculates NDC coordinates manually, everything stays sharp as long as I read the framebuffer size from the GLFW window. The only missing piece is to also check the DPR (device pixel ratio) and multiply the font size and other sizes by it.
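A sketch of the DPR computation, assuming zglfw-style bindings where window and framebuffer sizes come back as integer pairs:

```zig
// Device pixel ratio: framebuffer pixels per logical window unit.
// On a 2x Retina display this comes out as 2.0.
const window_size = window.getSize();
const fb_size = window.getFramebufferSize();
const dpr = @as(f32, @floatFromInt(fb_size[0])) /
    @as(f32, @floatFromInt(window_size[0]));

// Everything expressed in logical pixels gets multiplied by dpr.
const scaled_font_size: u32 = @intFromFloat(@as(f32, font_size) * dpr);
```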

Emoji

I briefly mentioned how TTF files work in the introduction. From what we have discussed so far, it's easy to conclude that classic font files won't take us anywhere near rendering emoji, which have multiple colors and complex shapes.

I spent a while trying to get to the bottom of this and it's a never-ending mess. Basically, there are multiple color formats, and it seems like every major company ended up rolling their own. Apple has the sbix bitmap format (yes, emoji on Apple devices come from fixed-size bitmaps), Google has CBDT/CBLC, which used to be used in Noto Color Emoji, and Microsoft uses COLR/CPAL (aka COLRv0). There's also SVG-in-OTF (OpenType SVG) – a whole SVG document embedded in the font file – supported by some fonts. More recently there's COLRv1, an evolution of COLRv0 featuring a binary format that could be described as the minimal subset of SVG needed to render glyphs of any shape and color (including gradients). It is said to be the future of color fonts, but it requires a custom renderer for generating bitmaps and so far only Skia has one.

There's barely any information about these formats online and very few examples of how to render them using FreeType.

In the end I settled on an approach similar to Dear ImGui's, using OT-SVG. I chose PlutoSVG – a C library, which makes writing Zig bindings convenient. It also provides the SVG hooks for FreeType, making the setup as simple as possible. I also want to shout out the author, who very swiftly jumped in and fixed a bug in the library that broke radial gradients and thus the rendering of Noto Color Emoji.

const hooks = plutosvg.c.plutosvg_ft_svg_hooks() orelse return error.PlutoSVG;
try ft_lib.setProperty("ot-svg", "svg-hooks", hooks);

Since now there were colorful glyphs I had to adjust the font atlas generation:

switch (ft_bitmap.pixelMode()) {
    .gray => {
        // ...
    },
    .bgra => {
        // New!
    },
    else => unreachable,
}

And to keep it simple, I switched the texture from single-channel (alpha) to RGBA. This had the unfortunate side effect of making my text always and only white, but that is easy to fix by differentiating between color and regular glyphs.
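The conversion for regular glyphs is trivial – coverage becomes alpha, RGB becomes white (so that tinting by a text color can later be a simple multiply). A self-contained sketch; the helper name is my own:

```zig
const std = @import("std");

// Expand a single-channel FreeType coverage bitmap into RGBA so it
// can share the atlas with BGRA emoji bitmaps.
fn grayToRgba(allocator: std.mem.Allocator, gray: []const u8) ![]u8 {
    const rgba = try allocator.alloc(u8, gray.len * 4);
    for (gray, 0..) |coverage, i| {
        rgba[i * 4 + 0] = 255; // R
        rgba[i * 4 + 1] = 255; // G
        rgba[i * 4 + 2] = 255; // B
        rgba[i * 4 + 3] = coverage; // A
    }
    return rgba;
}
```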

Final result. Devanagari, Hiragana, Cyrillic, Arabic, Emoji.

Next steps

This journey is far from over and there's a lot more to do.

For now I am loading whole fonts into one texture, but this has already become very limiting. Especially on Retina displays, where the resolution can be twice as large, I started hitting hardware texture size limits.

There's the whole separate topic of text segmentation. While, as usual, it's relatively straightforward to implement for English and other European languages by breaking on spaces, many Asian languages don't use spaces or have more sophisticated rules. There's a whole family of libraries that can help with that, with its newest addition called ICU4X.

But segmentation makes shaping much harder. HarfBuzz maps a list of Unicode codepoints to a list of glyphs (which can be longer or shorter than the original), and while cluster values point each glyph back at the input text, knowing that glyph 4274 exceeds the available space on a line is not enough to tell where the previous valid break boundary was. There are many possible approaches here, most relying on heavy caching of the shaped text, and I decided to leave it for later.

Quick summary

I hope you enjoyed reading about my text rendering adventures. If you are curious about some less strictly implementation-related facts about text and fonts, check out my thread on Twitter/X.

If you are curious to see more of the code, check out the zig-text-rendering repository.
