WebGPU: OBJ Loading and Perspective Camera

May 4, 2025

I have a small blog post sitting in my drafts that I thought I would finally finish. I haven't seen many practical examples of how to get started with WebGPU, and I believe it's a great API with a great future ahead.

In this post I will show how to:

  • Load a 3D model from an *.obj file.
  • Upload the model's data to the GPU and render it with basic Blinn-Phong shading.
  • Add a perspective, orbiting camera that the user can control with the mouse (drag to rotate, scroll to zoom).

If you haven't used WebGPU before but have some beginner to intermediate experience with GPU APIs, this blog post might be for you.

Let's get started!

OBJ parsing

OBJ is a simple text format for storing 3D models, and it is particularly easy to parse.

v 0.723607 -0.447220 0.525725
# ...
vn 0.6363 -0.4776 0.6058
# ...
vt 0.159092 0.118096
# ...
f 1/1/1 20/38/2 19/36/3
  • v is a vertex: 3 numeric values defining a 3D vector.
  • vn is a normal vector.
  • vt is a UV coordinate, hence only 2 values.
  • f is a face (a triangle). It consists of 3 points separated by spaces, and each point is a combination of 3 indices separated by /. The first index is the vertex index, the second is the UV index, and the third is the normal index. The indices are 1-based.

This is what a simple *.obj parser looks like:

import { Vec2 } from "./math/Vec2";
import { Vec3 } from "./math/Vec3";

export type FaceVertex = {
  vertexIndex: number;
  uvIndex?: number;
  normalIndex?: number;
};

export type Face = {
  vertices: FaceVertex[];
};

export type ObjData = {
  vertices: Vec3[];
  uvs: Vec2[];
  normals: Vec3[];
  faces: Face[];
};

export function parseObjFile(objFileContents: string): ObjData {
  const vertices: Vec3[] = [];
  const uvs: Vec2[] = [];
  const normals: Vec3[] = [];
  const faces: Face[] = [];

  const lines = objFileContents.split("\n");
  for (const line of lines) {
    const t = line.trim().split(/\s+/);
    if (t[0] === "v") {
      vertices.push(
        new Vec3(parseFloat(t[1]), parseFloat(t[2]), parseFloat(t[3])),
      );
    } else if (t[0] === "vt") {
      uvs.push(new Vec2(parseFloat(t[1]), parseFloat(t[2])));
    } else if (t[0] === "vn") {
      normals.push(
        new Vec3(parseFloat(t[1]), parseFloat(t[2]), parseFloat(t[3])),
      );
    } else if (t[0] === "f") {
      const face: FaceVertex[] = [];
      // We are parsing the rest of the line that started with `f`.
      for (let i = 1; i < t.length; i++) {
        const v = t[i].split("/");
        const vertexIndex = parseInt(v[0]) - 1;
        const uvIndex = v[1] ? parseInt(v[1]) - 1 : undefined;
        const normalIndex = v[2] ? parseInt(v[2]) - 1 : undefined;
        face.push({ vertexIndex, uvIndex, normalIndex });
      }
      faces.push({ vertices: face });
    }
  }

  return { vertices, uvs, normals, faces };
}
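
The parsed data is indexed per attribute, while the GPU will want one flat list of vertices. A minimal way to bridge the two is to duplicate the attributes for every face corner into an interleaved array (position, normal, UV). This is only a sketch that could live in the same module as the parser; the .x/.y/.z accessors on Vec3 and Vec2 follow the pattern used later in the post and are an assumption here.

export function objToVertexData(obj: ObjData): Float32Array {
  const data: number[] = [];
  for (const face of obj.faces) {
    for (const { vertexIndex, normalIndex, uvIndex } of face.vertices) {
      const v = obj.vertices[vertexIndex];
      // Fall back to zero vectors when the face omits normals or UVs.
      const n = normalIndex !== undefined ? obj.normals[normalIndex] : new Vec3(0, 0, 0);
      const uv = uvIndex !== undefined ? obj.uvs[uvIndex] : new Vec2(0, 0);
      data.push(v.x, v.y, v.z, n.x, n.y, n.z, uv.x, uv.y);
    }
  }
  return new Float32Array(data);
}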

Uploading data to GPU

If you are coming to WebGPU with an OpenGL background, there will be some similarities. We have to create a shader module. We have to create buffers. But the rest is quite different.

The biggest unit of state is the pipeline. A pipeline is basically a combination of shaders, depth buffer read/write settings, color blending, and a couple of other things.

The shader defines compatible bind group layouts. During rendering we select which bind groups should be used. Bind groups contain resources such as buffers (for example, for storing matrices) or textures.

You can see the code in index.ts in the editor:
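
The full listing is in there, but the core of it is roughly the following. This is a trimmed sketch: device, vertexShaderCode, fragmentShaderCode, canvasFormat, and the interleaved vertexData from the previous section are assumed to already exist, and I'm assuming the vertex and fragment shaders live in separate modules, each with a main entry point. The attribute layout matches the shader's @location indices.

const vertexModule = device.createShaderModule({ code: vertexShaderCode });
const fragmentModule = device.createShaderModule({ code: fragmentShaderCode });

const vertexBuffer = device.createBuffer({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(vertexBuffer, 0, vertexData);

const pipeline = device.createRenderPipeline({
  layout: "auto", // bind group layouts are inferred from the shaders
  vertex: {
    module: vertexModule,
    entryPoint: "main",
    buffers: [
      {
        arrayStride: 8 * 4, // position (3) + normal (3) + uv (2) floats
        attributes: [
          { shaderLocation: 0, offset: 0, format: "float32x3" },
          { shaderLocation: 1, offset: 3 * 4, format: "float32x3" },
          { shaderLocation: 2, offset: 6 * 4, format: "float32x2" },
        ],
      },
    ],
  },
  fragment: {
    module: fragmentModule,
    entryPoint: "main",
    targets: [{ format: canvasFormat }],
  },
  primitive: { topology: "triangle-list" },
  depthStencil: {
    format: "depth24plus",
    depthWriteEnabled: true,
    depthCompare: "less",
  },
});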

This is everything we needed to get some triangles to show up on the screen!

NOTE

Because the NDC Z axis in WebGPU covers the range [0, 1], we are actually looking at the back half of the model.

Adding perspective – 3D camera

Next we will add a 3D camera. We will need to store a view-projection matrix in a uniform buffer, so we need a buffer and our first bind group. The bind group layout can either be declared explicitly or created automatically; we will do the latter.

In index.ts:

const uniformBuffer = device.createBuffer({
  size: 4 * 16,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// ...

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [{ binding: 0, resource: { buffer: uniformBuffer } }],
});

Inside render():

passEncoder.setBindGroup(0, bindGroup);
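
The matrix itself still needs to reach the GPU every frame. Here is a rough sketch of that update, assuming the math module provides Mat4.perspective and Mat4.identity helpers (those names are my guess; Mat4.lookAt appears in Camera.ts below), that canvas is the <canvas> we render into, and following the same multiplication order as the writeBuffer call later in the post. The concrete values are placeholders.

const projection = Mat4.perspective(
  Math.PI / 3, // vertical field of view in radians
  canvas.width / canvas.height,
  0.1, // near plane
  100, // far plane
);
const view = Mat4.lookAt(new Vec3(0, 0, 10), new Vec3(0, 0, 0), new Vec3(0, 1, 0));
const model = Mat4.identity();

const mvp = model.multiply(view).multiply(projection);
queue.writeBuffer(uniformBuffer, 0, new Float32Array(mvp.data));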

Updated shader:

struct VSOut {
  @builtin(position) Position: vec4f,
  @location(0) normal: vec3f,
  @location(1) uv: vec2f,
};

struct Uniforms {
  mvp: mat4x4f,
}

@group(0) @binding(0) var<uniform> uni: Uniforms;

@vertex
fn main(
  @location(0) inPosition: vec3f,
  @location(1) inNormal: vec3f,
  @location(2) inUV: vec2f,
) -> VSOut {
    var vsOut: VSOut;
    vsOut.Position = uni.mvp * vec4f(inPosition, 1);
    vsOut.normal = inNormal;
    vsOut.uv = inUV;
    return vsOut;
}

Now we are looking at the icosahedron from a distance and with proper perspective!

Interactive camera

The next step is to make the camera interactive. I will create a new Camera.ts module for that. The target (what we want to look at), the distance from it, and the pitch and yaw angles will be the source of truth. Based on those I will calculate the camera position.

import { clamp } from "./utils";
import { Mat4 } from "./math/Mat4";
import { Vec3 } from "./math/Vec3";

export class Camera {
  target = new Vec3(0, 0, 0);
  distance = 10;

  scrollDirection = 0;

  wheelTimeout: number | null = null;

  private lastX: number = 0;
  private lastY: number = 0;
  private isDragging: boolean = false;

  constructor(
    public pitch: number,
    public yaw: number,
  ) {
    document.addEventListener("mousedown", this.handleMouseDown);
    document.addEventListener("mousemove", this.handleMouseMove);
    document.addEventListener("mouseup", this.handleMouseUp);
    document.addEventListener("wheel", this.handleMouseWheel);
  }

  handleMouseDown = (event: MouseEvent) => {
    this.isDragging = true;
    this.lastX = event.clientX;
    this.lastY = event.clientY;
  };

  handleMouseMove = (event: MouseEvent) => {
    if (!this.isDragging) return;

    const dx = event.clientX - this.lastX;
    const dy = event.clientY - this.lastY;

    this.lastX = event.clientX;
    this.lastY = event.clientY;

    this.pitch -= dy * 0.01;
    this.yaw += dx * 0.01;
  };

  handleMouseUp = (event: MouseEvent) => {
    this.isDragging = false;
  };

  handleMouseWheel = (event: WheelEvent) => {
    const scaleFactor = 0.04;
    this.distance += event.deltaY * scaleFactor;
    this.distance = clamp(this.distance, 2, 50);
  };

  getPosition(): Vec3 {
    return new Vec3(
      Math.cos(this.pitch) * Math.cos(this.yaw),
      Math.sin(this.pitch),
      Math.cos(this.pitch) * Math.sin(this.yaw),
    )
      .scale(this.distance)
      .add(this.target);
  }

  getView(): Mat4 {
    const position = this.getPosition();
    return Mat4.lookAt(position, this.target, new Vec3(0, 1, 0));
  }
}
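
Hooking the camera up in index.ts is a small change: the view matrix now comes from the camera on every frame instead of a hard-coded eye position. A sketch, reusing the model and projection matrices, uniformBuffer, and render() from earlier; the initial pitch and yaw are arbitrary:

const camera = new Camera(0.3, 0.8);

function frame() {
  // Derive the view matrix from the current pitch, yaw, and distance,
  // recompute the view-projection matrix, and upload it before rendering.
  const view = camera.getView();
  const mvp = model.multiply(view).multiply(projection);
  queue.writeBuffer(uniformBuffer, 0, new Float32Array(mvp.data));

  render();
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);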

Shading

The final step is to add a basic approximation of a real lighting model. As I mentioned, classic Blinn-Phong will do a great job here. For that we will add a couple of new uniform values:

struct Uniforms {
  mvp: mat4x4f,
+  cameraPosition: vec3f,
+  lightPosition: vec3f,
+  lightColor: vec3f,
}

Now an important thing to know about how WebGPU lays out uniform buffers. WGSL has alignment rules for uniform structs (similar to GLSL's std140): a vec3f must start at an offset that is a multiple of 16 bytes, so each vec3f effectively occupies 16 bytes. We therefore have to both reserve space for the padding and follow each vec3f with a zero when writing the buffer.
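
For the Uniforms struct above, those rules work out to the following layout (offsets computed by hand, so treat them as a sanity check):

// mvp:            offset  0, size 64 (mat4x4f)
// cameraPosition: offset 64, size 12 + 4 bytes of padding (vec3f aligns to 16)
// lightPosition:  offset 80, size 12 + 4 bytes of padding
// lightColor:     offset 96, size 12 + 4 bytes of padding
// total:          112 bytes

That adds up to 112 bytes, which is where the updated buffer size comes from: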

const uniformBuffer = device.createBuffer({
-  size: 4 * 16,
+  size: 4 * 16 + 3 * 4 * 4, // mat4x4f + 3 * vec3f
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

Here's the padding with 0s in practice:

const mvp = model.multiply(view).multiply(projection);

-queue.writeBuffer(uniformBuffer, 0, new Float32Array(mvp.data));
+queue.writeBuffer(uniformBuffer, 0, new Float32Array([
+  ...mvp.data,
+  cameraPosition.x,
+  cameraPosition.y,
+  cameraPosition.z,
+  0,
+  lightPosition.x,
+  lightPosition.y,
+  lightPosition.z,
+  0,
+  lightColor.x,
+  lightColor.y,
+  lightColor.z,
+  0,
+]));
queue.submit([commandEncoder.finish()]);

And finally, the shading in the fragment shader:

let ambientStrength = 0.1;
let ambient = uni.lightColor * ambientStrength;

let norm = normalize(normal);
let lightDir = normalize(uni.lightPosition - fragmentPosition);
let diff = max(dot(norm, lightDir), 0.0);
let diffuse = uni.lightColor * diff;

let specularStrength = 0.5;
let viewDir = normalize(uni.cameraPosition - fragmentPosition);
// Blinn-Phong: the specular term uses the half vector between the light and view directions.
let halfDir = normalize(lightDir + viewDir);
let spec = pow(max(dot(norm, halfDir), 0.0), 32);
let specular = specularStrength * spec * uni.lightColor;

let objectColor = vec3f(1, 0.5, 0.31);
let result = (ambient + diffuse + specular) * objectColor;

return vec4f(result, 1);

Result

Drag to rotate, scroll to zoom. Keep in mind that it is the camera rotating around the object; due to the lack of a background, the optical illusion suggests otherwise.

Hope this article was useful! If you are curious how to implement a much more modern and realistic lighting model, see my PBR in WebGPU: Implementation Details blog post.
