Dec 26, 2018

Quick guide about the concepts of 3D rendering and the maths behind it. Why matrices are so powerful, adding perspective, cameras and lighting.

Many clever people who are far better teachers than I am have already covered the topic, so please don't try to learn WebGL from me. But if you ever wanted to get to know it better and expect something concise and dense instead of a full-blown tutorial – you've got what you asked for.

When you are trying to match the name WebGL with the stuff you usually see, a vision of some 3D rendering framework probably comes to your mind. Reality is much more interesting.

WebGL is a rasterization engine. It transforms things into pixels on the screen, in a viewport chosen by you (a `<canvas>` for example). What can it transform? Really simple shapes, limited to points, lines and triangles, and in practice almost everything is built from triangles. To make it even simpler, it is limited to a box of a fixed size: `[-1, 1] × [-1, 1] × [-1, 1]` (it's called *clip space* by the way). You can enable a feature called *depth testing* which will respect the fact that some triangles should be in front of the others (otherwise the order in which you specified vertices to the GPU will take precedence). For every vertex defining those triangles, it will execute a *vertex shader* specified by you, and for every pixel covered by those triangles it will execute a *fragment shader*.

WebGL is stateful. So once we set some uniform matrix to some value, it will stay like that. Once we enable depth testing, it will remain enabled until we change our mind. It's a very important concept to grasp and, in my opinion, missing it is what makes many people struggle with coming up with their own ways to do things with WebGL. Once you get it, your adventure with WebGL will be much easier.

WebGL is OpenGL for the web. I won't go into details since nobody cares (or at least nobody should). They are mostly the same and the similarity is there for a reason (not another Java and JavaScript case here).

I wanted to keep it shorter, but it wouldn't make much sense without some cool examples. So here we go, let's render a huge pink triangle. A fabulous one.

In order to achieve that, we will have to:

- provide shaders telling the GPU how to render vertices and fill pixels
- prepare a canvas which will serve as the viewport
- send vertices to the GPU so it can render them

You can copy and paste all the fragments below into the editor of your choice, save it as a `*.js` file, reference it from a script tag in a small plain `*.html` file and run it in the browser. Just be warned that I am making heavy use of ES6 features, so in case you want to publish it, consider using Babel to transpile the code.

I will start with the shaders. Writing them is an art itself and I am not ambitious enough to cover it here and now. Here are two basic ones, all they do is put points exactly where they said they were (so it assumes that the coordinates are provided in the *clip space*) and render the resulting triangles in pink.

Once again, vertex shader will run for each vertex defined. How to define vertices? We will get to that later.

All the fragment shader does is set the color to pink, i.e. `(1, 0, 1, 1)` or `rgba(255, 0, 255, 1)`, for every pixel in the viewport covered by the triangle whose vertices had their coordinates written to `gl_Position`.

```javascript
const vertex = `
  attribute vec2 a_position;

  void main() {
    gl_Position = vec4(a_position, 0, 1);
  }
`;

const fragment = `
  precision mediump float;

  void main() {
    gl_FragColor = vec4(1, 0, 1, 1);
  }
`;
```

A very simple triangle, just the one we need. It is like a flattened sequence of points: `(x_0, y_0, x_1, y_1, x_2, y_2)`.

```javascript
const vertices = new Float32Array([-1.0, -1.0, 1.0, -1.0, 0.0, 1.0]);
```

In order to set up the `<canvas>` element I use:

```javascript
const setUpCanvas = () => {
  const canvas = document.createElement("canvas");
  canvas.setAttribute("style", `width: 100vw; height: 100vh`);
  document.body.appendChild(canvas);
  return canvas;
};
```

Each of the two shaders must be compiled and then they are combined into a 'program', which is how we will reference them. It is the most tedious and boring part of the setup and the one that you will least likely want to change. So just have a quick glance and copy-paste it to your code.

```javascript
const createShader = (gl, type, source) => {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  const success = gl.getShaderParameter(shader, gl.COMPILE_STATUS);
  if (success) return shader;

  console.error(gl.getShaderInfoLog(shader));
  gl.deleteShader(shader);
};

const createProgram = (gl, vertexShader, fragmentShader) => {
  const program = gl.createProgram();
  gl.attachShader(program, vertexShader);
  gl.attachShader(program, fragmentShader);
  gl.linkProgram(program);
  const success = gl.getProgramParameter(program, gl.LINK_STATUS);
  if (success) return program;

  console.error(gl.getProgramInfoLog(program));
  gl.deleteProgram(program);
};
```

I have yet another cool helper which gets a location and creates a buffer for the data. What it means in human language is that I am asking WebGL for a pointer by which I will later refer to the data in the shader (`a_position`, as we have seen above) and for some place on the GPU for my triangles – the `positionBuffer`. The data is transferred to the GPU during the `gl.bufferData` call, where I also hint to the GPU that my data will be static and I won't change it at runtime.

And one important fact: there are two types of data stored on the GPU, the *attributes* and the *uniforms*. The GPU splits the attributes between the vertex shader executions, making them a perfect fit for the vertices and any accompanying data like normals. The uniforms are different – they stay the same for the whole draw call, making them perfect for a common matrix or a color for our objects.

```javascript
const setup = (gl, program, vertices) => {
  // Clearing with color rgba(0, 0, 0, 0) makes the background transparent. Cool.
  gl.clearColor(0, 0, 0, 0);

  const positionLocation = gl.getAttribLocation(program, "a_position");
  const positionBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW);

  return {
    positionLocation,
    positionBuffer,
  };
};
```

Each time somebody changes the size of the page, we will want to adjust the size of the `<canvas>` and the viewport. This little helper will serve us.

```javascript
const resize = (gl) => {
  const displayWidth = Math.floor(
    gl.canvas.clientWidth * window.devicePixelRatio
  );
  const displayHeight = Math.floor(
    gl.canvas.clientHeight * window.devicePixelRatio
  );

  if (gl.canvas.width !== displayWidth || gl.canvas.height !== displayHeight) {
    gl.canvas.width = displayWidth;
    gl.canvas.height = displayHeight;
  }
};
```

When everything is in place, the last function is `draw`, which will do the actual rendering to the screen.

With the `gl.enableVertexAttribArray`, `gl.bindBuffer` and `gl.vertexAttribPointer` calls, we are enabling the array of vertex attributes. It means that more or less we are now telling the GPU that the vertices we've stored in the `positionBuffer` should be used as the vertices of our shape. What is important here is that we define the size of each vertex as `2`. It means that each vertex takes two coordinates (`x` and `y`), which makes perfect sense since we've declared `a_position` to be `attribute vec2 a_position;`.

And how does WebGL know how many vertices to draw, how many vertex shaders to call? We are providing the exact count in the third parameter of `gl.drawArrays`.

```javascript
const draw = (gl, program, positionBuffer, positionLocation) => {
  resize(gl);
  gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  gl.useProgram(program);

  gl.enableVertexAttribArray(positionLocation);
  gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
  gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);

  gl.drawArrays(gl.TRIANGLES, 0, vertices.length / 2);
};
```

As I've stated before, WebGL is stateful. It remains in whatever state we've left it. And there is nothing special about this particular function making it 'the rendering one'. It's all about doing the correct setup and setting states. The rendering itself happens in `gl.drawArrays`.

For example, you could move `gl.useProgram` to the `setup` since we are using just one program. Enabling the array of vertex attributes could also be moved there along with binding the buffer.

So if you wanted, the whole `draw` could be reduced to this:

```javascript
const draw = (gl) => {
  resize(gl);
  gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  gl.drawArrays(gl.TRIANGLES, 0, vertices.length / 2);
};
```

The main code for using the bunch of functions that we've just defined goes like this:

```javascript
const canvas = setUpCanvas();
const gl = canvas.getContext("webgl");

const vertexShader = createShader(gl, gl.VERTEX_SHADER, vertex);
const fragmentShader = createShader(gl, gl.FRAGMENT_SHADER, fragment);
const program = createProgram(gl, vertexShader, fragmentShader);

const { positionLocation, positionBuffer } = setup(gl, program, vertices);

// This little trick uses closure to allow our render to be called from
// window resize event.
const render = () => draw(gl, program, positionBuffer, positionLocation);

render();
window.addEventListener("resize", render);
```

And here is the result, generated from the exact source code as above:

Complete source: github.com↗.

The amount of code that was required to run it was quite extensive, but from this point, it scales alright (Well... almost. We have yet to add matrix calculations).

**tl;dr**: doing calculations on matrices is the most powerful way since you can combine several effects by multiplying matrices, having one resulting matrix to pass to your vertex shader that will do *magic* with your point (and besides that, there are not many viable alternatives so it's what everyone uses).

One thing you have to note if you want to know what exactly happens in those matrices is that in WebGL, **all vectors and matrices are assumed to be column-major**, so it implies that we are doing calculations in so-called *post-multiplication*.

For example, translation by a given vector `[tx, ty]` in mathematical notation looks like this (and the *post-multiplication* is a matter of the order, namely `t' = Mt`):

$\begin{bmatrix}
1 & 0 & tx \\
0 & 1 & ty \\
0 & 0 & 1 \\
\end{bmatrix}
\times
\begin{bmatrix}
x \\
y \\
1 \\
\end{bmatrix}
=
\begin{bmatrix}
x + tx \\
y + ty \\
1 \\
\end{bmatrix}$

While in JS it would be set up like this:

```javascript
// Note the column-major notation - the one below is the one from
// the calculations, but transposed!
const matrix = [1, 0, 0, 0, 1, 0, tx, ty, 1];
// Pass ^ to the vertex shader as a uniform

const position = [x, y];
// Pass this one as an attribute
```
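To convince yourself that the transposed layout really does the same thing as the maths, here is a minimal sketch that multiplies such a flat array by a point on the CPU. `applyMat3` and `translation` are hypothetical helpers of mine, not part of WebGL.

```javascript
// Multiplies a column-major 3x3 matrix by the point (x, y, 1)
// and returns the resulting (x, y).
const applyMat3 = (m, [x, y]) => [
  m[0] * x + m[3] * y + m[6],
  m[1] * x + m[4] * y + m[7],
];

// Column-major translation matrix: tx and ty live in the last column.
const translation = (tx, ty) => [1, 0, 0, 0, 1, 0, tx, ty, 1];

const moved = applyMat3(translation(10, 20), [1, 2]);
console.log(moved); // [11, 22]
```

The indices `6` and `7` are where `tx` and `ty` sit, which is exactly the last column of the matrix from the calculations.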

And in the vertex shader used like that:

```glsl
attribute vec2 a_position;
uniform mat3 u_matrix;

void main() {
  // We have to multiply matrix by a vector of the same dimension,
  // hence the ugly conversion.
  gl_Position = vec4((u_matrix * vec3(a_position, 1.0)).xy, 0.0, 1.0);
}
```
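The one missing piece is getting the matrix into `u_matrix`. A minimal sketch, assuming the `gl` context and `program` from the triangle example; `setMatrixUniform` is a hypothetical helper name, while `gl.getUniformLocation` and `gl.uniformMatrix3fv` are the regular WebGL calls.

```javascript
// Uploads a column-major 3x3 matrix to the `u_matrix` uniform.
const setMatrixUniform = (gl, program, matrix) => {
  // Uniforms are set on the currently active program (WebGL is stateful).
  gl.useProgram(program);
  const matrixLocation = gl.getUniformLocation(program, "u_matrix");
  // The second argument (transpose) has to be false in WebGL 1,
  // which is why the array is already stored column by column.
  gl.uniformMatrix3fv(matrixLocation, false, new Float32Array(matrix));
  return matrixLocation;
};
```

Since the uniform stays set until it is changed, this could live in `setup` for a static scene or be called before `gl.drawArrays` when the matrix changes every frame.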

The same goes for rotating and scaling in 2D. You can try them on your own.

$R =
\begin{bmatrix}
c & s & 0 \\
-s & c & 0 \\
0 & 0 & 1 \\
\end{bmatrix}$

$S =
\begin{bmatrix}
s_x & 0 & 0 \\
0 & s_y & 0 \\
0 & 0 & 1 \\
\end{bmatrix}$
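If you would rather see it than try it, here is a minimal sketch of those matrices as column-major arrays, plus the multiplication that combines them. All helper names are hypothetical, and `rotation` follows `R` exactly as it is written above, so whether that turns things clockwise or counter-clockwise depends on which way your Y axis points.

```javascript
// Column-major 3x3 matrices stored as flat arrays of 9 numbers.
const translation = (tx, ty) => [1, 0, 0, 0, 1, 0, tx, ty, 1];
const scaling = (sx, sy) => [sx, 0, 0, 0, sy, 0, 0, 0, 1];
const rotation = (angle) => {
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  // Columns of R: (c, -s, 0), (s, c, 0), (0, 0, 1).
  return [c, -s, 0, s, c, 0, 0, 0, 1];
};

// Returns a * b, so b is the transformation applied to a point first.
const multiply = (a, b) => {
  const out = new Array(9).fill(0);
  for (let col = 0; col < 3; col++) {
    for (let row = 0; row < 3; row++) {
      for (let k = 0; k < 3; k++) {
        out[col * 3 + row] += a[k * 3 + row] * b[col * 3 + k];
      }
    }
  }
  return out;
};

// Multiplies a matrix by the point (x, y, 1).
const apply = (m, [x, y]) => [
  m[0] * x + m[3] * y + m[6],
  m[1] * x + m[4] * y + m[7],
];

// Scale by 2 first, then move by (10, 0).
const combined = multiply(translation(10, 0), scaling(2, 2));
console.log(apply(combined, [1, 1])); // [12, 2]
```

The point is that `combined` is a single matrix, so the shader still does just one multiplication per vertex no matter how many effects went into it.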

Another interesting matrix that comes up in this area is called *projection*. It transforms pixel coordinates in the ranges `[0, w]` and `[0, h]` (respectively for the screen's *width* and *height*) into the *clip space*, which is the box you've learned about before.

$P =
\begin{bmatrix}
\frac{2}{w} & 0 & -1 \\
0 & -\frac{2}{h} & 1 \\
0 & 0 & 1 \\
\end{bmatrix}$

Given $w = 640$ and $h = 480$:

$\begin{bmatrix}
\frac{2}{w} & 0 & -1 \\
0 & -\frac{2}{h} & 1 \\
0 & 0 & 1 \\
\end{bmatrix}
\times
\begin{bmatrix}
200 \\
300 \\
1 \\
\end{bmatrix}
=
\begin{bmatrix}
-0.375 \\
-0.25 \\
1 \\
\end{bmatrix}$

Which gives a position in the bottom left section of the screen, a little to the left of and below the center (the Y-axis is flipped, which is taken into consideration by the `P` matrix). I've put together an example with a triangle `[0.0, 0.0, 0.0, 300.0, 200.0, 300.0]` and it looks like a proof that our math was correct. Perfect!
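The same calculation can be checked in a few lines of JS. This is only a sketch with hypothetical helper names (`projection`, `toClipSpace`), using the column-major layout described earlier.

```javascript
// Column-major 3x3 matrix taking pixels in [0, w] x [0, h] to clip space.
const projection = (w, h) => [2 / w, 0, 0, 0, -2 / h, 0, -1, 1, 1];

// Multiplies a column-major 3x3 matrix by the point (x, y, 1).
const toClipSpace = (m, [x, y]) => [
  m[0] * x + m[3] * y + m[6],
  m[1] * x + m[4] * y + m[7],
];

const p = projection(640, 480);
console.log(toClipSpace(p, [200, 300])); // approximately [-0.375, -0.25]
console.log(toClipSpace(p, [0, 0]));     // top left pixel, [-1, 1]
```

The second call shows the flipped Y-axis: pixel `(0, 0)` is the top left corner of the screen, which is `(-1, 1)` in clip space.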

Source: github.com↗.

Going up by one dimension brings some changes to the code.

Before that, one thing: for various reasons, in computer graphics, we are operating almost exclusively on 4-element vectors and `4 × 4` matrices. It has its roots in the fact that going up by one dimension makes the calculations a little bit easier (note that we've also used `3 × 3` matrices for 2D calculations) and that the fourth component, `w`, is useful for showing whether a given vector is meant to be a point or a directional vector (with the former having `w = 1` and the latter `w = 0`).

The GPU also has some defaults in place for the vectors, so if we declare that we are taking four element vectors and in the code we are giving it only 3D points, the last one will default to one (the defaults go like: $(0, 0, 0, 1)$).

```glsl
attribute vec4 a_position;
uniform mat4 u_matrix;

void main() {
  gl_Position = u_matrix * a_position;
}
```

The operations from now on will look like:

$T =
\begin{bmatrix}
1 & 0 & 0 & tx \\
0 & 1 & 0 & ty \\
0 & 0 & 1 & tz \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$
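The translation matrix is also a nice way to see what `w` is for. Below is a minimal sketch (hypothetical helpers, column-major arrays of 16 numbers) that pushes a point and a direction through the same matrix.

```javascript
// Column-major 4x4 translation matrix.
const translation = (tx, ty, tz) => [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  tx, ty, tz, 1,
];

// Multiplies a column-major 4x4 matrix by a 4-element vector.
const transform = (m, v) =>
  [0, 1, 2, 3].map(
    (row) =>
      m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3]
  );

const t = translation(5, 0, 0);
console.log(transform(t, [1, 2, 3, 1])); // point (w = 1):     [6, 2, 3, 1]
console.log(transform(t, [1, 2, 3, 0])); // direction (w = 0): [1, 2, 3, 0]
```

The point gets moved, while the direction is left alone because its `w = 0` cancels the last column out.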

When it comes to rotations, the difference is that 2D needed only one rotation: around the Z axis. Now the other two are also interesting.

$xR =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & c & s & 0 \\
0 & -s & c & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$

$yR =
\begin{bmatrix}
c & 0 & -s & 0 \\
0 & 1 & 0 & 0 \\
s & 0 & c & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$

$zR =
\begin{bmatrix}
c & s & 0 & 0 \\
-s & c & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$

Scaling is basically the same idea.

$S =
\begin{bmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$

Upgraded `projection` (`d` is for depth).

$\begin{bmatrix}
\frac{2}{w} & 0 & 0 & -1 \\
0 & -\frac{2}{h} & 0 & 1 \\
0 & 0 & \frac{2}{d} & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}$

Here is a simple `1.0 × 1.0 × 1.0` cube rendered using the maths above. Using this simple projection results in a distorted shape (the aspect ratio is not taken into consideration) and no perspective, meaning that a point's distance from the camera does not change its size on the screen; its depth only decides what ends up in front.

Source: github.com↗.

One interesting thing that starts to make sense in 3D is a *perspective*. It's the notion of objects appearing smaller when they are farther away from us.

- `n` – `zNear`
- `f` – `zFar`
- `v` – `fieldOfView`, an angle in radians
- `a` – `aspect` (rendering context's width divided by height)
- `r` = `1 / (n - f)`

$\begin{bmatrix}
\frac{1}{a \tan(v / 2)} & 0 & 0 & 0 \\
0 & \frac{1}{\tan(v / 2)} & 0 & 0 \\
0 & 0 & (n + f)r & 2nfr \\
0 & 0 & -1 & 0 \\
\end{bmatrix}$

This matrix does several things at once: it converts units to clip space, lets us choose the field of view and lets us choose the Z-clipping range. It assumes there's a 'camera' at `(0, 0, 0)` looking down the negative Z axis and computes what it would take so that stuff at `zNear` ends up at `Z = -1` and stuff at `zNear` that is half of `fieldOfView` above or below the center ends up with `Y = 1` and `Y = -1` respectively. It computes what to use for `X` by dividing that scale by the `aspect` passed in. Finally, it figures out how much to scale things in `Z` so that stuff at `zFar` ends up at `Z = 1`.
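Here is a minimal sketch of such a matrix in JS, so that the claims about `zNear` and `zFar` can be checked. It assumes the usual form in which the scale is `1 / tan(fieldOfView / 2)`, and `perspective`, `transform` and `project` are hypothetical helper names. The division by `w` in `project` is something the GPU does on its own after the vertex shader.

```javascript
// Column-major 4x4 perspective matrix, fieldOfView in radians.
const perspective = (fieldOfView, aspect, zNear, zFar) => {
  const t = 1 / Math.tan(fieldOfView / 2);
  const r = 1 / (zNear - zFar);
  return [
    t / aspect, 0, 0, 0,
    0, t, 0, 0,
    0, 0, (zNear + zFar) * r, -1,
    0, 0, 2 * zNear * zFar * r, 0,
  ];
};

// Multiplies a column-major 4x4 matrix by a 4-element vector.
const transform = (m, v) =>
  [0, 1, 2, 3].map(
    (row) =>
      m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3]
  );

// Applies the matrix to a point and divides by w (perspective division).
const project = (m, [x, y, z]) => {
  const [cx, cy, cz, cw] = transform(m, [x, y, z, 1]);
  return [cx / cw, cy / cw, cz / cw];
};

const p = perspective(Math.PI / 2, 16 / 9, 1, 100);
console.log(project(p, [0, 0, -1]));   // Z is approximately -1 (zNear)
console.log(project(p, [0, 0, -100])); // Z is approximately 1 (zFar)
```

Note that the camera looks down the negative Z axis, so points in front of it have negative `z`.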

A camera in graphics programming is an abstract concept of allowing us to look at a certain place in our artificial 3D world.

You can achieve a 'camera' by effectively moving the world around the `(0, 0, 0)` point. The perfect math tool for that is an *inverse* matrix. All you have to do is rotate and move the camera anywhere you want and invert the resulting matrix. It will rotate and move everything else by the opposite amount, which effectively makes it so the camera stays at `(0, 0, 0)` and everything else is moved from there.
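A general `4 × 4` inverse is a bit long to show here, so below is a minimal sketch that works under one assumption: the camera matrix consists only of rotation and translation (no scaling). In that case the inverse rotation is just the transpose. `invertRigid` and `transform` are hypothetical helper names.

```javascript
// Inverts a column-major 4x4 matrix built only from rotation and translation.
// The inverse rotation is the transpose R^T, the inverse translation is -(R^T * t).
const invertRigid = (m) => {
  const [tx, ty, tz] = [m[12], m[13], m[14]];
  return [
    m[0], m[4], m[8], 0,
    m[1], m[5], m[9], 0,
    m[2], m[6], m[10], 0,
    -(m[0] * tx + m[1] * ty + m[2] * tz),
    -(m[4] * tx + m[5] * ty + m[6] * tz),
    -(m[8] * tx + m[9] * ty + m[10] * tz),
    1,
  ];
};

// Multiplies a column-major 4x4 matrix by a 4-element vector.
const transform = (m, v) =>
  [0, 1, 2, 3].map(
    (row) =>
      m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3]
  );

// A camera rotated by 90 degrees around the Y axis and moved to (0, 0, 5).
const c = Math.cos(Math.PI / 2);
const s = Math.sin(Math.PI / 2);
const camera = [c, 0, s, 0, 0, 1, 0, 0, -s, 0, c, 0, 0, 0, 5, 1];

const view = invertRigid(camera);
console.log(transform(view, [0, 0, 5, 1])); // approximately [0, 0, 0, 1]
```

The camera's own position lands at the origin, which is what we wanted: the world has been moved so that the camera sits at `(0, 0, 0)`. With scaling in the camera matrix this shortcut no longer holds and a full matrix inverse is needed.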

The matrix that is usually passed to the shaders is called **MVP**, which stands for model, view and projection. The first one refers to the object we are rendering, the second one is mostly about the camera and the projection is the distortion making the whole thing look 3D.

Keeping in mind that we are using post-multiplication, the whole thing can look like this (where `C` is for camera, `O` for the object, and `R`, `T`, `S` for rotation, translation and scaling respectively):

$M = P \times (C_R \times C_T)^{-1} \times (O_S \times O_R \times O_T)$

Now it's just a matter of passing it to the GPU; multiplying the vertices of our triangles by it will make them look exactly as we want. Magic.
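One thing that is easy to get wrong when building such a product is the order. Here is a minimal sketch (hypothetical helpers, column-major arrays) showing that with post-multiplication the matrix on the right is the one applied to a vertex first.

```javascript
// Column-major 4x4 matrices stored as flat arrays of 16 numbers.
const translation = (tx, ty, tz) => [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1];
const scaling = (sx, sy, sz) => [sx, 0, 0, 0, 0, sy, 0, 0, 0, 0, sz, 0, 0, 0, 0, 1];

// Returns a * b, so b is the transformation applied to a vertex first.
const multiply = (a, b) => {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
};

// Multiplies a column-major 4x4 matrix by a 4-element vector.
const transform = (m, v) =>
  [0, 1, 2, 3].map(
    (row) =>
      m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3]
  );

const point = [1, 0, 0, 1];
const scaleThenMove = multiply(translation(1, 0, 0), scaling(2, 2, 2));
const moveThenScale = multiply(scaling(2, 2, 2), translation(1, 0, 0));
console.log(transform(scaleThenMove, point)); // [3, 0, 0, 1]
console.log(transform(moveThenScale, point)); // [4, 0, 0, 1]
```

Same two matrices, different results, so it is worth deciding on the order once and sticking to it.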

Lighting is the last of the crucial topics left to analyze and by far the hardest one. As it turns out, the camera in the form described above pretty much solves the problem for all use cases. With lighting, it's not that easy. It influences the way things look in our 3D simulation so much that in each project you will want to use a different approach.

And what is lighting by the way? It's a matter of calculating the color of our pixels, based on some rules that we impose on ourselves. We can decide that we will have a directional light, with all rays shining in parallel. It can also be a point light, shining uniformly in all directions from a chosen point in space.

For this walkthrough, I will cut the topic short and quickly go over the directional one.

Directional lighting assumes the light is coming uniformly from one direction. The sun on a clear day is often considered a directional light. It's so far away that its rays can be considered to be hitting the surface of an object all in parallel.

Computing directional lighting is quite simple. Knowing the direction the light is travelling from, and the direction that the surface of the object is facing, we can take the dot product of those two and, as long as both are unit vectors, we will get the cosine of the angle between them.

It means that the dot product will give us $1$ if the light is pointing directly at the surface and $-1$ if it is pointing in the opposite direction.

We can give the object some color (make it its material) and just multiply it by that dot product.

Here is the basic idea:

```glsl
precision mediump float;

varying vec3 v_normal;

uniform vec3 u_reverseLightDirection;
uniform vec4 u_color;

void main() {
  vec3 normal = normalize(v_normal);
  float light = dot(normal, u_reverseLightDirection);

  gl_FragColor = u_color;
  gl_FragColor.rgb *= light;
}
```
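To see what numbers come out of it, here is a minimal sketch of the same calculation done on the CPU. `dot`, `normalize` and `shade` are hypothetical helpers of mine; unlike the shader, this version normalizes the light direction too, so it doesn't matter what length you pass in.

```javascript
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const normalize = (v) => {
  const length = Math.hypot(v[0], v[1], v[2]);
  return v.map((component) => component / length);
};

// Mirrors the fragment shader: scales RGB by the dot product, keeps alpha.
const shade = (color, normal, reverseLightDirection) => {
  const light = dot(normalize(normal), normalize(reverseLightDirection));
  const [r, g, b, a] = color;
  return [r * light, g * light, b * light, a];
};

const pink = [1, 0, 1, 1];
console.log(shade(pink, [0, 0, 1], [0, 0, 1])); // facing the light: [1, 0, 1, 1]
console.log(shade(pink, [0, 0, 1], [0, 1, 1])); // at 45 degrees: about [0.707, 0, 0.707, 1]
```

For surfaces facing away from the light the factor goes negative. Clamping it with something like `Math.max(light, 0)` is one option I would consider, but it is not what the shader above does.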

To put it all together, another example, this time with camera and lighting in place. I took the shape of Software Mansion's logo (it changed in late 2019), since I work there (I don't anymore) and it looks cool. The code includes the shape in an `*.obj` file, a tiny custom loader of that format, matrix operations for the camera mentioned above and lighting calculations in the shaders. Looks lovely.

Source: github.com↗

Thank you for staying with me this long! I hope that you learned something useful. If anything was not clear or should be explained in more detail – please let me know.

WebGL fundamentals↗ – great website teaching WebGL with all the underlying concepts. Covers everything you need to know to get going with WebGL.

opengl-tutorial↗ – another great resource for learning, this time for OpenGL (but as I mentioned before, conceptually it is basically the same thing).

scratchapixel on row major vs column major vectors↗ – a ton of information about maths related to the computations in programming graphics. Highly recommended!

Written by Tomasz Czajęcki.
