Font Subsetting - shrink down font files to speed up page loads
Fonts are one of the largest resources on any page after images, and can have a big impact on CLS when they vary in size from the underlying system font. Font subsetting allows us to radically shrink font file sizes, speed up initial page loads, and improve our page speed scores.
Background
With modern web design, it's not uncommon to have 4-6 font files being loaded on each page - regular, bold and italic versions for each of a couple of different typefaces. While compressed font formats like woff2 are now widely supported, this can still add 350-400 KB of weight to an uncached page. This impacts page speed scores across the board - from Largest Contentful Paint (LCP) and First Contentful Paint (FCP) through to the overall page score. While techniques like self-hosting and cache headers can speed delivery up a bit, ideally we want to serve smaller files, without compromising the design vision of the site. Enter font subsetting!
What is Font Subsetting?
Internally, each binary font file is essentially a giant table. It contains an entry for each Unicode character code and, alongside it, the font’s representation (glyph) of that code. Where the font has no representation for a given character, the cell is empty.
Font files will typically support a wide variety of languages within the same file. If we're only going to be using some of these languages, then we have an opportunity to shrink the files, delivering a faster experience for our users. Within the fonts we are using, there are numerous cells taken up by glyphs which never appear on site - glyphs for Cyrillic and other non-Latin scripts. What we can do is effectively “purge” these binary files to leave us with only a subset of characters, focused on the Latin characters (letters and numbers from English, plus the accented characters used in Spanish, French, German, etc). Depending on the font files we're using, and the level of language support they have, stripping out these other characters can reduce the size of the font files by around 60% (~60 KB → ~23 KB on some typical Google web fonts).
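To make the table picture concrete, here is a minimal Python sketch of the idea. It does not touch real font files: the toy table, the glyph names, and the helpers `in_subset` and `subset_table` are all hypothetical, and the "Latin" range is assumed to be just U+0000-00FF, which is narrower than what real subsetting tools usually keep.

```python
# Assumption: a simplified "Latin" subset covering Basic Latin and
# Latin-1 Supplement only. Real tools usually keep a wider range.
LATIN_RANGES = [(0x0000, 0x00FF)]


def in_subset(char, ranges=LATIN_RANGES):
    """Return True if the character's code point falls inside any range."""
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in ranges)


def subset_table(glyph_table, ranges=LATIN_RANGES):
    """Keep only the entries of a {character: glyph} table that are in the subset."""
    return {char: glyph for char, glyph in glyph_table.items() if in_subset(char, ranges)}


# A toy stand-in for a font's character table (hypothetical glyph names).
toy_font = {
    "A": "glyph_A",
    "\u00e9": "glyph_eacute",  # é
    "\u00df": "glyph_eszett",  # ß
    "\u0416": "glyph_zhe",     # Ж (Cyrillic)
}

latin_only = subset_table(toy_font)
print(sorted(latin_only.values()))  # the Cyrillic entry is no longer present
```

A real subsetter does the same kind of filtering on the font's internal tables, and also drops the outline data for the removed glyphs, which is where the file size saving comes from.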
What happens if someone tries to use a character which was removed?
If the browser encounters a character which does not exist in the current font family, it will attempt to render it using the next available font family declared in our CSS. This can lead to an unpleasant mismatch of fonts within a single word (some letters with a different height, density, etc). For this reason, even if a site is only serving English-speaking audiences, we will still grab the whole Latin set, rather than a stricter subset based on UK & Ireland. This avoids words like café rendering é in a different font. Incidentally, this is how emojis are typically rendered on sites - the main font files do not generally have a rendering for emojis, so the browser falls all the way through to the first font which can render these characters (typically a system font, which is also why some emojis look different on iOS vs Android vs Windows machines).
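One way to spot this risk before shipping a subset is to scan real page copy for characters the subset will not cover. The sketch below is an illustration only: it assumes the subset keeps just U+0000-00FF, and `fallback_characters` is a hypothetical helper, not part of any tool mentioned here.

```python
# Assumption: the subsetted font keeps only U+0000-00FF; everything else
# would be rendered by a fallback font further down the font-family stack.
SUBSET_RANGE = range(0x0000, 0x0100)


def fallback_characters(text):
    """Return the distinct characters in text that the subset does not cover."""
    missing = {char for char in text if ord(char) not in SUBSET_RANGE}
    return sorted(missing)


# "Café Ж 😀" written with escapes so the code points are unambiguous.
sample = "Caf\u00e9 \u0416 \U0001F600"

for char in fallback_characters(sample):
    print(f"U+{ord(char):04X} would fall back to another font")
```

Here the é stays in the main font, while the Cyrillic letter and the emoji are reported as fallbacks.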
How is the subsetting done?
Manually! Subsetting of each font file can be done using a tool called glyphhanger. Glyphhanger has a number of options for generating font subsets - extracting only the glyphs for a particular character set, or even extracting only the characters which exist on a remote URL! The most basic example is to generate a subset which contains just the Latin characters.
$ npm install -g glyphhanger
$ glyphhanger --LATIN --subset=fonts/*.woff --formats=woff2
Subsetting Roboto-Regular.woff to Roboto-Regular-subset.woff2 (was 65.7 KB, now 15.8 KB)
The above command will subset all woff files inside the fonts/ directory, creating a subset file for each font discovered, containing just the Latin character subset. The formats option allows us to specify the output format(s) we want - in this case, we're asking glyphhanger to not only subset our woff files, but to also save the output in the more compressed woff2 file format.
Additional optimisations
If you have a font file where you know you'll only ever use a small number of characters (maybe a specific font for a sports scoreboard style), then making use of the whitelist option can make for a huge reduction.
$ glyphhanger --whitelist="01234567890-:" --subset=Sports-Font.ttf --formats=woff2
Subsetting Sports-Font.ttf to Sports-Font-subset.woff2 (was 304.25 KB, now 3.85 KB)
In the above example, you'll notice that we haven't just limited our subsetting to woff font files. Many older sites may still be carrying older, less-efficient file formats like ttf. With support for woff2 being widespread, this is a great opportunity to really optimise the font stack on site, moving from ttf to woff2 as the primary font format served.
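Rather than typing the whitelist by hand, the character list can be derived from the text the font will actually display. This is a small sketch under assumptions: `build_whitelist` is a hypothetical helper and the scoreboard values are invented for illustration.

```python
def build_whitelist(strings):
    """Collect the distinct characters used across all strings, in sorted order."""
    return "".join(sorted(set("".join(strings))))


# Hypothetical scoreboard values; replace with the text your design really shows.
scoreboard_values = ["0-0", "2-1", "45:00", "90:00", "10-3"]

whitelist = build_whitelist(scoreboard_values)
print(whitelist)
```

The resulting string can be passed to the whitelist option shown above. Note that a sample like this one never uses 6, 7 or 8, so those digits would fall back to another font; build the list from every value the component can display, not just from a few examples.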
Result
The top half of this image is the network tab for font loading on the article page on a popular news site. There are a number of font variants being served for different parts of the design. The bottom half shows the result for the same article after subsetting the fonts to just the Latin characters. In this instance, the size of the transferred font files on a cold cache has dropped from ~400kb to ~140kb, which is a drop of ~65%.
This file size drop led, in this particular case, to an LCP score increase of close to 20%, an FCP increase of 10%, and an improvement in the overall page speed score in the same range. If there are limited languages in use on a particular site, then font subsetting can be a really effective way of quickly improving the site's page speed, and, ultimately, Google ranking!
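The size reductions quoted in this post can be checked with a few lines of Python. The figures are the ones reported above; `reduction_percent` is a hypothetical helper name.

```python
def reduction_percent(before_kb, after_kb):
    """Return the size reduction as a percentage of the original size."""
    return (before_kb - after_kb) / before_kb * 100


# (label, size before in KB, size after in KB), figures taken from this post.
examples = [
    ("Roboto-Regular", 65.7, 15.8),
    ("Sports-Font", 304.25, 3.85),
    ("News article fonts (approx.)", 400, 140),
]

for label, before, after in examples:
    print(f"{label}: {reduction_percent(before, after):.1f}% smaller")
```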
One caveat here is that some font licences do not permit modification of the source files in any way, even for subsetting. So make sure that the licence for your font permits this type of modification before proceeding!