Checking your static site for dead URLs

One of the challenges of maintaining a static site is continuously ensuring that links work, whether they are local or external. Sometimes we move content around and hopefully remember to redirect old paths to new ones, but occasionally a broken link slips through the cracks. External links can go dead, too, when the site we’re linking to stops working.

Checking this manually obviously isn’t scalable, so let’s build a script that parses every HTML file, checks the liveness of every URL it finds, and fails if any of them are dead.

Getting paths of all HTML files

For this blog post it doesn’t matter how you built your static site; you might have used Gatsby, Eleventy, NuxtJS etc. Let’s say that the destination directory is dist. To get the paths of all HTML files, we can use a tool like globby:

yarn add globby

Now we can start writing our script for checking for dead URLs:

const globby = require('globby')
 
const checkForDeadUrls = async () => {
  const files = await globby('dist/**/*.html')
}
 
checkForDeadUrls()
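
If you log files at this point, you’d see something like this (hypothetical paths; the actual list depends on your site’s structure):

console.log(files)
// → ['dist/index.html', 'dist/about/index.html', 'dist/blog/hello-world/index.html']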

Collecting URLs

To collect URLs we’re going to parse every HTML file with PostHTML, which converts HTML content to an AST (abstract syntax tree), operates on that AST using an ecosystem of plugins, and converts that AST back to a string. This is similar to how PostCSS and Babel work with CSS and JavaScript respectively.
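
To make that concrete, here’s a sketch of the tree a tiny snippet parses into (assuming the CommonJS build of posthtml-parser, where the parser function is the default export):

const parser = require('posthtml-parser')

const tree = parser('<a href="/about">About</a>')
// → [{ tag: 'a', attrs: { href: '/about' }, content: ['About'] }]

Plugins receive this tree of plain objects and can walk or modify it before it’s stringified back to HTML.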

The plugin we’re going to use is posthtml-urls, which loops through every URL:

yarn add posthtml posthtml-urls

Also, it’s safe to assume that there are going to be duplicates, so instead of an array we can use a Set, which ignores duplicates natively.
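
As a quick illustration:

const urls = new Set()
urls.add('/about')
urls.add('/about')
console.log(urls.size) // → 1, the duplicate was ignored

Here’s the updated script: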

const globby = require('globby')
const posthtml = require('posthtml')
const fs = require('fs')
 
const checkForDeadUrls = async () => {
  const files = await globby('dist/**/*.html')
 
  const urls = new Set()
 
  const ph = posthtml([
    require('posthtml-urls')({
      eachURL: (url) => {
        urls.add(url)
      },
    }),
  ])
 
  await Promise.all(
    files.map((file) => ph.process(fs.readFileSync(file, 'utf8'))),
  )
}
 
checkForDeadUrls()

First we’re creating a PostHTML processor by passing a configured posthtml-urls as a plugin, then we’re processing every file and awaiting the results, so that every URL has been collected before we move on. Note that we read each file as a UTF-8 string, because the parser expects a string rather than a Buffer. We’re discarding the resolved value of .process() because for our purpose we don’t need to transform the content; we only need to parse it to collect the URLs.

posthtml-urls is originally intended for transforming URLs, which is useful e.g. for adding a prefix to your relative paths if you’re hosting the site in a subfolder.
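
A minimal sketch of that transforming use case could look like this (assuming that returning a value from eachURL replaces the original URL; /my-subfolder is a hypothetical prefix):

const posthtml = require('posthtml')

const prefixer = posthtml([
  require('posthtml-urls')({
    // return the rewritten URL to replace the original one
    eachURL: (url) => (url.startsWith('/') ? `/my-subfolder${url}` : url),
  }),
])

// prefixer.process(html) would rewrite <a href="/about"> to <a href="/my-subfolder/about">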

Starting a local server

Now that we’ve collected the URLs, we need to check their liveness. In this post I’m differentiating between local and external URLs. To check local URLs we’ll need to start a local server; one easy way to do this is with BrowserSync:

yarn add browser-sync

const globby = require('globby')
const posthtml = require('posthtml')
const fs = require('fs')
const server = require('browser-sync').create()
 
const checkForDeadUrls = async () => {
 
  // collecting URLs
 
  await new Promise((resolve) => {
    server.init(
      {
        port: 3000,
        server: 'dist',
        open: false,
        logLevel: 'silent',
      },
      resolve,
    )
  })
 
  // checking for dead URLs
 
  server.exit()
}
 
checkForDeadUrls()

Passing open: false will prevent the browser from opening, and logLevel: 'silent' will prevent BrowserSync from logging any information about the server to the terminal.

Checking liveness of URLs

For checking liveness of URLs we’re going to use check-links.

yarn add check-links

Let’s assume that all of our local URLs start with /, so when passing URLs to check-links we need to prepend every local URL with http://localhost:3000, or whatever port you chose for the local server:

const urlsToCheck = Array.from(urls).map((url) =>
  url.startsWith('/') ? `http://localhost:3000${url}` : url,
)

const results = await checkLinks(urlsToCheck)

check-links returns an object whose keys are the URLs we passed in and whose values are objects containing liveness info. We’re going to check whether any URLs have a status of dead, list them, and fail the script:

const deadUrls = urlsToCheck.filter((url) => results[url].status === 'dead')
 
if (deadUrls.length > 0) {
  console.error(`Dead URLs:\n\n${deadUrls.join('\n')}`)
  process.exit(1)
}
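
For reference, each entry in results is a small object describing liveness; treat the exact fields below as an assumption based on check-links’ documented results:

// Hypothetical entries in `results`:
// results['https://example.com']         → { status: 'alive', statusCode: 200 }
// results['http://localhost:3000/nope']  → { status: 'dead', statusCode: 404 }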

The entire function for checking for dead URLs looks like this:

const globby = require('globby')
const posthtml = require('posthtml')
const fs = require('fs')
const server = require('browser-sync').create()
const checkLinks = require('check-links')
 
const checkForDeadUrls = async () => {
  const files = await globby('dist/**/*.html')
 
  const urls = new Set()
 
  const ph = posthtml([
    require('posthtml-urls')({
      eachURL: (url) => {
        urls.add(url)
      },
    }),
  ])
 
  await Promise.all(
    files.map((file) => ph.process(fs.readFileSync(file, 'utf8'))),
  )
 
  await new Promise((resolve) => {
    server.init(
      {
        port: 3000,
        server: 'dist',
        open: false,
        logLevel: 'silent',
      },
      resolve,
    )
  })
 
  const urlsToCheck = Array.from(urls).map((url) =>
    url.startsWith('/') ? `http://localhost:3000${url}` : url,
  )

  const results = await checkLinks(urlsToCheck)

  const deadUrls = urlsToCheck.filter((url) => results[url].status === 'dead')
 
  if (deadUrls.length > 0) {
    console.error(`Dead URLs:\n\n${deadUrls.join('\n')}`)
    process.exit(1)
  }
 
  server.exit()
}
 
checkForDeadUrls()

A good place to run this script would be as part of your deploy process.
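
For example, with a hypothetical package.json setup (the script names, build tool and deploy command are placeholders for whatever your site uses):

{
  "scripts": {
    "build": "eleventy",
    "check-dead-urls": "node check-dead-urls.js",
    "deploy": "yarn build && yarn check-dead-urls && netlify deploy --prod"
  }
}

This way a dead URL fails the deploy before a broken version of the site ever goes live.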

Conclusion

When the quality of your site suffers in a way that can be detected programmatically, try building a script like this to save yourself (and primarily your visitors!) some trouble, and continue developing your site with confidence.