I _Really_ Don't Know

A low-frequency blog by Rob Styles

Bringing a Wordpress Blog Back To Life With Eleventy

I stopped writing on this blog at the end of 2012. That's about 8 years ago as I write this and much has changed in the meantime.

It survived on a cheap web host, running wordpress, until October 2017 when I failed to pay the fees on the host and, eventually, they powered down the virtual machine. My blogging career, 1999 — 2017, finished.

Every so often I'd get a little nostalgic, and remember that I enjoyed blogging, and finally, February 2021, I made the decision to try and ressurect my old blog content, and maybe even write some new posts.

Here's how that went…

The blog started out in 1999 as hand-written HTML, moved to Movable Type around 2001, and then to Wordpress back in 2008. Wordpress was the most recent tech, so that's the backup I started looking for.

First up I trawled through some old backup drives that have been hanging about carefully archived for just this eventuality. The most recent backup of the wordpress database I could find was wpau-db-backuppeyaBDED.zip from 18 November 2011, and I had a wpau-files-bak-ipyECvbs.zip file of the web server root folder from the same date. That's a good start, as the posts are all in the db, and any images and other files should be in the files archive.

But, I posted a few posts after 18 November 2011, so that meant I'd have to find a handful of the most recent posts from somewhere else, but it was a good start. The somewhere else turned out to be the Wayback Machine which had been faithfully archiving my blog for me over the years.

I could, perhaps, have just installed an up-to-date version of Wordpress, and imported the posts. I'd rather not be using wordpress.com, though, and I certainly don't want another server of my own — it's 2021 after all.

So, a combination of static site generator and static site hosting seemed like a good first step. I got a few suggestions from friends and Eleventy cropped up enough times to be a good working assumption. GitLab have a great CI\CD and Pages option. so the plan is simple:

old posts -> markdown -> eleventy -> gitlab ci/cd -> html -> pages

I took some inspiration from Josh Can Help's post on migrating from Wordpress to Eleventy but the details ended up a little different.

Step 1. Extracting the old posts

With a backup of the db, the file is simply a sequence of SQL statements that recreate the wordpress tables and then insert the contents. With the sql extracted into a folder, sql-to-import, we can get a clean, new mysql install up and running easily with docker:

docker run \
  -d \
  --name blog-mysql \
  -v "${PWD}/sql-to-import":/sql-to-import \
  -p 3306:3306 \
  -e MYSQL_DATABASE=blog_wp \
  -e MYSQL_USER=wordpress \
  -e MYSQL_PASSWORD=wordpress \
  -e MYSQL_ROOT_PASSWORD=root \
  mysql/mysql-server

Then we can get a bash shell in that container with docker exec -ti blog-mysql bash and in that shell we can import the sql backup with a typical mysql import command: mysql -uwordpress -pwordpress blog_wp < /sql-to-import/blog_wp_wp_20111118_547.sql. When we do this we find mysql has, unsurprisingly, changed some things in the ten years between the export and the import.

It turns out there are three problems:

  1. Date columns back in 2011 used '0000-00-00 00:00:00' as an "empty" value and mysql doesn't like that any more.
  2. The wordpress backup created inserts with missing values denoted by consecutive commas 'libwww-perl/5.811', '', , '209.172.33.53' and mysql reports that as invalid sql. Where we want the default, we have to be explicit: 'libwww-perl/5.811', '', default, '209.172.33.53'.
  3. The export of the db is a UTF-8 file, and mysql is configured for UTF-8, but the shell in the mysql image is POSIX by default. This causes the file to import (silently) with the wrong character encoding.

We can solve these issue fairly easily, in part because the db is only a quick way to get at the data and isn't being kept.

For the date values, we can turn off mysql's strict mode to allow these. The command is simply SET sql_mode = ''; and we add that into the top of the sql file.

For the character encoding, we set the shell's character encoding before doing the import: export LC_ALL=en_GB.UTF8.

For the invalid sql inserts, things are bit more tricky. They only occur in one table, wp_redirection_logs, so we could write a regex replace and fix them all, but in reality I don't want anything from that table, so I've just removed those inserts from the backup completely.

With the data imported, we can take a look at the tables and decide what we want to export back out. We could process against the db, but getting the content out as JSON should be easier, as we can run our conversion over and over with no dependancy on mysql, and safer long-term as we can archive that more modern export.

After looking through, I decided I wanted the posts, the comments, and the tags (called terms in wordpress). We can use the relatively new mysqlsh to export directly to json files, and use those in some minimal node code to create the markdown files for eleventy.

mysqlsh --sql --json --uri=wordpress:[email protected]/blog_wp \
  -e 'SELECT * FROM wp_posts' \
  > /sql-to-import/wp_posts.json


mysqlsh --sql --json --uri=wordpress:[email protected]/blog_wp \
-e 'SELECT p.ID AS post_id, p.post_title, wtr.term_order, wt.name, wt.slug \
    FROM wp_posts AS p \
    JOIN wp_term_relationships AS wtr ON p.ID = wtr.object_id \
    JOIN wp_terms AS wt ON wtr.term_taxonomy_id = wt.term_id;' \
> /sql-to-import/posts-to-tags.json

mysqlsh --sql --json --uri=wordpress:[email protected]/blog_wp \
  -e 'SELECT * FROM wp_comments' \
  > /sql-to-import/wp_comments.json

I also did a straight SELECT * FROM export for every table, so I have them all in json for the future.

Step 2. Converting to markdown

Now the backup is nicely converted to json we can simply require it in some simple conversion code. Total size is only 2.1M so it's no problem.

The conversion code is pretty naive, and simply steps through each post adding the tags and doing a fair number of corrections for old object embeds in HTML; HTML into markdown; absolute URLs from my old domain into relative URLs; and a few more bits.

You can read the wordpress-to-markdown code on GitLab here - it does the following things:

  1. Makes changes to specific posts based on their ID. This allows e.g. replacement of old embedded flash players with current embed code.
  2. Runs a collection of changes on every post. This does things like change line endings from "\r\n" to a uniform "\n"; changes some html tags to markdown; replaces a lot of internal links with their new form.
  3. Mixes in the tags/categories from the wp_terms table we exported earlier.
  4. Adds and <!-- excerpt --> marker to allow the first part of posts to show on the homepage and other collection pages.
  5. Write each post out into its own .md file.

It's not pretty, or very smart, it just converts enough of the old movable type and wordpress HTML into markdown to render properly, and fixes internal links and other breakages caused by the entropy of 8 years.

It also doesn't deal with comments… I could have mixed in the comments in a similar way to the tags, but Eleventy has other options for mixing in data, so I figured I'd leave comments for later.

Step 3. Setting up Eleventy

Eleventy is super easy to set up. You tell it where your input markdown files (and templates) are, where you want the output HTML files to go, and that's about it.

I added in moment for date formatting in the templates, and some custom options for markdown-it to add attributes to outbound links.

Eleventy's simple config file is on gitlab here

Step 4. Setting up GitLab CI/CD and Pages

One of the great things about a static site is that hosting can be super, super, simple. There are so many hosting options I'm not even going to list the top ones. Static page hosts are all pretty similar and I hadn't looked at GitLab CI/CD and Pages before, so that's what I chose.

GitLab asks for a .gitlab-ci.yml in the root of the repository, and if it's valid, then CI/CD starts running for you.

It took me a couple of minutes to find an example config and tweak it to both build the eleventy site and then deploy it to pages. The GitLab CI/CD Pages config is on Gitlab here.

That got the site up and running, and I added in the DNS needed to point https://rob.styles.to at the pages host, and the site was live.

Step 5. Comments

The last step I wanted to get in place on the blog was historic comments. I'm not worried about allowing new comments for now, though I plan to look at that later.

Initially all I wanted to do was add in old comments that had been approved. With the wp_comments table exported as json, making that data available in the templates is trivially easy.

First we tell Eleventy we have a data folder, data: "../data", and then we add a comments.js file in there to load, reshape, and export our comments file. You can read the comments loading code on gitlab here.

That makes the comments available to any template, so in each post we can simply iterate over them:

if (comments[`${legacy_wordpress_id}`])
  h2 Comments
  for comment in comments[legacy_wordpress_id]
    p.comment
      time(moment(comment.comment_date).format('YYYY-MM-DD'))= moment(comment.comment_date).format('D MMMM YYYY')
      span.author!= comment.comment_author
      p.content!= comment.comment_content

That's it for now. Historic content converted and hosted, and the simple ability to new posts in markdown as and when I get inspired.