I _Really_ Don't Know

A low-frequency blog by Rob Styles

Moving Stuff Finished :-)

Yesterday evening I followed the excellent instructions from the WordPress Codex on Migrating from Movable Type to WordPress. This proved to be extremely easy with all the posts, categories and comments coming through perfectly.

I had some files (photos, ppt etc) manually uploaded, so I logged in to move those over. As well as migrating, I moved the blog from /blog/ up to the root of the site. That all went swimmingly and I picked a nice shiny new theme called demar.

The only thing left was to sort out redirects for all the old links - so I don't lose all the link love I've managed to build up over the years.

After reading through the various options I had a few issues. I'd tinkered a bit with the MT URLs in the past and had a lot of legacy stuff hanging around. I decided I'd just do it with Apache's .htaccess. Not a choice for everyone, but my regex skills aren't too shabby, so I figured I'd start there.

The URLs fell into a few different patterns:

/blog/atom.xml, index.xml, etc - the feeds. For now these can all redirect to /feed/ so we start with

RewriteEngine On
RewriteRule ^blog/atom\.xml$ /feed/ [L,R=301]
RewriteRule ^blog/index\.rdf$ /feed/ [L,R=301]
RewriteRule ^blog/index\.xml$ /feed/ [L,R=301]

With the feeds dealt with, I moved on to the root of the blog, adding

RewriteRule ^blog/$ / [L,R=301]
RewriteRule ^blog$ / [L,R=301]

and then on to the archives, where things start to get trickier. First we have categories, which take the form /blog/archives/cat_somewhere_I_put_stuff.html. WordPress uses a different pattern by default - /category/somewhere-i-put-stuff. Not too hard: first we pull out the words, then glue them back together again.

RewriteRule ^blog/archives/cat_([^_]*)_([^_]*)_([^_]*)_(.*)\.html$ /category/$1-$2-$3-$4 [L,R=301]
RewriteRule ^blog/archives/cat_([^_]*)_([^_]*)_(.*)\.html$ /category/$1-$2-$3 [L,R=301]
RewriteRule ^blog/archives/cat_([^_]*)_(.*)\.html$ /category/$1-$2 [L,R=301]
RewriteRule ^blog/archives/cat_(.*)\.html$ /category/$1 [L,R=301]

These regexes match categories that are 4 words, 3 words, 2 words and 1 word long respectively. If you have categories with more words than that then you'll need to add longer versions of these, ordering them longest first.
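Before uploading rules like these it can help to dry-run the patterns locally. What follows is only a minimal sketch and not part of my actual setup: it assumes a sed that supports -E, and rewrite_category is a made-up helper name. It applies the same four patterns in the same longest-first order and stops at the first one that matches, roughly as the [L] flag does.

```shell
#!/bin/sh
# Hypothetical helper: emulate the four category rules with sed.
# The patterns mirror the RewriteRules above and, like them, expect the
# path without its leading slash. Each "-e t" skips the remaining
# patterns once a substitution has been made.
rewrite_category() {
  printf '%s\n' "$1" | sed -E \
    -e 's%^blog/archives/cat_([^_]*)_([^_]*)_([^_]*)_(.*)\.html$%/category/\1-\2-\3-\4%' -e t \
    -e 's%^blog/archives/cat_([^_]*)_([^_]*)_(.*)\.html$%/category/\1-\2-\3%' -e t \
    -e 's%^blog/archives/cat_([^_]*)_(.*)\.html$%/category/\1-\2%' -e t \
    -e 's%^blog/archives/cat_(.*)\.html$%/category/\1%'
}

rewrite_category 'blog/archives/cat_semantic_web.html'            # → /category/semantic-web
rewrite_category 'blog/archives/cat_somewhere_i_put_stuff.html'   # → /category/somewhere-i-put-stuff
```

Feeding it a five-word category shows why longer rules are needed: the last group swallows the leftover underscore.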

Next the monthly archives, in MT /blog/archives/2004_04.html and in WP /2004/04/

RewriteRule ^blog/archives/([0-9]{4})_([0-9]{2})\.html$ /$1/$2/ [L,R=301]

easy enough.

Then we have the individual posts. These fall into two groups, name based files and numbered files. The name based files are all /blog/archives/categoryname/postname.html. I had thought these were going to be a pig, but I discovered completely by accident that if you simply use the postname part of that then WordPress figures out which post you meant and redirects to its nice new WP URL. Sweet.

RewriteRule ^blog/archives/[^/]*/(.*)\.html$ /$1 [L,R=301]

The exception to this turns out to be posts that have a hyphen in the name. MT strips the hyphen, leaving WP with a name that doesn't match. I put a rule in specifically for the one post I have that was affected:

RewriteRule ^blog/archives/personal/beijing_sightse\.html$ /2008/05/04/beijing-sight-seeing/ [L,R=301]

Which just leaves the numbered posts: /blog/archives/000217.html. The numbered entries proved to be tricky. While you can just append the number to the root, as in /1234, and WordPress will find a post for you, the posts weren't matching up. As many of these had been indexed by Google and linked by others I wanted to hook them up to the right posts.

Fortunately I had Movable Type still rendering my site as static HTML files, so with a quick bit of bash magic I pulled out the numbered posts and made rules to map them to the Movable Type post name based permalinks (which we already did rewrite rules for above):

find . -type f \
| grep "^\./[0-9]*\.html$" \
| xargs grep permalink \
| awk '{print $1 " " $17}' \
| sed -e 's%^\./%RewriteRule ^blog/archives/%' \
      -e 's%\.html:%.html$%' \
      -e 's%href="http://www.dynamicorange.com%%' \
      -e 's%">Permalink</a><br% [L,R=301]%' \
> rules.txt
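The awk step relies on the permalink landing in field 17 of the matched line, which ties the pipeline to one particular template. As a hedged alternative, here is a sketch that pulls the post number and the permalink path out with a single sed expression. The sample line is a made-up approximation of the grep output and make_rule is a hypothetical name, so adjust the pattern to whatever your own templates emit. Unlike the pipeline above, it also escapes the dot before html.

```shell
#!/bin/sh
# Hypothetical helper: turn lines of `grep permalink` output (on stdin)
# into RewriteRule lines. Assumes each line looks roughly like
#   ./NNNNNN.html:... href="http://host/path">Permalink...
# Lines that do not match are dropped (-n plus the p flag).
make_rule() {
  sed -nE 's%^\./([0-9]+)\.html:.*href="https?://[^/"]+(/[^"]*)">Permalink.*%RewriteRule ^blog/archives/\1\\.html$ \2 [L,R=301]%p'
}

# Made-up sample input, standing in for the real grep output.
sample='./000285.html:<a href="http://www.dynamicorange.com/blog/archives/semantic-web/vocabs.html">Permalink</a><br />'
printf '%s\n' "$sample" | make_rule
```

In real use the input would come from the find and grep steps, with the output redirected to rules.txt as before.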

Each line of rules.txt ends up looking like this

RewriteRule ^blog/archives/000285.html$ /blog/archives/semantic-web/vocabs.html [L,R=301]

which results in a second redirect to just /vocabs and then a third as WP works out where to take you finally. Not great to be bouncing around so much, but much better than losing the link.
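To see how many hops a given old URL actually takes, you can follow the chain with curl. A small sketch, assuming curl is installed: show_hops is a hypothetical helper that just filters the Location headers out of curl's header output, and the URL in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: list each redirect target found in the HTTP
# response headers read from stdin, one per line.
show_hops() {
  tr -d '\r' | grep -i '^location:' | sed -E 's/^[^:]*:[[:space:]]*//'
}

# Usage (placeholder URL; -s is silent, -I asks for headers only,
# -L follows redirects):
#   curl -sIL 'http://example.com/blog/archives/000285.html' | show_hops
```

Each line printed is one redirect, so three lines means three bounces before the final page.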

Good luck if you decide to make the same move.

