Moving from Wordpress to Pelican

I finally got the time to move my blog from Wordpress to another platform. Wordpress is great for casual blogs, but if you need to write mathematical expressions or publish code, the support is abysmal: bizarre ad-hoc markup that the visual editor constantly breaks.

Hence, after reading this, I moved to Pelican and now host the static HTML on GitHub.

pelican-import helped with extracting the old blog, but this little script has a lot of drawbacks. First of all, it did not work with the default rst output format, which is not that bad given that I prefer Markdown. However, even with Markdown it had issues, especially with images. I do not think that I will ever try to correct all the issues, so here are the workarounds that I employed.

First, none of the \(\LaTeX\) and sourcecode tags were translated. I used the following hackish solution for that problem, run over the content folder.

perl -pi -e 's/\[sourcecode\]/\`\`\`/g' *
perl -pi -e 's/\[\/sourcecode\]/\`\`\`/g' *
perl -pi -e 's/\[sourcecode language="python"\]/\`\`\`python/g' *
perl -pi -e 's/\\\$latex/\$/g' *
perl -pi -e 's/\\\$/\$/g' *
perl -pi -e 's/\\\\n/\\n/g' *
perl -pi -e 's/\\\^/\^/g' *
perl -pi -e 's/\\\*/\*/g' *
perl -pi -e 's/\\_/_/g' *
perl -pi -e 's/\\\\/\\/g' *
perl -pi -e 's/\\#/#/g' *
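For comparison, here is a minimal Python sketch of the same substitutions applied to a string in one pass. The names `RULES`, `FENCE` and `convert` are made up for this illustration and are not part of Pelican or pelican-import. The sample input is only my guess at what the importer output looked like.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the listing readable

# (pattern, literal replacement) pairs in the same order as the perl one-liners.
# Order matters: the escaped "$latex" marker must go before the bare escaped "$".
RULES = [
    (r'\[sourcecode\]', FENCE),
    (r'\[/sourcecode\]', FENCE),
    (r'\[sourcecode language="python"\]', FENCE + 'python'),
    (r'\\\$latex', '$'),
    (r'\\\$', '$'),
    (r'\\\\n', '\\n'),
    (r'\\\^', '^'),
    (r'\\\*', '*'),
    (r'\\_', '_'),
    (r'\\\\', '\\'),
    (r'\\#', '#'),
]

def convert(text):
    """Apply every rule in turn; replacements are inserted literally."""
    for pattern, replacement in RULES:
        # A function replacement avoids re.sub interpreting backslashes.
        text = re.sub(pattern, lambda _m, r=replacement: r, text)
    return text

# Hypothetical sample of importer output (an assumption, not taken from a real post).
sample = '[sourcecode language="python"]\nprint(a\\_b)\n[/sourcecode]\n\\$latex x\\^2\\$'
print(convert(sample))
```

Like the perl version, this leaves a space after the opening `$` of a formula, because only the `latex` marker is removed and not the space that follows it.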

Then I had to correct all the badly escaped Unicode slugs. Some of my posts are written in Cyrillic, which is percent-escaped in URLs. The problem is that pelican-import failed to notice the difference, so Pelican produced links that did not correspond to the filenames. Another hackish solution for this problem:

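for i in *;
do
    # Take the first Slug line; quoting keeps odd filenames and slugs intact.
    slug=$(grep -m1 '^Slug: ' "$i" | cut -d" " -f2-);
    name=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0]);' "$slug");
    if [ -n "$name" ]; then mv -- "$i" "$name.md"; fi;
done;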

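for i in *;
do
    old=$(grep -m1 "^Slug: " "$i" | cut -d" " -f2-);
    new=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0]);' "$old");
    # Pass the values through the environment so they are never parsed as regex or code.
    OLD="$old" NEW="$new" perl -pi -e 's/^Slug: \Q$ENV{OLD}\E$/Slug: $ENV{NEW}/' "$i";
done;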
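The same two steps (rename the file, then rewrite its Slug line) can also be sketched in Python, with the standard library's `urllib.parse.unquote` doing the unescaping. This is only an illustration under my own assumptions: the helper names `unescape_slug` and `fix_post` are hypothetical, and each post is assumed to be a UTF-8 file with at most one `Slug: ` line.

```python
import re
from pathlib import Path
from urllib.parse import unquote

SLUG_LINE = re.compile(r"^Slug: (.*)$", re.MULTILINE)

def unescape_slug(text):
    """Return (new_text, slug) with the first 'Slug:' line percent-decoded.

    slug is None when the text has no 'Slug:' line.
    """
    match = SLUG_LINE.search(text)
    if match is None:
        return text, None
    slug = unquote(match.group(1).strip())
    new_text = text[:match.start()] + "Slug: " + slug + text[match.end():]
    return new_text, slug

def fix_post(path):
    """Rewrite the slug inside one post and rename the file to <slug>.md."""
    path = Path(path)
    new_text, slug = unescape_slug(path.read_text(encoding="utf-8"))
    if slug is None:
        return path  # no Slug line: leave the file untouched
    path.write_text(new_text, encoding="utf-8")
    target = path.with_name(slug + ".md")
    path.rename(target)
    return target
```

Running `fix_post(p)` for every `p` in `Path("content").iterdir()` would process the whole folder; the folder name here is just an example.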

Finally, I modified the categories and tags, because Pelican does not support subcategories. In addition, I corrected the Author field with perl -pi -e 's/Author: stefankr/Author: Stefan Krastanov/g' *.

After addressing all the problems with the content, I had to transfer the comments. I dislike the idea of moving from one walled garden to another (like Disqus), so I deployed the great simple comment server Juvia and imported all the Wordpress comments into it (there were hiccups, but all issues were reported to the bugtracker). I had to slightly modify the embedded JavaScript to account for the new pages having .html as a suffix, which was straightforward to do with topic_key : (location.pathname.indexOf(".html") == -1) ? location.pathname : location.pathname.slice(0, -5). The expression strips the suffix only when it is present, so both the URLs with and without ".html" map to the same comment thread.
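To see why the expression handles both URL forms, here is the same logic mirrored as a small Python function. It is purely illustrative: the real code is the JavaScript snippet embedded in the page, and the Python name `topic_key` is just borrowed from the option name.

```python
def topic_key(pathname):
    """Mirror of the JavaScript expression used for the comment topic key."""
    # JS: (pathname.indexOf(".html") == -1) ? pathname : pathname.slice(0, -5)
    if ".html" not in pathname:
        return pathname
    return pathname[:-5]  # drop the last five characters, i.e. ".html"

print(topic_key("/some-post.html"))  # /some-post
print(topic_key("/some-post"))       # /some-post
```

A stricter variant would test `pathname.endswith(".html")` instead, which avoids trimming a path that merely contains ".html" somewhere in the middle.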

Anyhow… Welcome to my blog.