I finally found the time to move my blog from WordPress to another platform. WordPress is great for casual blogs, but if you need to write mathematical expressions or publish any code, the support is abysmal: bizarre ad-hoc markup that constantly gets broken by the visual editor.
pelican-import helped with the extraction of the old blog, but this little
script has a lot of drawbacks. First of all, it did not work in the default
rst format, which is not that bad given that I prefer Markdown. However, even
in Markdown it had issues, especially with images. I do not think that I will
ever try to correct all the issues, so here are the workarounds that I employed.
Firstly, none of the \(\LaTeX\) and
sourcecode tags were translated. I
used the following hackish solution for that problem, run over all the converted files:
    perl -pi -e 's/\[sourcecode\]/\`\`\`/g' *
    perl -pi -e 's/\[\/sourcecode\]/\`\`\`/g' *
    perl -pi -e 's/\[sourcecode language="python"\]/\`\`\`python/g' *
    perl -pi -e 's/\\\$latex/\$/g' *
    perl -pi -e 's/\\\$/\$/g' *
    perl -pi -e 's/\\\\n/\\n/g' *
    perl -pi -e 's/\\\^/\^/g' *
    perl -pi -e 's/\\\*/\*/g' *
    perl -pi -e 's/\\_/_/g' *
    perl -pi -e 's/\\\\/\\/g' *
    perl -pi -e 's/\\#/#/g' *
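For anyone who would rather not chain perl one-liners, the same substitutions can be sketched in Python. This is only an illustration: the names `FENCE`, `REPLACEMENTS` and `clean_markup` are made up for this example, and the sketch assumes that plain literal replacement, applied in the same order as the commands above, is all that is needed.

```python
# Hypothetical Python equivalent of the perl one-liners; all names are illustrative.
FENCE = "`" * 3  # a Markdown code fence (three backticks)

# (old, new) pairs, applied in the same order as the perl commands.
REPLACEMENTS = [
    ("[sourcecode]", FENCE),
    ("[/sourcecode]", FENCE),
    ('[sourcecode language="python"]', FENCE + "python"),
    ("\\$latex", "$"),   # \$latex -> $
    ("\\$", "$"),        # \$      -> $
    ("\\\\n", "\\n"),    # \\n     -> \n
    ("\\^", "^"),        # \^      -> ^
    ("\\*", "*"),        # \*      -> *
    ("\\_", "_"),        # \_      -> _
    ("\\\\", "\\"),      # \\      -> \
    ("\\#", "#"),        # \#      -> #
]


def clean_markup(text: str) -> str:
    """Apply every literal replacement to the whole text, in order."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    sample = '[sourcecode language="python"]\nprint(1)\n[/sourcecode]\n\\$latex x\\^2\\$\n'
    print(clean_markup(sample))
```

Unlike `perl -pi`, this sketch does not touch any files; it only transforms a string, so it can be tried on a copy of a post before anything is overwritten.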
Then I had to correct all the badly escaped Unicode slugs. Some of my posts are
written in Cyrillic, which is percent-escaped in URLs. The problem is that
pelican-import failed to notice the difference and then
pelican produced links that
do not correspond to the filenames. Another hackish solution for this problem:
    # Rename every file after its unescaped slug.
    for i in `dir -1`; do
        name=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0]);' `grep Slug ./$i | cut -d" " -f2-`)
        mv ./$i $name.md
    done
    # Rewrite the Slug header inside every file with the unescaped value.
    for i in `dir -1`; do
        old=$(grep "^Slug: " $i)
        new=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0]);' `echo $old | cut -d" " -f2-`)
        perl -pi -e "s/$old/Slug: $new/" $i
    done
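The same slug repair can be sketched in Python, using `urllib.parse.unquote` from the standard library in place of `URI::Escape`. The helper names `unescape_slug_line` and `fix_post` are hypothetical, and the sketch assumes that each post carries a single `Slug: ` header line and should end up named after its decoded slug with an `.md` extension, as in the shell loops above.

```python
from pathlib import Path
from urllib.parse import unquote

SLUG_PREFIX = "Slug: "


def unescape_slug_line(line: str) -> str:
    """Decode a percent-encoded Slug header; any other line passes through unchanged."""
    if line.startswith(SLUG_PREFIX):
        return SLUG_PREFIX + unquote(line[len(SLUG_PREFIX):])
    return line


def fix_post(path: Path) -> Path:
    """Decode the Slug header of one post, then rename the file to <slug>.md."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    fixed = [unescape_slug_line(line) for line in lines]
    path.write_text("".join(fixed), encoding="utf-8")

    # Pick the first Slug header, if the post has one.
    slug = next(
        (line[len(SLUG_PREFIX):].strip() for line in fixed if line.startswith(SLUG_PREFIX)),
        None,
    )
    if slug is None:
        return path  # no Slug header: leave the filename alone
    target = path.with_name(slug + ".md")
    path.rename(target)
    return target


if __name__ == "__main__":
    # Assumption: the posts are the .md files in the current directory.
    for post in sorted(Path(".").glob("*.md")):
        print(post, "->", fix_post(post))
```

Because it rewrites and renames files in place, it is worth trying on a copy of the content folder first.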
Finally, I modified the categories and tags, because
pelican does not support
subcategories. In addition, I corrected the
Author field with
perl -pi -e 's/Author: stefankr/Author: Stefan Krastanov/g' *.
After addressing all the problems with the content, I had to try to transfer
all the comments. I dislike the idea of moving from one walled garden to another
(like Disqus), so I deployed the great, simple comment server
Juvia and imported all the WordPress
comments into it (there were hiccups, but all the issues were reported). I also had to account for the
fact that the new pages have
.html as a suffix, but that was straightforward
to do with
topic_key : (location.pathname.indexOf(".html") == -1) ? location.pathname : location.pathname.slice(0, -5).
The expression had to be like this to ensure that both the versions with and
without the ".html" suffix work.
Anyhow… Welcome to my blog.