Moving my Project 365 Posts from WordPress to Tumblr
Last week I published a post in which I talked about my plans to make some changes to my blog. The first step was to move my Project 365 posts from this blog (which is on WordPress) over to a new platform. I thought I would walk you through my process in case someone decides to go down the same path.
I looked at several blog platforms for the Project 365 posts: another hosted WordPress site, a WordPress.com site, Tumblr, and various sites dedicated to Project 365. The specialized Project 365 sites had some nice calendar views of photos, but I was looking for a more personalized appearance. The photos should look like my site, not like my photos on someone else’s site. I also ruled out the WordPress options, because my goal was to make posting these photos quick and easy, and WordPress didn’t offer that: it takes too many steps to upload and post. That left Tumblr.
Tumblr proved to be the winner: quick and easy photo posting, attractive templates, and the ability to use my own domain in the URL. Unfortunately, there wasn’t an easy way to import the posts into Tumblr. I probably could have eventually found something that moved the photos over, but it was faster to move the content manually. So I dragged about 160 photos to Tumblr, copied the post text, added hashtags, and changed the date on each one.
I know it sounds like a lot of work, but it was actually quite simple and went faster than I expected. Remember, the reason I picked Tumblr in the first place was how easy it is to post. The only problem I ran into was Tumblr’s odd decision not to give photo posts a title. Instead, it uses the first part of the caption as the title. Putting the title text on its own line (hitting Return after it) and making it bold approximated a title well enough for my needs. I’m hoping Tumblr adds real titles at some point. The photo on the right shows what editing a photo post in Tumblr looks like.
Once the posts were in, the next step was to set up 301 “Moved Permanently” redirects for all the old links. Not that my old links are worth all that much from a search engine perspective, but I’d rather keep them working and make sure Bing and Google update their indexes. The simplest way to do that was to edit the .htaccess file for my site and add a RewriteRule. Now, regular expressions have never been my favorite thing to work with, but I finally put together an expression that matched my 2013 Project 365 posts. Since I’m only moving the 2013 posts for now, it had to match on the year as well as on text that is unique to each Project 365 permalink. What I came up with was this:
^archives/2013/[01][0-9]/[0-3][0-9]/[0-9][0-9][0-9][0-9][0-9]-2013_([1-9]|[1-9][0-9]|[1-3][0-9][0-9])365
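A quick way to gain confidence in a pattern like this is to run it against a few permalinks before it goes anywhere near .htaccess. The Python sketch below does that with the re module. Apache uses PCRE rather than Python’s engine, but this pattern only uses an anchor, character classes, a capture group, and alternation, which both engines handle the same way. Only the first sample path comes from this post; the other three are made up for illustration, and day_number is a hypothetical helper name.

```python
import re

# The pattern from this post, split over two lines only for readability.
PROJECT365 = re.compile(
    r"^archives/2013/[01][0-9]/[0-3][0-9]/[0-9][0-9][0-9][0-9][0-9]"
    r"-2013_([1-9]|[1-9][0-9]|[1-3][0-9][0-9])365"
)


def day_number(path):
    """Return the Project 365 day the pattern captures, or None if it doesn't match."""
    match = PROJECT365.match(path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    samples = [
        "archives/2013/01/01/39252-2013_1365_pickled_tomatoes.html",  # from this post
        "archives/2013/06/28/40123-2013_179365_dancing_robot.html",   # hypothetical
        "archives/2012/05/14/31000-2012_42365_sample.html",           # hypothetical, wrong year
        "archives/2013/03/02/40000-unrelated_post.html",               # hypothetical, not a 365 post
    ]
    for sample in samples:
        # Prints 1, 179, None and None in turn.
        print(sample, "->", day_number(sample))
```

Because the alternation is tried in order, a day like 179 first fails as "1" and "17" before "179" is followed by "365", so the captured group always holds the full day number.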
I’m sure I could have shortened the expression, but I’m a little paranoid about matching the wrong post. Now I could redirect the posts, but once I looked at the Tumblr link format, I realized this approach wouldn’t work. A Tumblr URL looks like this:
http://my365.ladewig.com/post/54150644369/veeam-dancing-robot-2013-179-365-back-in
See the number after “post” in the address? You can ignore everything after it, because Tumblr does too; that text is only there for SEO purposes. No matter what information I extracted from the WordPress URL, none of it could be mapped easily to the Tumblr address. My only alternative was to set up a one-to-one rule in .htaccess for every post. Since I doubt anyone has bookmarked any of these photos, I only need to keep the redirects in place long enough for the search engines to recrawl the links and update their indexes. So my next step was to collect both sets of URLs and match them up.
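Since Tumblr only cares about the number, each redirect target can be trimmed down to /post/ plus the ID, which is the form the rewrite rules later in this post point at. Here is a minimal Python sketch of that trimming; tumblr_short_url is a hypothetical helper name, and it assumes every URL has the post/&lt;id&gt;/&lt;slug&gt; shape shown above.

```python
from urllib.parse import urlsplit


def tumblr_short_url(permalink):
    """Keep only scheme, host and /post/<id>; Tumblr ignores the slug after the ID."""
    parts = urlsplit(permalink)
    segments = parts.path.strip("/").split("/")
    # Expected path shape: post/<numeric id>[/<slug>]
    if len(segments) < 2 or segments[0] != "post" or not segments[1].isdigit():
        raise ValueError(f"not a Tumblr post URL: {permalink}")
    return f"{parts.scheme}://{parts.netloc}/post/{segments[1]}"


if __name__ == "__main__":
    url = "http://my365.ladewig.com/post/54150644369/veeam-dancing-robot-2013-179-365-back-in"
    print(tumblr_short_url(url))  # http://my365.ladewig.com/post/54150644369
```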
To get a list of permalinks from Tumblr, I used Tumblr2WordPress to create an XML file containing all of my posts. The file this site creates is intended for importing into WordPress, but I just needed to pull the permalinks out of it. Once I had the file, which is actually RSS, I decided to use PowerShell to get the data I needed. I hadn’t used PowerShell to work with an XML file in some time, so it was off to the Internet and the Hey, Scripting Guy! Blog for help. I found two posts (here and here) that helped, and I wound up using this command to list all of the permalinks:
select-xml -path .\tumblr_ladewig365.xml -xpath "//item" | select-object -expandproperty node | sort link | select link | export-csv -Path .\tumblr_export.csv -notype
This reads the XML file “tumblr_ladewig365.xml” that I created earlier, selects each item node (where the post info is stored), sorts by the link, and exports the links to a CSV file that I could bring into Excel. I sorted by permalink because the number Tumblr uses increases with each new post, so the links come out in the order they were posted. (Sort-Object compares the links as text, which matches numeric order only as long as all the IDs have the same number of digits.) Since the only posts on the site were Project 365 photos, I didn’t have to filter the output.
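For anyone without PowerShell handy, the same extraction can be sketched in Python with the standard library’s ElementTree. This is an equivalent of the command above, not what I actually ran. It assumes every item in the export has a link element pointing at a Tumblr post, and it sorts on the numeric ID instead of the text, which sidesteps the digit-count caveat. The two-item sample export and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def post_id(link):
    """Numeric Tumblr post ID from a permalink such as .../post/53069879424/slug."""
    return int(link.split("/post/")[1].split("/")[0])


def tumblr_links(rss_text):
    """Return the <link> of every <item> in the export, in posting order."""
    root = ET.fromstring(rss_text)
    links = [item.findtext("link", default="").strip() for item in root.iter("item")]
    # Sort on the numeric ID rather than the raw text, so IDs with different
    # numbers of digits still come out in the order they were posted.
    return sorted(links, key=post_id)


def links_to_csv(links):
    """One-column CSV with a 'link' header, similar to the Export-Csv output."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["link"])
    for link in links:
        writer.writerow([link])
    return out.getvalue()


# Hypothetical two-item sample standing in for the real export file.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Day 179</title><link>http://my365.ladewig.com/post/54150644369/veeam-dancing-robot-2013-179-365-back-in</link></item>
<item><title>Day 1</title><link>http://my365.ladewig.com/post/53069879424</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(links_to_csv(tumblr_links(SAMPLE)), end="")
```

To run it against a real export, pass the raw file contents instead of the sample, for example tumblr_links(open("tumblr_ladewig365.xml", "rb").read()).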
Next I needed to get the links from my WordPress site. I first went to the database on my web host and tried to pull the links out with a query, but it turned out the permalinks aren’t actually stored in the database. You can find the default link, but the SEO-friendly permalink that WordPress generates isn’t there. So back to the Internet I went, and after a couple of searches I found the following PHP that Mike Schinkel provided on Stack Overflow, which I could use to dump the info I needed to a file. I modified it a bit, so if you want to see the original, check out the previous link.
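<?php
// Load WordPress; this assumes the file sits in the WordPress root next to wp-load.php.
include 'wp-load.php';

// Limit the query to posts dated January 1, 2013 or later.
function filter_where( $where = '' ) {
    $where .= " AND post_date >= '2013-01-01'";
    return $where;
}
add_filter( 'posts_where', 'filter_where' );

// Every published post with the Project 365 tag, oldest first.
$posts = new WP_Query('post_type=post&posts_per_page=-1&post_status=publish&tag=Project365&order=ASC');
$posts = $posts->posts;

// Print one permalink per line as plain text.
header('Content-type:text/plain');
foreach ($posts as $post) {
    $permalink = get_permalink($post->ID);
    echo "\n{$permalink}";
}
?>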
I copied the output from this file into the spreadsheet where I had already pasted the Tumblr data and compared the two lists. There should be one WordPress post for each Tumblr post, but they didn’t match. It turned out that on several posts I had neglected to add the “project365” tag specified in the query, or I had typed a space in the tag. In one case, I apparently never posted the photo at all. It took a few iterations, but I finally had both columns lining up. After a few mass edits to build the regular expressions, I had a series of lines like this:
RewriteRule ^archives/2013/01/01/39252-2013_1365_pickled_tomatoes\.html$ http://my365.ladewig.com/post/53069879424 [NC,L]
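The pairing and the mass edits can also be scripted. This Python sketch assumes both lists are already sorted in posting order and that the old blog lives at a hypothetical www.example.com; it refuses to pair lists of different lengths, which is exactly the mismatch that missing tags produce. Only dots are escaped, which is enough for slugs like the one above. The flags default to R=301,NC,L, the final form of the rules; passing "NC,L" reproduces the example line exactly. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def rewrite_rule(wp_url, tumblr_url, flags="R=301,NC,L"):
    """Build one RewriteRule line mapping an old WordPress permalink to a Tumblr post."""
    # In .htaccess, the pattern is matched against the path without its leading slash.
    path = urlsplit(wp_url).path.lstrip("/")
    # Only dots need escaping in these slugs; everything else is literal.
    pattern = "^" + path.replace(".", "\\.") + "$"
    return f"RewriteRule {pattern} {tumblr_url} [{flags}]"


def rewrite_rules(wp_urls, tumblr_urls):
    """Pair the two lists in posting order; refuse to guess if the counts differ."""
    if len(wp_urls) != len(tumblr_urls):
        raise ValueError(
            f"{len(wp_urls)} WordPress posts vs {len(tumblr_urls)} Tumblr posts: "
            "check for untagged or missing posts before pairing"
        )
    return [rewrite_rule(wp, tb) for wp, tb in zip(wp_urls, tumblr_urls)]


if __name__ == "__main__":
    wp = ["http://www.example.com/archives/2013/01/01/39252-2013_1365_pickled_tomatoes.html"]
    tumblr = ["http://my365.ladewig.com/post/53069879424"]
    for line in rewrite_rules(wp, tumblr):
        print(line)
```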
I added a few of these rules to the .htaccess file for testing and… Voilà! It worked. I copied the rest into the .htaccess file, tested a few more pages, and everything worked. Then I edited the flags to add “R=301”, which tells the requester that the page has moved permanently. Without it, Apache still sends visitors to the Tumblr address, but as a temporary (302) redirect; the 301 is what tells browsers and search engines to update their information.
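Spot-checking in a browser works, but browsers follow redirects automatically and cache 301s, so it can be hard to tell which status actually came back. Python’s http.client never follows redirects on its own, so a small sketch built on it can report the status and Location header directly. The FakeBlog server, the path mapping, and run_demo are hypothetical stand-ins so the example runs without touching a real site; to check the live blog, call check_redirect with your own host and port 80.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_redirect(host, port, path, expected_location):
    """GET path without following redirects; return (ok, status, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        resp.read()  # drain the body before closing
        return resp.status == 301 and location == expected_location, resp.status, location
    finally:
        conn.close()


# Hypothetical stand-in for the old blog so the sketch runs offline.
OLD_PATH = "/archives/2013/01/01/39252-2013_1365_pickled_tomatoes.html"
NEW_URL = "http://my365.ladewig.com/post/53069879424"


class FakeBlog(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(301 if self.path == OLD_PATH else 404)
        if self.path == OLD_PATH:
            self.send_header("Location", NEW_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default request logging


def run_demo():
    """Start the fake server on a free local port, check one redirect, shut down."""
    server = HTTPServer(("127.0.0.1", 0), FakeBlog)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return check_redirect("127.0.0.1", server.server_port, OLD_PATH, NEW_URL)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # (True, 301, 'http://my365.ladewig.com/post/53069879424')
```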
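One thing to note: because of how WordPress handles permalinks, the RewriteRules had to be pasted into the .htaccess file before the block added by WordPress. The last rule in that block sends every request for a file or directory that doesn’t exist (which includes all of the old post URLs) to /index.php and stops there with [L], so rules placed after the block never see the original address. Look for this section: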
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
and make sure your lines are added before it. It took me quite a bit of testing to figure that out.
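If you’d rather not paste by hand, a small Python helper can splice the rules in just ahead of the “# BEGIN WordPress” marker, falling back to the top of the file if the marker isn’t there. This is a hypothetical helper, not something WordPress provides; it works on text, so you would read .htaccess, call it, and write the result back (after keeping a backup copy).

```python
MARKER = "# BEGIN WordPress"


def add_rules_before_wordpress(htaccess_text, rules):
    """Return the .htaccess text with the rules placed ahead of the WordPress block."""
    block = "\n".join(rules) + "\n\n"
    start = htaccess_text.find(MARKER)
    if start == -1:
        # No WordPress block found: putting the rules first is still the safe choice.
        return block + htaccess_text
    return htaccess_text[:start] + block + htaccess_text[start:]


WORDPRESS_BLOCK = """# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
"""

if __name__ == "__main__":
    rule = ("RewriteRule ^archives/2013/01/01/39252-2013_1365_pickled_tomatoes\\.html$ "
            "http://my365.ladewig.com/post/53069879424 [R=301,NC,L]")
    print(add_rules_before_wordpress(WORDPRESS_BLOCK, [rule]))
```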
I’ve now deleted the Project 365 posts for 2013. So there’s a bit less clutter in the blog, the old URLs still work, and people who want to see the photos have a shiny new place to go see them. Now I need to do the same for 2011 and 2012. I guess I should be thankful I didn’t make it through an entire year on either of those attempts.