Feb 03, 2016
Last week we looked at converting WordPress HTML into normal HTML and Markdown (Rendering Markdown and HTML in Ruby).
This is a great first step, but we also need a way of taking the generated WordPress export and migrating the data to the new application.
This needs to be an automated process so I can keep running it periodically during development, and so that when I put the new version live I already have a process I can simply run.
There are a couple of existing solutions for converting WordPress data to a new format, but because I’m not migrating to an off-the-shelf CMS, none of them would work for me.
Fortunately it’s not too difficult to do the job ourselves.
In today’s tutorial I will be walking through how I wrote my migration task.
Whenever I want to migrate the data, I'm going to want to run a command in Terminal to kick things off. Rails uses Rake (Understanding and Using Ruby Rake) for command-line tasks, so we can write our own Rake task for running the migration.
First we need to generate a new Rake task:
rails g task wordpress import
This will generate a new import task that is namespaced under wordpress.
If you look in the lib/tasks directory you should find a new file called wordpress.rake:
namespace :wordpress do
  desc 'Import WordPress data'
  task import: :environment do
  end
end
Next we need to take the XML export that WordPress generates and read it into a structure we can work with.
In order to do this, I will be using the Nokogiri gem.
Add the following line to your Gemfile:
gem 'nokogiri'
And run the following command in Terminal:
bundle install
Next I’m going to create a new directory under lib called word_press and a new file called data.rb:
module WordPress
  class Data
  end
end
In order to pass the XML into Nokogiri, we first need to read the file.
I’ll handle this in the initialize method:
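attr_reader :doc

def initialize
  path = File.expand_path('wordpress.xml')
  # Strip stray U+0004 control characters before parsing, and assign to
  # @doc so that the attr_reader above returns the parsed document
  @doc = Nokogiri.XML(File.read(path).gsub("\u0004", ''))
end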
In this example I’m hard coding the path to the XML export. You could pass this as an option from the Rake command, but because this is specific to my application, and it’s never going to change, I don’t mind hard coding it.
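If you did want the path to be configurable, a minimal sketch might look like the following. The ExportFile class and DEFAULT_PATH constant are hypothetical names that are not part of the code above. The sketch only covers reading the file and removing the U+0004 control character (which XML 1.0 does not permit, presumably the reason for the gsub); the Nokogiri parsing would stay as it is.

```ruby
require 'tempfile'

module WordPress
  # Hypothetical helper, not part of the original Data class
  class ExportFile
    DEFAULT_PATH = 'wordpress.xml'

    # Falls back to the hard-coded path when no argument is given
    def initialize(path = DEFAULT_PATH)
      @path = File.expand_path(path)
    end

    # Returns the export as a string with U+0004 characters removed
    def read
      File.read(@path, encoding: 'UTF-8').delete("\u0004")
    end
  end
end

# Usage: write a small file containing the stray character, then read it back
Tempfile.create(['export', '.xml']) do |file|
  file.write("<title>Hello\u0004 world</title>")
  file.flush
  puts WordPress::ExportFile.new(file.path).read # prints "<title>Hello world</title>"
end
```

Keeping the default argument means existing callers that rely on the hard-coded path would not need to change.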
Finally I’m going to provide a single method for getting the posts from the export:
def posts
  doc
    .xpath("//item[wp:post_type = 'post']")
    .collect { |post| WordPress::Post.new(post) }
end
Nokogiri provides an xpath interface for traversing the XML structure. I’m only interested in the posts so that’s the only bit I need.
I collect over the array of results from the xpath query and create an array of new Post objects that will be returned from this method.
For my application, I’m using the posts as an entry point for getting all of the data from the export.
The next step is to create Data Objects for each of the types of data you want to migrate.
module WordPress
  class Post
    def initialize(doc)
      @doc = doc
    end
  end
end
By wrapping the Nokogiri element in a Ruby class, I can make any customisations and conversions as the object is read.
For example, if you just want to pass the data on, you can simply provide a method and return the value:
def title
  @doc.xpath('title').text
end

def slug
  @doc.xpath('wp:post_name').text
end
But if you want to convert to a different format, you can encapsulate that in the method.
For example, in last week’s tutorial I was converting the WordPress HTML into Markdown and regular HTML.
I can deal with this conversion process inside of this class:
def content
  content = @doc.xpath('content:encoded').text
  content = format_syntax_highlighter(content)
  # Collapse runs of blank lines into a single blank line
  content.gsub(/\n{2,}/, "\n\n")
end

def html
  Render::HTML.new.render(markdown)
end

def markdown
  # Memoize so the Markdown conversion only runs once per post
  @markdown ||= Render::Markdown.new.render(content)
end

def format_syntax_highlighter(text)
  # Turn [lang]...[/lang] shortcodes into fenced Markdown code blocks
  text.gsub(%r{\[(\w+)\](.+?)\[/\1\]}m) { "\n```#{$1}#{$2}```\n" }
end
To the outside world, this conversion process is completely hidden.
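To make the shortcode conversion concrete, here is a standalone sketch of the same two substitutions applied to a made-up snippet of post content. The collapse_blank_lines helper and the sample input are assumptions for illustration. The fence string is built at runtime purely so that this example contains no literal fence.

```ruby
# Convert [lang]...[/lang] shortcodes into fenced Markdown code blocks
def format_syntax_highlighter(text)
  fence = '`' * 3
  text.gsub(%r{\[(\w+)\](.+?)\[/\1\]}m) do
    "\n#{fence}#{Regexp.last_match(1)}#{Regexp.last_match(2)}#{fence}\n"
  end
end

# Hypothetical helper: collapse runs of newlines into a single blank line
def collapse_blank_lines(text)
  text.gsub(/\n{2,}/, "\n\n")
end

# Made-up post content with a [php] shortcode and some extra blank lines
input = "Intro\n\n\n[php]\necho 'hi';\n[/php]\nOutro"

puts collapse_blank_lines(format_syntax_highlighter(input))
```

Note that the pattern matches any paired shortcode, not only language names, so an export that uses other shortcodes may need a stricter expression.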
You can also create more classes to encapsulate related entities. For example, each post will have related comments so I can repeat the process of collecting these related items:
def comments
  @doc.xpath('wp:comment').collect { |comment| Comment.new(comment) }
end
Now I can deal with the comment-specific formatting in its own object.
Finally, back in the wordpress.rake task we can deal with the actual importing process.
This will basically mean taking each object from the WordPress data export and creating new Active Record objects and relations.
namespace :wordpress do
  desc 'Import WordPress data'
  task import: :environment do
    # Get the WordPress data
    data = WordPress::Data.new

    # Import the posts (the block variable is named post so it does not
    # shadow the outer data variable)
    data.posts.each do |post|
      article = Article.new
      article.title = post.title
      # etc
      article.save!
    end
  end
end
The structure I’ve decided on for my CMS is more complicated than a regular blog, so this provides a nice opportunity to create the object graph for each article. That is something I definitely could not have done if I had used a general-purpose solution.
In today’s tutorial we’ve covered a couple of interesting areas of Ruby development including creating Rake tasks as well as the very useful Nokogiri gem.
By encapsulating each chunk of data from the WordPress export as a class we can deal with whatever conversion details we require.
Although there are many existing solutions for migrating data from a WordPress blog, none of them came close to satisfying my requirements.
Hopefully if you are looking to do the same, you can use these last two posts as a foundation for building what you need.