
Rendering Markdown and HTML in Ruby

Jan 27, 2016

Table of contents:

  1. The problem that I faced
  2. Setting up the structure
  3. Converting WordPress HTML to Markdown
  4. Converting Markdown to HTML
  5. Conclusion

As you may have already guessed, I’m currently in the process of converting Culttt from a WordPress blog into a Ruby on Rails web application.

I love that I’ve got this far with WordPress, but sometimes you just need to scratch your own itch. Fortunately this itch is easy to scratch for me because I’m a developer and I enjoy doing this stuff anyway.

One of the decisions I’ve made is that I want to write all my articles in Markdown, and then let the application generate the HTML for me.

But the problem I face is that I’ve already got nearly 700 posts in WordPress flavoured HTML.

I need to convert my existing WordPress HTML into both normal HTML and Markdown, and I definitely do not want to do that by hand!

Fortunately as we are programmers, we can let the computer do the hard work for us. I didn’t get into this game to give myself boring and repetitive work.

In today’s tutorial I will walk you through how I implemented this conversion process.

The problem that I faced

So before I get into the actual implementation, first I will describe the problem that I faced.

I currently have nearly 700 articles that I need to migrate to the next version of Culttt. These existing posts are stored in a database as WordPress flavoured HTML.

When I say “WordPress HTML”, I mean that the HTML doesn’t have p tags around paragraphs and that I’m using WordPress tags for code blocks.
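To make that a bit more concrete, here is a minimal sketch of the kind of input involved and of the paragraph wrapping that is missing from it. The sample post, the `[ruby]` shortcode and the `wrap_paragraphs` helper are all assumptions for illustration. They are not taken from the Culttt database or from WordPress itself.

```ruby
# A hypothetical post body: paragraphs are separated by blank lines rather
# than wrapped in <p> tags, and code sits inside a shortcode (the [ruby] tag
# is an assumed example).
WORDPRESS_HTML = <<~HTML
  <h2>Getting started</h2>

  First paragraph with <strong>bold</strong> text.

  [ruby]puts 'hello'[/ruby]
HTML

# Chunks that already start with a block-level tag or a shortcode are left alone.
BLOCK_START = /\A(<h\d|<pre|<ul|<ol|<blockquote|\[)/

# Simplified paragraph wrapping: split on blank lines and wrap plain chunks.
def wrap_paragraphs(text)
  chunks = text.strip.split(/\n{2,}/)
  wrapped = chunks.map do |chunk|
    chunk.match?(BLOCK_START) ? chunk : "<p>#{chunk}</p>"
  end
  wrapped.join("\n\n")
end

puts wrap_paragraphs(WORDPRESS_HTML)
```

Only the middle chunk gets wrapped here. The heading and the shortcode pass through untouched, which is roughly the "flavouring" that has to be dealt with during the migration.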

So I need to convert all of my existing articles into regular HTML to remove the WordPress flavouring.

Going forward I also want to start writing my articles in Markdown. This means I will also need to generate a Markdown version of each of the existing articles in case I need to go back and make a change.

Setting up the structure

So I’m going to need a way to convert my WordPress HTML into Markdown, and a more general purpose way of converting Markdown to HTML.

When I need a service such as converting one format to another, I usually stick it in the lib directory.

My general rule is, if it is related to the domain of the application, it should go in app, but if it is a general purpose tool, it should go in lib.

By default the lib directory won’t be autoloaded, so we can add that path in the application.rb file under the config directory:

config.autoload_paths << Rails.root.join('lib')

Next, under the lib directory I’m going to create a render directory to group this code together under a namespace.
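As a rough illustration of how that namespace lines up with files on disk: at the time of writing, Rails works out which file to load from the constant name, so `Render::Markdown` would be expected in `lib/render/markdown.rb`. The `constant_to_path` helper below is a simplified stand-in for that convention, not the Rails implementation, and the file names are my assumption about the layout.

```ruby
# Simplified stand-in for the naming convention: turn a constant name into
# the relative path where its file would be expected.
def constant_to_path(constant_name)
  constant_name
    .gsub('::', '/')
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')
    .downcase + '.rb'
end

%w[Render::Markdown Render::HTML Render::HTMLWithPygments].each do |name|
  puts "#{name} -> lib/#{constant_to_path(name)}"
end
```

Newer versions of Rails resolve names in the other direction, from file path to constant, so an acronym such as HTML may need an inflection rule there.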

Converting WordPress HTML to Markdown

Over the years I’ve been pretty consistent with my WordPress HTML authoring and so for me this job isn’t too difficult.

In order to have a consistent output, I’m going to first convert my WordPress HTML into Markdown, and then generate the HTML for each article from that Markdown, rather than converting to regular HTML first and generating the Markdown afterwards.

I found it was actually easier to go from WordPress HTML to Markdown, and then to normal HTML.

Instead of reinventing the wheel, I’m going to be using the html2markdown gem.

Add the following line to your Gemfile:

gem 'html2markdown'

And run the following command in Terminal:

bundle install

Next we can create the class for generating the Markdown. I’m going to be wrapping the html2markdown gem in some customisations, so it makes sense to encapsulate this in a class:

module Render
  class Markdown
    def render(content)
      page = HTMLPage.new(contents: content)
      page.h1 { |node, contents| "# #{contents}" }
      page.h2 { |node, contents| "## #{contents}\n" }
      page.h3 { |node, contents| "### #{contents}\n" }
      page.code { |node, contents| "`#{contents}`" }
      page.markdown
    end
  end
end

First I create a new instance of HTMLPage and pass it the contents I want to convert.

Next I’m defining the customisations I want. Depending on the HTML you are converting or the Markdown format you want, your customisations may differ.

Finally I will return the converted Markdown.
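To show the idea behind those per-tag blocks without depending on the gem, here is a toy stand-in. It is not how html2markdown works internally. `TAG_RULES` and `toy_markdown` are hypothetical names, and the regex only copes with simple, un-nested tags that have no attributes.

```ruby
# Each rule receives the inner text of a tag and returns the Markdown for it,
# mirroring the shape of the customisation blocks in Render::Markdown.
TAG_RULES = {
  'h1'   => ->(contents) { "# #{contents}" },
  'h2'   => ->(contents) { "## #{contents}\n" },
  'h3'   => ->(contents) { "### #{contents}\n" },
  'code' => ->(contents) { "`#{contents}`" }
}.freeze

# Apply every rule in turn, replacing <tag>text</tag> with the rule's output.
def toy_markdown(html)
  TAG_RULES.reduce(html) do |output, (tag, rule)|
    output.gsub(%r{<#{tag}>(.*?)</#{tag}>}m) { rule.call(Regexp.last_match(1)) }
  end
end

puts toy_markdown('<h2>Setup</h2>Run <code>bundle install</code> first.')
```

A real converter needs a proper HTML parser to handle nesting and attributes, which is exactly why I'm leaning on a gem for the actual work.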

There isn’t a whole lot of value in writing tests for a class like this to be honest. The responsibility for actually converting is not in our hands and so writing tests to make sure it works is going to be a waste of time.

If you want to test the wrapping class is working correctly, you could write something like this:

require 'test_helper'

class MarkdownTest < ActiveSupport::TestCase
  def setup
    @markdown = Render::Markdown.new
  end

  test 'should render markdown' do
    assert_equal(
      '# Hello World',
      @markdown.render('<h1>Hello World</h1>').chomp
    )
  end
end

But like I say, there isn’t a great deal of value here. Either the output looks right or it doesn’t.

Converting Markdown to HTML

Now that I’ve got all of my posts in consistent Markdown, I can convert them into the final HTML format that will be rendered when an article loads.

Once again, instead of writing my own Markdown parser, I’m just going to use an off-the-shelf solution. I will be using the redcarpet gem.

Add the following to your Gemfile:

gem 'redcarpet'

And run the following command:

bundle install

I’m going to be converting Markdown into HTML whenever I write an article, or whenever someone leaves a comment. To ensure the HTML that is generated is consistent, I can encapsulate this process as a class:

module Render
  class HTML
  end
end

I use code blocks quite a lot on Culttt and so I want a nice way of styling these chunks of code with syntax highlighting.

I’m going to use Pygments and so I will need the pygments.rb gem, which provides the Pygments module used below.

Add the following line to your Gemfile:

gem 'pygments.rb'

And run the following command in Terminal:

bundle install

Next I can create my own HTML Renderer by extending the Redcarpet HTML renderer and defining the block_code method:

module Render
  class HTMLWithPygments < Redcarpet::Render::HTML
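    def block_code(code, language)
      if language
        # Pygments returns highlighted, already-escaped HTML.
        Pygments.highlight(code, lexer: language)
      else
        # No language given: escape the raw code so characters such as < and &
        # are displayed rather than interpreted as HTML.
        "<pre>#{ERB::Util.html_escape(code)}</pre>"
      end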
    end
  end
end
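One thing the renderer above doesn't cover is a fenced block tagged with a language the highlighter doesn't recognise. A possible way to handle that is to fall back to a plain escaped block. The sketch below uses an injected `highlighter` lambda as a stand-in for Pygments so that it runs on its own. `highlight_or_plain` and `plain_block` are hypothetical helpers, and how Pygments actually reports an unknown lexer is something to check rather than take from this example.

```ruby
require 'erb'

# Escaped fallback used whenever highlighting isn't possible.
def plain_block(code)
  "<pre>#{ERB::Util.html_escape(code)}</pre>"
end

# Try the highlighter first; if there is no language or the highlighter
# raises, fall back to the escaped <pre> block so the code still displays.
def highlight_or_plain(code, language, highlighter:)
  return plain_block(code) if language.nil? || language.empty?

  highlighter.call(code, language)
rescue StandardError
  plain_block(code)
end

# Stand-in highlighter that only knows about Ruby (an assumption for the demo).
fake_highlighter = lambda do |code, language|
  raise ArgumentError, "unknown lexer: #{language}" unless language == 'ruby'

  %(<div class="highlight"><pre>#{ERB::Util.html_escape(code)}</pre></div>)
end

puts highlight_or_plain('1 < 2', 'ruby', highlighter: fake_highlighter)
puts highlight_or_plain('1 < 2', 'nope', highlighter: fake_highlighter)
puts highlight_or_plain('1 < 2', nil, highlighter: fake_highlighter)
```

Rescuing every StandardError is deliberately blunt for a sketch. In a real renderer I'd want to rescue only the specific error the highlighter raises.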

Next I can finish off my HTML class:

module Render
  class HTML
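    # Parser extensions passed to Redcarpet::Markdown.new.
    MARKDOWN_OPTIONS = {
      no_intra_emphasis: true,
      tables: true,
      fenced_code_blocks: true,
      autolink: true,
      strikethrough: true,
      space_after_headers: true,
      superscript: true,
      # NOTE: with_toc_data is documented by Redcarpet as a renderer option,
      # so it may have no effect when passed here with the extensions.
      with_toc_data: true,
      underline: true,
      highlight: true
    }.freeze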

    def initialize
      @renderer =
        Redcarpet::Markdown.new(Render::HTMLWithPygments, MARKDOWN_OPTIONS)
    end

    def render(content)
      @renderer.render(content)
    end
  end
end

First I define some options for how I want the HTML to be rendered.

In the initialize method I create a new instance of Redcarpet::Markdown and pass it my HTMLWithPygments class and the MARKDOWN_OPTIONS hash.

Finally I can define the render method which simply delegates to the Redcarpet renderer.

As I mentioned earlier, you could write a test to make sure your wrapper is working correctly:

require 'test_helper'

class HTMLTest < ActiveSupport::TestCase
  def setup
    @markdown = Render::HTML.new
  end

  test 'should render html' do
    assert_equal(
      '<h1>Hello World</h1>',
      @markdown.render('# Hello World').chomp
    )
  end
end

But there isn’t much point in going nuts and testing that the conversion process is working correctly as that is not our responsibility.

Conclusion

I’m sure every programmer in their career will be tasked with converting one format of something into another format.

HTML is a particularly awkward format to convert because you can get away with murder when writing HTML, and so trying to convert inconsistent HTML can be a nightmare.

Fortunately in my case, this process wasn’t too bad because I’ve been pretty strict with how I write my WordPress posts.

One of the beautiful things about what we do is that we can avoid the long and tedious job of manually converting hundreds of articles by writing a simple script to do the job for us.
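For what it's worth, that script can be very small. Here is a minimal sketch of the two-step migration, assuming each post is a record with `wordpress_html`, `markdown` and `html` attributes. The `Post` struct, the `migrate` method and `FakeRenderer` are hypothetical names for illustration. In the real application the two renderers would be instances of Render::Markdown and Render::HTML.

```ruby
# Hypothetical record standing in for a row in the posts table.
Post = Struct.new(:title, :wordpress_html, :markdown, :html, keyword_init: true)

# Convert every post in two steps: WordPress HTML -> Markdown -> HTML.
# Both converters only need to respond to #render.
def migrate(posts, markdown_renderer:, html_renderer:)
  posts.each do |post|
    post.markdown = markdown_renderer.render(post.wordpress_html)
    post.html = html_renderer.render(post.markdown)
  end
end

# Stand-in converter so the sketch runs without the gems installed. It just
# wraps the content in a label to show the order of the two steps.
FakeRenderer = Struct.new(:label) do
  def render(content)
    "#{label}(#{content})"
  end
end

posts = [Post.new(title: 'Hello', wordpress_html: '<h1>Hello</h1>')]
migrate(
  posts,
  markdown_renderer: FakeRenderer.new('md'),
  html_renderer: FakeRenderer.new('html')
)
puts posts.first.markdown
puts posts.first.html
```

Passing the renderers in as arguments keeps the loop independent of the gems, so the conversion order can be checked without touching the database or Pygments.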

Philip Brown

@philipbrown

© Yellow Flag Ltd 2024.