Feb 25, 2013
One of the most important aspects of building scalable websites is how you manage the caching of data. The problem with learning about caching is that it only really comes up when you are fighting to keep your site online, and by that point it’s too late. Implementing a cache should be one of the fundamental aspects of building a website because it will become so important in the future. You should think of the cache as being just as important to your website as the database itself.
In this post I will introduce you to caching and how to use Memcached.
Firstly, it’s important to understand exactly what a cache is. Dynamic websites are usually powered by some kind of persistent data storage. So for example, WordPress websites store all of their data in a MySQL database. When someone requests a blog post, the server requests that blog post from the database and sends it to the user’s browser.
This is all well and good when the website is small because the server is able to handle these requests quite quickly. This breaks down however, when the site starts to grow.
Each time a page is requested, multiple queries are required to generate the page. So for example, you might require separate queries for the blog post, author information, comments and the sidebar. Once you have a good amount of traffic, processing all of these requests at the same time starts to become a grind. Making a database request can be quite costly in the sense that it takes time to hit the database and get the data returned.
A cache is simply a middle ground between the request and the database. So the request will first check to see if the cache has the data that it requires. If the cache does have the data, it can be quickly returned. If the cache does not have the data, a normal database query can be requested. Hitting the cache is much quicker and less costly than hitting a database.
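The flow described above can be sketched in a few lines. This is only an illustration: plain PHP arrays stand in for both the cache and the database, and the names get_post and $databaseHits are made up for the example. It also stores the result after a miss, which is an extra step so that the next request can skip the database.

```php
<?php
// Plain arrays stand in for the database and the cache (illustrative data only).
$database = [1567 => "Content of post 1567"];
$cache = [];
$databaseHits = 0; // counts how often we fall through to the database

function get_post(int $id, array &$cache, array $database, int &$databaseHits): ?string
{
    // 1. Check the cache first.
    if (array_key_exists($id, $cache)) {
        return $cache[$id];
    }

    // 2. Cache miss: query the database instead.
    $databaseHits++;
    $post = $database[$id] ?? null;

    // 3. Remember the result so the next request skips the database.
    if ($post !== null) {
        $cache[$id] = $post;
    }

    return $post;
}

$first = get_post(1567, $cache, $database, $databaseHits);  // miss: reads the database
$second = get_post(1567, $cache, $database, $databaseHits); // hit: served from the cache
```

After both calls the database has only been read once, even though the post was requested twice.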
So in the example above, the majority of the content that we are requesting from the database can be held in cache. The blog post will probably never change once it has been published, so after the first request there is rarely any need to hit the database. If the post were to be updated, the new content can simply replace the old copy in the cache.
As you can see, cache is a very simple concept, but it will be critical to building a large scale website. Nearly every popular website on the Internet today would not be possible without cache, and so it’s imperative that you learn about it if you want to build something that big too.
Fortunately, there are many out-of-the-box solutions for handling cache and they are really easy to set up. You can also set up a caching system from day one as it’s not something that only applies to big websites. So when the day comes that you have a huge spike in traffic, you will already be sorted.
Memcached is one of the most popular and well-known caching systems. It is used by Facebook, YouTube and Reddit amongst many other high-profile websites.
Memcached is relatively simple to set up and get going with. This isn’t a tutorial for setting up Memcached because it will be different depending on your setup and what programming language you are intending to use.
Memcached works by storing your data as a key-value associative array. So essentially, every piece of data that you want to store will have a unique key. When you want to retrieve that data, you can simply request it by that key.
For example, say the blog post we want to cache has a post_id of 1567. We could store the content of this post in Memcached using the key “post_id_1567”.
Another common use of cache is when you have a dynamic home page that is requested a lot. Say for example you have an online shop that displays the top 10 most popular products. With Memcached you can cache the top 10 list so each new page request isn’t hitting the database. You can also set this particular cache to expire at a set interval. So for example, you might want to let it expire once an hour or once a day. When the cache expires, the next request will update the data with the latest results.
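Here is a rough sketch of that expiry behaviour. To keep it runnable without a Memcached server, it uses a made-up in-memory class (ExpiringCache) and passes the current time in by hand; the query_top_products function is also just a placeholder for the real query. With the PHP Memcache extension you would not write this class yourself, because the expiry time in seconds is passed as the fourth argument to set().

```php
<?php
// Hypothetical in-memory cache with expiry, for illustration only.
final class ExpiringCache
{
    private array $items = [];

    public function set(string $key, $value, int $ttlSeconds, int $now): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];
    }

    // Returns false when the key is missing or has expired.
    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            unset($this->items[$key]);
            return false;
        }
        return $this->items[$key]['value'];
    }
}

// Placeholder for the expensive "top 10 products" database query.
function query_top_products(int &$queryCount): array
{
    $queryCount++;
    return ['Product A', 'Product B', 'Product C'];
}

function top_products(ExpiringCache $cache, int $now, int &$queryCount): array
{
    $cached = $cache->get('top_products', $now);
    if ($cached !== false) {
        return $cached; // still fresh, no query needed
    }

    $fresh = query_top_products($queryCount);
    $cache->set('top_products', $fresh, 3600, $now); // expire after one hour
    return $fresh;
}

$cache = new ExpiringCache();
$queryCount = 0;
$start = 1000000; // arbitrary starting timestamp

top_products($cache, $start, $queryCount);        // miss: runs the query
top_products($cache, $start + 1800, $queryCount); // 30 minutes later: hit
top_products($cache, $start + 3600, $queryCount); // one hour later: expired, runs the query again
```

Three requests only cost two queries here, and on a busy home page the saving would be far larger.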
I won’t be providing a complete solution for using Memcached with PHP as this post is aimed at encouraging more developers to start implementing caching, rather than to give a general purpose solution. The following code should be taken as closer to pseudo code than what you would actually want to use.
If you are looking for a PHP Memcached solution, take a look at Laravel’s cache system that also has options for other caching drivers or take a look at Packagist.
To use Memcached, we need to make a connection to it just like you would to a MySQL database. Memcached uses a client-server architecture. The client requests data and the server sends it back.
So the first thing we need to do is to make a connection to the server. First we instantiate a new instance of Memcache, then we call connect and store its result in a variable called $cache.
The two arguments to the connect method are the Memcached host and the Memcached port. Memcached listens on port 11211 by default, and we are connecting to localhost.
Finally, I’m saving the result of the connection attempt into a variable so we can check later, when it comes to saving or retrieving data, that a connection was actually made.
// Instantiate new instance
$memcache = new Memcache();
// Make a connection; connect() returns true on success and false on failure
$cache = $memcache->connect("127.0.0.1", 11211);
Storing data in Memcached is easy: all you have to do is construct an array and send it to the server. If your application is well structured, this should take very little work.
The process is straightforward. First we store the data in the database, then we store it in the cache. This means we have two copies of the data: the one in the cache that we will look for first, and the one in the database.
So first we insert the data into the database:
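// Query (pseudo code: in real code, bind these values with a prepared statement instead of interpolating them)
$query = "INSERT INTO posts (title, post, date) VALUES ('$title', '$post', '$date')";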
// Run the statement and save into a variable
$success = insert_data($query);
// Return the last inserted id
$id = last_id();
I’ve written the above as pseudo code so you get the point without having to go into the details of how you insert data into a database. If you are unsure how to do this stage, I’ve already written two posts on using PDO.
Next we need to insert the data into Memcached.
First we test to make sure the query ran successfully. If something went wrong there’s no point in storing the data in Memcached too.
if ($success) {
}
Next we need to create a unique key under which to store the data in Memcached. We can just use the ID returned from the database, because it is already associated with that post and makes the data easy to find in the future.
// Set the key
$key = "post_id_" . $id;
Next we need to create the associative array to store the data:
// Create associative array
$post = ["title" => $title, "post" => $post, "date" => $date];
And finally, we can set the data to Memcached:
// Save data to Memcached
$memcache->set($key, $post);
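Putting the steps above together, the write path might look something like the sketch below. It is not a finished implementation. The save_post function and the $insert callable are invented for the example, and an anonymous class with the same set()/get() shape as Memcache stands in for the real server so the code runs anywhere.

```php
<?php
// Hypothetical in-memory stand-in with the same set()/get() shape as Memcache,
// so the sketch runs without a Memcached server.
$memcache = new class {
    public array $store = [];

    public function set(string $key, $value): bool
    {
        $this->store[$key] = $value;
        return true;
    }

    public function get(string $key)
    {
        return $this->store[$key] ?? false;
    }
};

// $insert is a hypothetical callable that runs the INSERT and returns
// the new row ID, or false if the query failed.
function save_post($memcache, callable $insert, string $title, string $body, string $date)
{
    $id = $insert($title, $body, $date);
    if ($id === false) {
        return false; // the insert failed, so there is nothing to cache
    }

    // Same key scheme as above: "post_id_" followed by the database ID.
    $key = "post_id_" . $id;
    $memcache->set($key, ["title" => $title, "post" => $body, "date" => $date]);

    return $id;
}

// A successful insert is cached under "post_id_1567".
$workingInsert = function (string $title, string $body, string $date) {
    return 1567;
};
$id = save_post($memcache, $workingInsert, "Hello", "First post", "2013-02-25");

// A failed insert leaves the cache untouched.
$failingInsert = function (string $title, string $body, string $date) {
    return false;
};
$failed = save_post($memcache, $failingInsert, "Oops", "Never stored", "2013-02-25");
```

Passing the insert in as a callable keeps the database details out of the way, which is the same reason the snippets above use pseudo code for that part.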
Requesting data from Memcached is just as simple as inserting it. First we see if we can find the required data in the cache. If it can’t be found then we simply fall back to the database.
So first we see if the cache is available:
if ($cache) {
}
Next we attempt to get the post data from the cache:
$key = "post_id_" . $id;
$post = $memcache->get($key);
We check to see if the post was not found in the cache. If the data was not found in the cache, we can simply try the database as a fallback.
// get() returns false when the key is not found
if ($post === false) {
    // Search for data in the database
    // Use PDO stuff here
}
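The read path can be pulled together in the same way. Again, treat this as a sketch under assumptions: load_post and $loadFromDatabase are names made up for the example, and an anonymous class stands in for the Memcache object. It also writes the post back to the cache after a miss, which the steps above leave out, so that the next request finds it.

```php
<?php
// Hypothetical in-memory stand-in with the same set()/get() shape as Memcache.
$memcache = new class {
    public array $store = [];

    public function set(string $key, $value): bool
    {
        $this->store[$key] = $value;
        return true;
    }

    public function get(string $key)
    {
        return $this->store[$key] ?? false;
    }
};

// $cacheAvailable plays the role of $cache (the result of connect()).
// $loadFromDatabase is a hypothetical callable returning the post array,
// or null if no such post exists.
function load_post($memcache, bool $cacheAvailable, int $id, callable $loadFromDatabase)
{
    $key = "post_id_" . $id;

    if ($cacheAvailable) {
        $post = $memcache->get($key);
        if ($post !== false) {
            return $post; // cache hit
        }
    }

    // Cache miss, or no cache connection: fall back to the database.
    $post = $loadFromDatabase($id);

    // Store the result so the next request can be served from the cache.
    if ($post !== null && $cacheAvailable) {
        $memcache->set($key, $post);
    }

    return $post;
}

$databaseHits = 0;
$loadFromDatabase = function (int $id) use (&$databaseHits) {
    $databaseHits++;
    return ["title" => "Hello", "post" => "First post", "date" => "2013-02-25"];
};

$first = load_post($memcache, true, 1567, $loadFromDatabase);  // miss: reads the database
$second = load_post($memcache, true, 1567, $loadFromDatabase); // hit: served from the cache
```

If the connection to Memcached failed, passing false for $cacheAvailable skips the cache entirely and the site keeps working from the database alone.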
And there you have it, getting started with Memcached really is that simple! If you have structured your application nicely, that should have taken you no longer than 10 minutes.
Using Memcached is a no-brainer. You might not get any noticeable results for a long time, but it will make your life a lot easier if you start to get traction.
Finally, if you are looking for a PHP framework that makes caching simple, take a look at Laravel.