Jeroen Moors


Ceph: Removing the last single point of failure
Written on 15 June 2013

A classic multi web server setup

A classic high-availability setup usually consists of multiple web servers, database servers and maybe a redundant load balancer.

The user-generated content (UGC), such as images and video, can be uploaded to any web server and must be available on all web servers. The easiest way to achieve this is to add a NAS to the setup and store all UGC on the NAS.

While all big vendors will tell you their NAS solutions are fully redundant, your NAS will go down at some point in time. For sure. One day a power supply needs to be replaced. Of course you’ve got a support contract covering this kind of intervention. An engineer comes on site, unplugs one of your five power supplies, takes it out and puts the new one in. Unfortunately, he manages to cause a short circuit while sliding the replacement into place. Bang! Game over!

The previous story might sound a bit odd, but it’s a true story. I’ve got plenty of these stories…

Storing your data in a cluster

Instead of storing your precious data on a single device, store it in a cluster. While this might sound more expensive than a NAS, it can actually be cheaper if you’ve got a solution that runs on commodity hardware.

Ceph’s object store

Ceph is a free software unified storage platform. It provides multiple interfaces, one of which is an object store called RADOS (Reliable Autonomic Distributed Object Store).

Thinking of the files on your NAS as objects might be a switch in mindset, but it’s an easy one to make.
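
One way to picture the switch: a file on a NAS lives at a path inside a directory tree, while an object is simply a blob of data stored under a single flat key. As a minimal sketch, the hypothetical helper below (object_key_for is not part of Ceph or phprados) derives such a key from the uploaded data itself, assuming you are happy for identical uploads to share one key.

```php
<?php
// Hypothetical helper, for illustration only: build a flat object key
// from the content of an upload instead of from a directory path.
function object_key_for(string $data, string $extension): string
{
    // The SHA-1 of the content gives a fixed-length key without any "/" in it.
    return sha1($data) . '.' . strtolower($extension);
}

// Two uploads with the same bytes end up under the same key.
echo object_key_for("abc", "JPG"), "\n";
```

Any scheme that yields a unique string per object would do just as well, for example an ID generated by your database.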

Building a first cluster

Any RADOS cluster is built from two types of building blocks: storage daemons and monitor daemons. Their roles in the cluster are what you would expect from their names.

For a test cluster, you need at least two storage daemons and one monitor daemon. For testing purposes all components can run on a single machine. Use the quick start guide to get a cluster up and running in, yes, 5 minutes.

For a production setup you will of course use multiple servers, spreading the different components across them. As the monitor daemon provides a critical function in the cluster, you want more than one. Because the monitors use majority voting, run an odd number of them: an even count tolerates no more failures than the odd count just below it. Three monitors is a reasonable number for most production clusters.
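
The effect of majority voting on the monitor count can be made concrete with a little arithmetic. The sketch below is not part of Ceph or phprados. The function names quorum_size and tolerated_failures are hypothetical, and the only assumption is that a strict majority of the monitors must be reachable.

```php
<?php
// Smallest number of monitors that forms a strict majority of $monitors.
function quorum_size(int $monitors): int
{
    return intdiv($monitors, 2) + 1;
}

// Number of monitors that may fail while a majority is still reachable.
function tolerated_failures(int $monitors): int
{
    return $monitors - quorum_size($monitors);
}

// Print the figures for one to five monitors.
foreach ([1, 2, 3, 4, 5] as $n) {
    printf(
        "%d monitor(s): quorum %d, tolerates %d failure(s)\n",
        $n,
        quorum_size($n),
        tolerated_failures($n)
    );
}
```

Going from three to four monitors adds hardware without letting the cluster survive an extra failure, whereas five monitors survive two.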

Interfacing with a cluster

If you’re used to storing your data in Amazon S3, you can use RADOSGW. It provides an S3-compatible REST API to your cluster.

It’s also possible to connect directly from PHP to your cluster using phprados.

Make PHP and RADOS chat

The current master branch of phprados was created by Wido Den Hollander and provides an almost one-to-one mapping to librados, the C interface to RADOS.

For the past few months I’ve been adding an object-oriented interface that provides a very easy-to-use API. It has not yet been merged into the master branch, but it is already available. Instructions on how to build the module are available in the INSTALL file.

PHP Examples

Read data from a local jpeg file and store it in the cluster.

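    <?php
    // Create the client object (the class name Rados is an assumption)
    $rados = new Rados();
    // Connect using your own monitor host, key and pool name
    $rados->connect($mon_host, $key, $pool);
    // Read a local file
    $data = file_get_contents("/tmp/some_image.jpg");
    // Write the data to the cluster, using 'my-image-id' as its unique key
    $rados->write("my-image-id", $data);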

Retrieve the image stored in the cluster and return it to the client.

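    <?php
    // Create the client object (the class name Rados is an assumption)
    $rados = new Rados();
    // Connect using your own monitor host, key and pool name
    $rados->connect($mon_host, $key, $pool);

    // Read the data from the cluster
    $data = $rados->read("my-image-id");

    // Set the right header for image output
    header("Content-Type: image/jpeg");
    // Print the data to the client
    print $data;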