The outage on Amazon’s Elastic Compute Cloud (EC2) in April took down multiple public websites that use the service, including Foursquare, Reddit, Quora, and Hootsuite. The companies that run these websites, and generate their revenue from them, assumed that because their sites were in the cloud, they were resiliently hosted.
It is possible to host resiliently in the cloud, but the same rules of engagement apply as in more traditional hosting models:
- Choose private cloud where possible. Private cloud providers are used to dealing with businesses and understand that they need to be completely transparent with their customers about how and where the infrastructure operates.
- Ensure that your private cloud solution is hosted in at least two diverse data centres. These can either be with the same hosting provider or with two (or more) providers. Hosting with separate companies reduces the chance of the data centres sharing infrastructure, but using one company with diverse infrastructure can be more effective as there is a single point of contact and proper ownership of the solution.
- Have multiple paths for network connectivity to the data centre, either through private links back to your corporate network or transit to the public internet.
- Size your hosting infrastructure to ensure that if there is a failure, the remaining infrastructure can handle all the traffic on its own.
- Replicate and back up your data. Do not just replicate between the cloud hosting environments; also archive back to your corporate environment to protect against corrupted data being replicated across sites.
- Test that the failover works, both as part of the go-live process and at regular, scheduled intervals throughout the lifetime of the solution. Failing over a working site can seem a bit strange, but it allows any issues to be dealt with in a controlled manner, and it can give your customers confidence that the resilient design works in practice.
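The sizing rule in the list above comes down to simple arithmetic, and a minimal sketch may make it concrete. The function names, the per-server capacity, and the traffic figures below are hypothetical and chosen only for illustration; real sizing should be based on measured peak load.

```python
import math


def servers_per_site(peak_load, server_capacity, sites=2):
    """Servers needed at each site so that, if any one site fails,
    the surviving sites can still carry the full peak load.

    peak_load and server_capacity use the same unit (for example,
    requests per second). All figures are illustrative assumptions.
    """
    if sites < 2:
        raise ValueError("resilient hosting needs at least two sites")
    surviving_sites = sites - 1
    # Load each surviving site must absorb after one site is lost.
    load_per_surviving_site = peak_load / surviving_sites
    return math.ceil(load_per_surviving_site / server_capacity)


def survives_single_site_failure(site_capacities, peak_load):
    """Check an existing layout: with any one site removed, is the
    remaining capacity still enough for the peak load?"""
    total = sum(site_capacities)
    return all(total - capacity >= peak_load for capacity in site_capacities)


if __name__ == "__main__":
    # Hypothetical figures: 9,000 requests/s at peak, 1,000 per server.
    print(servers_per_site(9000, 1000, sites=2))
    print(survives_single_site_failure([5000, 5000], 9000))
    print(survives_single_site_failure([9000, 9000], 9000))
```

With two sites, each site has to be able to carry the whole peak load by itself, so half of the total capacity sits idle in normal operation. Adding a third site reduces the spare capacity each site must hold, at the cost of a more complex deployment.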
Applying these traditional rules to the cloud hosting model will give you the best of both worlds: the scalability and flexibility of the cloud model, with the uptime and availability of traditional hosting environments.