Cloudera Director 2.0 - Truly cloud ready?
We were excited to see the recent release of Cloudera Director 2.0, pitched as the easiest way to deploy enterprise Hadoop environments in the cloud.
When we initially tried Cloudera Director 1.x, we weren’t particularly impressed. In fact, we elected to ditch it entirely and built our own simpler provisioning tool for AWS using Docker, mainly because spinning up a cluster for our CI environment using Cloudera Director seemed far more complex than was actually necessary.
However, I’m quite excited about this latest iteration.
Maintaining a High Availability platform is critical to so many of the typical workloads we push through a Cloudera cluster. Often, these are Spark Streaming jobs which need to run 24/7 with sub-second latency, providing business-critical data to multiple downstream systems. We’re comfortable configuring YARN, HDFS, Kafka, ZooKeeper, Spark, etc. for fault tolerance, but this doesn’t help anyone if one or more of your physical hosts go pop.
So, having Cloudera Director capable of ‘Cluster cloning and cluster repair’ sounds fantastic. All systems should be designed with sufficient capacity to ensure that a single node failure doesn’t result in catastrophic failure. However, even spare capacity doesn’t mean that your Ops chaps can put their feet up and worry about something else (perhaps whether they should be called DevOps engineers, rather than Infrastructure Engineers?).
When you get an alert telling you that a node has died, or just dropped off the map, someone needs to go running to spin up a crisp new EC2 instance (from a pre-configured AMI if you’re lucky, or from a vanilla OS built up by hand if you’re not), add it back into the cluster, provision services and send work its way.
Cluster repair suggests that this would be automated. Big thumbs up from me if that ends up being the reality!
Another pretty cool feature is the ‘Automatic job submissions that spin up and terminate clusters on a per job basis, without manual cluster lifecycle management’ – so if you have a job that runs once a day, then you can automate the whole job lot, from provisioning infrastructure and deploying services to running your code and finally spinning down your hardware. The potential cost savings here are huge: depending on the number of nodes you’re running, saving just a few minutes of cluster time on each job run can add up to many hundreds of dollars each month.
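To put a rough number on that, here is a back-of-the-envelope sketch in Python. Every figure in it (node count, hourly price, minutes saved) is a made-up assumption for illustration, not a real AWS price or a measurement from our clusters, and `monthly_savings` is just a hypothetical helper name. It also assumes you are billed in proportion to the time used; if your provider rounds usage up to a coarser unit, the real saving will be smaller.

```python
def monthly_savings(nodes, hourly_rate_usd, minutes_saved_per_run,
                    runs_per_day=1, days_per_month=30):
    """Estimate the monthly saving from shortening each job run.

    Assumes cost scales linearly with node-hours (an assumption; real
    billing granularity may differ).
    """
    hours_saved_per_run = minutes_saved_per_run / 60
    saving_per_run = nodes * hourly_rate_usd * hours_saved_per_run
    return saving_per_run * runs_per_day * days_per_month


# Hypothetical example: 50 nodes at $1.00 per node-hour,
# 15 minutes of cluster time saved on a once-a-day job.
print(monthly_savings(nodes=50, hourly_rate_usd=1.00, minutes_saved_per_run=15))
# prints 375.0
```

With those assumed inputs, a quarter of an hour saved per daily run comes to a few hundred dollars a month, and the figure scales linearly with cluster size and job frequency.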
So… this looks to be a pretty significant step forward in both workload automation and High Availability for Cloudera deployments on the cloud. We’re looking forward to trying it out, and we’ll report back when we do.