17th February 2016

Announcing DataShed Labs event - getting hands-on with Apache Spark & Hadoop

Here in the Shed, we spend a great deal of our time evaluating new technologies, frameworks and approaches to existing challenges… and often come away more confused than before.

With so many new tools appearing, each with a baffling name (Drill, Falcon, Flink and so on), we sometimes struggle to keep track – and, most importantly, to work out where to invest our effort.

The same can be said for many of our clients – often they find themselves sticking with the tried-and-tested approaches, rather than risk wasting a huge amount of time on research that comes to nothing.

Introducing the DataShed Labs

In an effort to give developers a view of the tools and technologies that we think are worth trying, we’re hosting a series of hands-on events in Leeds, where developers get to grips with some new kit.

Our first labs event is all set for the 13th April, starting at 1pm. Working in partnership with cap-hpi, we’ll be taking 20 developers, architects and general geeks on a day’s exploration of Apache Spark.

The day will involve a deep dive into Spark and associated technologies, which should help the attendees understand what’s possible with the tools – and whether it’s of relevance to their industry or business.

What’s Spark?!

Apache Spark “is a fast and general engine for large-scale data processing” – basically, it can do many of the things we traditionally employ ETL (extract, transform, load) tools to do. An added benefit of Spark is that it runs on a distributed cluster, meaning it can scale to volumes of data that are very hard to handle with traditional data processing tools.
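To make that concrete, here’s the shape of a typical Spark job, sketched in plain Python rather than in Spark itself – the real thing runs the same map/filter/reduce style operations in parallel across a cluster. The data and column layout here are made up purely for illustration.

```python
from functools import reduce

# Plain-Python sketch of the extract-transform-load shape that Spark
# parallelises; on a real cluster each step runs across many machines.
records = ["2016-02-17,leeds,42", "2016-02-17,york,7", "2016-02-17,leeds,3"]

# Extract: parse each raw line into fields.
rows = [line.split(",") for line in records]

# Transform: keep one city and pull out the numeric column.
leeds = [int(count) for (_, city, count) in rows if city == "leeds"]

# Load: here we just total it up; a real job would write the result out.
total = reduce(lambda a, b: a + b, leeds)
print(total)  # 45
```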

Most importantly for us, it can run in a (pseudo) streaming mode, or as a standard batch process.
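That “pseudo” streaming is micro-batching: Spark Streaming carves the incoming stream into small batches and runs ordinary batch code over each one, which is why the two modes can share logic. A rough plain-Python illustration of the idea (this is not Spark’s API):

```python
# Rough illustration of micro-batching: the same function serves both
# a one-off batch run and a chunked "streaming" run.
def process(batch):
    return sum(batch)  # stand-in for any batch transformation

stream = [1, 2, 3, 4, 5, 6, 7]

# Batch mode: process everything at once.
batch_result = process(stream)

# (Pseudo) streaming mode: process fixed-size micro-batches as they arrive.
batch_size = 3
micro_results = [process(stream[i:i + batch_size])
                 for i in range(0, len(stream), batch_size)]

print(batch_result, micro_results)  # 28 [6, 15, 7]
```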

You might want to have a read of Roger’s blog post on it…

We’ve found that coupling Apache Spark with Apache Kafka and good old MySQL can result in high-performance, high-scale data processing pipelines – particularly in the increasingly event-driven world we find ourselves in.
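As a toy illustration of that pipeline shape – with a plain Python list standing in for a Kafka topic and a dict for a MySQL table, since the real wiring is well beyond a blog snippet:

```python
# Toy event-driven pipeline: a list stands in for a Kafka topic,
# a dict for a MySQL aggregate table.
topic = [{"city": "leeds", "clicks": 2},
         {"city": "york", "clicks": 1},
         {"city": "leeds", "clicks": 5}]

table = {}  # stand-in for the MySQL table the results land in

# The "Spark" stage: consume events, aggregate, upsert into the store.
for event in topic:
    table[event["city"]] = table.get(event["city"], 0) + event["clicks"]

print(table)  # {'leeds': 7, 'york': 1}
```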

So – if you’ve heard of Spark, and are interested in just having a go, or if you are considering Spark for a project, get in touch. We’d love to have you along.

So what’s the plan?

The day will look something like this:

  • Set up a Hadoop cluster on your laptop (don’t worry, we’ve stuck the whole thing into Docker containers so you can get running quickly!)
  • Load a dataset into Kafka
  • Try out Spark Streaming, Spark SQL, and even plug it into a few data visualisation tools
  • … anything else your brain can think of. We’ve a couple of interesting data sets to play with, so let’s leave some of the thinking up to you


What’s the catch?

There’s no such thing as a free lunch – we know… and we have the receipts to prove it.

Those attending the event will be expected to contribute £100 to one of two local charities.

Where do I sign?

Drop us a note at – and we can go from there. Space is tight – we only have room for 20 geeks to join us!



We Love Data

Want to know more?

Drop us a line – we’re always happy to chat – we promise we’ll keep the geek speak to a minimum (unless that’s your bag, in which case we’ll happily comply)!