Sunday, June 25, 2017


From Science to Revolution: The Rise of Data Engineering



by Brian Hills, Head of Data at The Data Lab Innovation Centre

Over five years ago, a new group of heroes emerged from the rise of the Big Data era. This group would turn Big Data into golden nuggets of insight using the alchemy of machine learning, software engineering, data visualisation and a sprinkling of business savvy. They would deliver unbounded value to industry and make geeks cool. Data Scientist was hailed as the sexiest job of the 21st century.

In reality, many organisations have struggled with the emergence of this field, and up to 80% of data projects fail to deliver on their objectives. On top of the skills shortage in this area, businesses have found it hard to evolve organisational models that let them realise the full value of hiring data scientists. Some simply followed the trend and relabelled their business intelligence (BI) teams as Data Scientists overnight!

One of the key lessons of the past few years is that success with data cannot depend on data scientists alone. This group, often educated to PhD level, is skilled in applying the scientific method to uncover valuable insights from data. They form hypotheses, test them by applying complex methods to data, and then generate knowledge from the results. However, relying on this approach alone within a business creates a cottage industry that limits the ability to scale and introduces a number of significant risks.

As a result, there is growing recognition that success with data is a ‘team sport’, and one of the key emerging roles in this team is the Data Engineer. Here are five reasons you should consider hiring Data Engineers to work alongside Data Scientists:


  1. Data Scientists want to solve difficult problems; they need the fuel (data) to do this.

Data Engineers build the production-strength systems from which data flows to the Data Scientists. The less time Data Scientists have to spend collecting and cleansing data, the higher your chance of getting the most value from them (and of keeping them interested in working with you!).


  2. Data Scientists are master craftspeople, which makes them a bottleneck

Data Engineers take the algorithms and outputs from Data Scientists and deploy them into an automated production environment. This lets organisations scale the impact of their data teams and monitor the performance of algorithms at scale.


  3. Organisations face increasing regulatory requirements and the need to be transparent about algorithm and data usage.

Organisations receive a growing volume of requests for information from regulators and customers. Responses to these requests need to be accurate and timely, with a full audit trail, and that is best delivered through an engineered process rather than by relying on data scientists.


  4. The Data Engineering toolbox is different from the Data Science toolbox

There are hundreds of tools, languages and technologies in the data space. It is unrealistic to expect Data Scientists to be fully accomplished engineers, and vice versa. For the rare people who are both, expect a battle to retain that talent!


  5. Data Scientists are often deployed as a strategic resource to solve one-off challenges across an organisation.

Data Scientists rarely stay in one place for long. They are a precious resource and are frequently moved across departments and functions to solve complex challenges. This short-term approach helps tackle immediate problems but limits longer-term gains; engineers can take these point solutions and build them into core products.

Data Science is slowly turning into a Data Revolution. The tech giants have led the way in developing this area, and many of them, including Airbnb, Facebook and Uber, publish their stories and lessons learned on their blogs. Universities are also adapting their courses to offer modules and entire degrees in Data Engineering, creating new pools of talent.

The alchemy is changing: we are learning from the experience of the past five years and bringing together people with many different skills into data teams.  The key ingredient to delivering success with data in the future will be collaboration.