Why Open Source is Driving the Big Data Market
by Patrick Booth, VP UK & Ireland, Talend
The big data market is moving at lightning speed. But when it comes to solutions, there’s a widening chasm between the legacy approach and next-generation developers and vendors.
While the legacy approach has worked well over the years and still has its place in what is becoming a huge market, there are many signs that open source solutions are better placed to help businesses make the most of the advantages that big data analysis brings.
But first, who are the legacy vendors? Typically, they have large internal teams dedicated to building proprietary, bespoke software. They have solid products, reliable technology and well-funded research and development projects.
So what’s the problem? Their products are fine for businesses that keep doing things the way they have always done them. But they are usually built on a traditional architecture which may not adapt easily to the big data environment. In other words, they suit any organisation with straightforward requirements around data integration, data quality and extract, transform and load (ETL), but only if it plans to retain its existing processes.
However, the catch comes when a business wants to do something new – launch a new project or initiative or even go through an organisational transformation. It’s then that the challenge begins, mainly around cost and flexibility.
If the business starts to look for additional functionality, around data ingestion for example, there are likely to be licensing issues. Businesses buying perpetual software licences could discover it’s a one-way investment. Once bought, the software can be hard to modify and almost impossible to cancel if the business’s needs change.
To make matters worse, legacy architectures are often unwieldy, and it can be difficult for an organisation to adapt them to meet the demands of an evolving big data project or environment.
In short, these organisations have their wings clipped before they even enter the world of big data. They simply won’t have the agility to do so.
It may have been better for them to consider a flexible, subscription-based, open source environment. Open source vendors tend to favour a subscription model, so organisations can reassess the situation at regular intervals and are not locked in to a certain number of licences. This brings new flexibility combined with important cost savings.
However, there are further important benefits too. One of the main advantages is access to a collaborative environment and a partnership approach to product development. Unlike the legacy, proprietary world, where only one team of developers works on a product, the open source arena has multiple organisations working towards the same aim.
Take the latest high-impact Apache big data projects. Each one is a joint effort of businesses, other organisations and individuals, which increases the pace of innovation and enables teams to select the best developments.
With open source, not only is the innovation of a higher quality, but it’s also faster. By extension, if an organisation is focused on developing an enterprise-class solution, it will find it far easier to ensure innovation lies at its heart.
In fact, because of these benefits, open source is rapidly becoming the standard approach in the big data world, just as cloud has moved from being a disruptive force into the mainstream. It is helping to drive innovative new technologies such as Apache Spark and subsequently Spark Streaming, as well as helping to fuel emerging projects such as Apache Beam. For legacy vendors it would take a major effort to support these projects – and by the time they did so, the open source world would have moved on.
Organisations that choose an open source-first approach will be best placed to take advantage of new big data technologies such as Spark Streaming and of other developments in the future. Few businesses these days can expect to always operate in the same way as they have done in the past, and to assume they will do so could be dangerous. Next-generation, open source technology gives them the option to adapt to the future.