How to cope with open source data in the age of artificial intelligence
As the Data Economy booms, differentiating and standing out from the crowd is crucial; just look at IBM’s acquisition of The Weather Company in 2016.
When data replaces code as the secret sauce for analytics, it should come as no surprise that an open data movement exists which, like the open source movement, seeks to ensure that useful big data sets are freely available to all.
Moshe Kranc (MK), CTO at Ness Digital Engineering, talks to Data Economy (DE) about how calls for open data stand to impact both large and small companies, as well as the future of Artificial Intelligence (AI).
DE: How are companies adjusting to this Open Data movement?
MK: Some companies are aggressively safeguarding their data, while others are taking a contrarian position by releasing their proprietary data to the public as a means of generating PR kudos.
One key example of the latter approach is Uber Movement, which uses data from the billions of rides Uber has provided to let planning agencies and researchers track car travel times between any two locations at any time of day.
DE: Why should the Uber Movement be taken seriously when it comes to Open Source Data?
MK: Uber deserves major kudos for releasing its ride data. Having concluded that there was no way to monetize this data directly, Uber could have just sat on it.
Instead, they recognized the value of goodwill potentially generated by releasing this data. In the long term, they may even derive some financial benefit.
As they negotiate the right to run taxi services in cities across the globe, Uber often runs into opposition from local taxi drivers. It certainly helps their cause to have generated goodwill with urban planners and residents.
DE: Are there any downsides to releasing your data?
MK: Companies that release data as a goodwill gesture must be careful not to inadvertently violate customer privacy. In 2006, AOL Research made a goodwill gesture similar to Uber’s when it released its search logs to help researchers tune their search algorithms.
Although the data was anonymized, an industrious New York Times reporter linked together the searches made by a single anonymous user and used them to identify that user. The resulting lawsuits ensured that no one would ever release search logs again. Let’s hope Uber’s goodwill gesture meets a better fate than AOL’s.
DE: In terms of big data, how are companies approaching this concept in the market?
MK: Some companies have taken a very aggressive approach to benefitting from Big Data. I experienced an example of this several years ago, when a book I authored in 2004 went out of print.
Several months later I was surprised to discover that my book was available, as scanned page images, on Google Books, even though I had never given them permission.
I wrote to Google to complain and received a response explaining that Google scanned any book that went out of print in order to analyze the text so they could improve their natural language processing (NLP) algorithms.
Google offered me a choice: receive $5 for the rights to the book, or have the book removed from Google Books. I took the $5.
In retrospect, though, I regret my choice, because having such a vast corpus of books gives Google an unfair advantage over competitors in training its NLP algorithms.