From Discovery to Selection: 10 Things We Learned About Machine Learning and Data Science
The last few months have been action-packed as we've been preparing for Seattle's third batch. During the startup discovery and selection process, we learned a ton about markets, products, innovation, industry maturity, and the technical forces that come into play as new companies emerge. We're also seeing how these companies innovate and drive advancements in the field of data science.
Some of the things we learned were counterintuitive; others were expected. Most importantly, the large pool of applications from around the world helped us see the field through different lenses. It was interesting to see contrasting approaches to problem solving among startups. For example, a startup solving a particular problem in Western Europe might take an entirely different approach than a startup addressing a similar problem in Southeast Asia.
After hundreds of hours of interviews, personal meetings, city tours, and remote meetings, the following are our top 10 takeaways. The list obviously doesn't cover everything, but it reflects a snapshot in time of a rapidly evolving industry.
- Adoption of data science technologies, so far, follows a power law. 80% of the interviewed startups' revenue comes from four industries: Retail (traditional/online), Healthcare, Fintech, and Urban Informatics. Energy, Public Sector, and Aviation are widely tipped as emerging sectors.
- In a "winner-takes-all" market, there is no clear enterprise-scale winner. Startups were looking at multiple enterprise-grade players for very specific needs. We didn't hear a consistent name or technology dominating the field of machine learning and data science (which is a healthy development for an emerging discipline).
- The label "machine learning" is often abused. With any hot technology, it is inevitable that the label will be plastered indiscriminately on things that have little or nothing to do with it. This is especially true of machine learning, since the term can apply to a broad spectrum of processes and strategies.
- 'Spark' has the real spark. Spark (because of its speed) appears more natural and convenient for most programmers. Many startups believe it will reinvigorate Hadoop in the years ahead.
- Accuracy of models is path dependent. Models aren't perfect, and neither are the outcomes. Most startups claim to have "reasonable models." However, they spend more time creating new models than training the ones they have. Accuracy suffers from multiple factors: lack of the "right" dataset or algorithm, human errors and judgments, evidence bias, and so on.
- "There is an API for that." Gone are the days of building everything in-house. Welcome to the age of plug-and-play APIs. Startups no longer see the need to hire PhDs or super-specialized experts. Time to market is a tenth of what it was even five years ago.
- It's not just the math. Interpretation and contextualization trump everything. Interpretation is as much an art as a science. Many early-stage companies struggle to find the right talent to make sense of the outcomes.
- Data is cheaper than ever. But where is the clean data? Nine out of ten startups complained about a lack of refined data despite the (literally) universal abundance of "free" data sources. Curation remains a challenge.
- The struggle for data sources is intensifying. The emergence of "free" data has immensely helped an emerging field like machine learning move beyond university campuses and big corporations. Where startups get that data (and who supplies it) will become a major issue for machine learning in all its forms.
- Innovation overcomes geographical barriers. The democratization of product delivery (cloud, open source, and access to 'reasonable' technical talent) makes it easy to "create anywhere, deploy everywhere."
Given my exposure to the wider ecosystem, people often ask me whether the data science industry has come of age. I always tell them that this is just the beginning, both in the creation of platforms and products and in their applications. The 'range' of available technology is certainly wide enough for even not-so-technical people with brilliant ideas to create compelling products and take them to market. From a timing standpoint, there has never been a better time to try to solve large, complex problems where, historically, the tools have been out of reach.
What do you think, has data science come of age?
To stay updated on our Machine Learning Accelerator, as well as our other accelerators located around the globe, make sure to follow us on Twitter @MSAccel. Check back on the blog soon when we announce the startups that will make up Seattle’s third batch!