Why open source MLOps is awesome
The most important thing happening in machine learning right now is the emerging space of technologies aimed at productionising machine learning, otherwise known as MLOps.
If you’ve already begun your journey into MLOps, then you’re all too familiar with the overwhelming amount of choice. There’s a dizzying constellation of tools and platforms to pick from — so where do you start?
On the one hand we have software-as-a-service platforms which wield their massive marketing budgets and make a lot of noise, often drowning everything else out. But these tools tend to be subscription-based, offering little flexibility and locking you in to a vendor.
By contrast, there’s a vibrant ecosystem of free and open source tools with devoted communities around them. This world is the polar opposite of platforms: free to use and modify, as flexible as you need, and with no vendor lock-in.
Open source wins
A significant choice faced by teams adopting MLOps is whether to pay for a ready-made platform, or build the right solution themselves using open source tools.
A closed source platform can seem tempting at first; the promise of outsourcing hard work to a third party has its allure. But such platforms force you to do things in a rigid way which, in our experience, hinders progress in the long run.
Open source tools, on the other hand, offer:
Flexibility: You can pick and choose the best tools for your needs, and use and modify them freely.
Ownership: Can be installed on your own infrastructure so that you fully own the solution.
Cost-effectiveness: Readily available, with no license fees, for anyone with the right skills to use.
Agility: React quickly to changing needs, innovate and collaborate effectively.
Open source is all around us. For instance, the most widely deployed operating system in the world is not Windows or macOS, but Linux, the open source OS that dominates the server market and runs most of the infrastructure behind the Internet.
The most popular tools used by software developers and data scientists are open source too: Git, Python, R, Jupyter, PyTorch… the list goes on. In nearly every area of technology, open source has won out to become the standard. We see no reason why MLOps should be any different.
Open source flavours
The term open source has a lot of different meanings, so we need some definitions.
Licenses: First, open source doesn’t simply mean that the source code is available. What matters are the license terms that come with it. The Open Source Initiative maintains a comprehensive definition of open source; the gist is that code must be free to use, modify, and redistribute.
Commercial backing: A lot of open source projects are non-commercial, maintained by volunteers working in their spare time. But commercialisation isn’t incompatible with open source, and indeed plenty of great projects have commercial backing, which gives users more confidence when it comes to things like support.
‘Fauxpen’ source: Some tools describe themselves as open source, but on closer inspection turn out to be proprietary projects masquerading as open source. These projects lose out on the core benefits of open source described above.
Open source MLOps
Having picked the open source path, your next question is: which tools?
We’ve done a lot of research on exactly this question, covering all of the key areas that a data science team would need — things like version control for data, or real-time monitoring for models.
Staying true to our convictions, we’ve open-sourced this research through a GitHub repo called Awesome Open Source MLOps, which is accompanied by a series of in-depth guides to each area of MLOps.
One of the great things about the open source ecosystem is how much variety you can find. But as you can imagine, the hard part is picking the right combination of tools for your specific needs. That’s where we come in: we help our clients navigate the world of open source MLOps, picking the right tools and steering them away from the pitfalls associated with proprietary products.