Online Magazine

Machine learning platform – what is it and why do you need it?

Did you know that 90 percent of ML solutions never make it into production? The right ML platform can play a decisive role in preventing this from happening.


by Sunanda Garg

Businesses all over the world see the benefits that data, artificial intelligence (AI) and machine learning (ML) can bring. However, getting there is anything but easy.

In a 2019 Accenture survey, 84 percent of executives noted that they would not achieve their growth objectives without scaling AI. To do so, companies can benefit from a so-called ML platform. However, in today’s crowded vendor landscape, making an informed decision (‘build vs. buy’) on a platform for scaling AI can be tricky.

This article therefore aims to provide a starting point for engaging with the topic of ML platforms. It first explains what an ML platform is and why your organization needs one, then gives an overview of the ML platforms on the market and explains how to find the one that suits your organization best.

What is an ML platform?

In many areas of research and industry, artificial intelligence and machine learning are becoming increasingly popular for problem solving. While considerable effort goes into further improving the performance of ML models, the real bottleneck has shifted to moving ML solutions out of the lab and into production.

As ML gains a foothold across organizations, teams developing ML solutions are struggling with the complexities of the ML lifecycle. As a result, solutions that help manage the end-to-end ML lifecycle are in high demand. Not surprisingly, cloud providers offer native services that support different aspects of the ML lifecycle, and a growing number of software providers are expanding their offerings in this direction. In addition, more and more start-ups are bringing innovative solutions and ML services to the market.

So, what is an ML platform?

An ML platform can be defined as a collection of services that covers the steps of the end-to-end ML lifecycle and helps organizations continuously develop, deploy, integrate, and monitor their AI and ML solutions.

Why does your organization need an ML platform?

ML models are often developed in a very complex, sometimes even incomprehensible, way. In particular, the complexity of ML solutions and their development process stems from the following factors:

  • Different programming languages and tools: Data scientists usually begin the ML lifecycle with different programming languages (e.g., Python and R) and tools (e.g., Jupyter Notebook, RStudio, PyCharm), often in different versions.
  • Different libraries: Moreover, they tend to use different libraries (e.g., TensorFlow, scikit-learn, XGBoost) to apply algorithms such as logistic regression, neural networks or tree-based classifiers.
  • Small samples of data: In many cases, data scientists build a prototype on a small sample of data that fits within the resources (e.g., memory and CPUs) of their laptops or workstations.
  • Specific features and metrics: The engineered variables (also known as features) created from raw data are used to train the models locally, and performance and insights are visualized with suitable metrics (e.g., within Jupyter Notebooks).
  • Potential incompatibility with business requirements: The tools, frameworks and data requirements used during this model development phase may or may not be compatible with the production-grade solution the business requires.

Solutions developed this way generate little to no business value or impact. To unleash their full potential and generate value continuously, they need to be taken out of the experimental phase and integrated deeply into the organization's business processes. However, this is impossible without components such as

  • an efficient data pipeline,
  • a continuous integration and deployment (CI/CD) process, and
  • artefact and metadata management.
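To make "artefact and metadata management" more concrete, the following minimal Python sketch records, for a trained model, the kind of context that the factors above show is easily lost: the Python version, the versions of the libraries used, a fingerprint of the training data and the evaluation metrics. The function `build_model_metadata`, the `ModelMetadata` fields and the JSON layout are hypothetical illustrations, not the schema of any particular ML platform; real model registries and experiment trackers define their own.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from importlib import metadata as importlib_metadata


@dataclass
class ModelMetadata:
    """Hypothetical metadata record stored next to a model artefact."""
    model_name: str
    python_version: str
    os_platform: str
    library_versions: dict
    data_sha256: str
    metrics: dict
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def fingerprint_data(raw: bytes) -> str:
    """Hash the training data so the exact sample can be identified later."""
    return hashlib.sha256(raw).hexdigest()


def installed_versions(packages):
    """Look up installed package versions; missing packages are recorded as None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_model_metadata(model_name, training_data, metrics, packages):
    """Collect environment, data and metric information for one trained model."""
    return ModelMetadata(
        model_name=model_name,
        python_version=sys.version.split()[0],
        os_platform=platform.platform(),
        library_versions=installed_versions(packages),
        data_sha256=fingerprint_data(training_data),
        metrics=metrics,
    )


if __name__ == "__main__":
    record = build_model_metadata(
        model_name="churn-classifier",  # hypothetical model name
        training_data=b"customer_id,churned\n1,0\n2,1\n",  # toy data sample
        metrics={"accuracy": 0.87},  # placeholder value, not a real result
        packages=["scikit-learn", "xgboost"],
    )
    print(json.dumps(asdict(record), indent=2))
```

Keeping such a record alongside every trained model is the kind of bookkeeping that the experiment tracking and model registry components of an ML platform aim to automate, so that a model can later be traced back to the tools, libraries and data that produced it.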

Only a comprehensive set of ML components (in other words, an ML platform) enables the transition to the next phase of the ML lifecycle, in which the carefully trained and selected ML solution makes it out of the lab and into the real world. After all, you would not want your models to belong to the astounding 90 percent that never see the light of day.

In theory, any organization can build its own ML platform. That is what many tech giants have done, e.g., Uber (Michelangelo), Airbnb (Bighead), Facebook (FBLearner), Netflix (Metaflow) and Apple (Overton), to name a few. However, while this may be possible for organizations with large engineering teams, it is not a feasible approach for most organizations. Enterprise ML platforms increasingly offer a way to overcome this hurdle and bridge the gap.

Which ML platforms exist in the market?

To cope with the complexities of managing the ML lifecycle, more and more ML teams are turning to platform-as-a-service (PaaS) solutions. Numerous vendors and cloud providers offer end-to-end ML platforms and/or services, including AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform, Databricks, Dataiku and H2O.ai, among others.

Cloud providers offer services that can support most, if not all, components of the ML lifecycle and provide flexibility, but at the cost of relatively high cloud engineering effort. Proprietary solutions such as Databricks, Dataiku and Domino, on the other hand, offer out-of-the-box services spanning the ML lifecycle, which may simplify your ML journey at some cost in flexibility.

The burning question: Which is “the best ML platform”?

There is no shortage of ML platforms on the market, but the natural question for any ML team or organization starting out is: which one is the best? We can safely assume that there is no ‘one ML platform to rule them all’; in other words, there is no such thing as the best ML platform. No matter how good a vendor claims its product or service to be, there will be competitors with a similar portfolio. Instead, the right question to ask is:

Which ML platform would be best suited for my organization?

The choice of an ML platform depends on three things:

  1. Your team’s skill level: If your team consists of more citizen data scientists than engineers, your choice of platform will be quite different from that of an engineering-heavy team. A platform suits your team’s needs only if it simplifies their day-to-day tasks, which in turn speeds up your ML journey.
  2. Your types of ML use cases: The platform should support the various types of ML use cases your organization is trying to solve, as well as those on your roadmap. Different types of use cases, such as computer vision, natural language processing (NLP) or big data analytics, have individual requirements, and some platforms support certain types better than others.
  3. The technical debt of your organization: The selected ML platform should reduce your existing technical debt. Almost every platform offers various solutions that address different components of the ML lifecycle (e.g., model registry, experiment tracking, feature store). Selecting an ML platform that helps you fill in the gaps will expedite your organization’s ML journey.

To find the ML platform best suited to your organization, you need an ML capability framework that lets you assess all of the aspects above consistently across several platforms. Making the right decision is especially important because long-term vendor lock-in can be costly.
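One simple way to apply such a capability framework consistently is a weighted scoring matrix: each criterion receives a weight reflecting its importance to your organization, each candidate platform receives a score per criterion, and the weighted sums allow a side-by-side comparison. The Python sketch below assumes hypothetical criteria (loosely derived from the three factors above), weights that sum to 1, scores on a 1 to 5 scale and placeholder names "Platform A" and "Platform B"; none of these values are recommendations or real assessments of any vendor.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria derived from the three factors above; the weights
# express relative importance for one fictional organization and sum to 1.
WEIGHTS: Dict[str, float] = {
    "fit_for_team_skills": 0.4,
    "supports_use_cases": 0.35,
    "closes_capability_gaps": 0.25,
}

# Placeholder scores on a 1-5 scale; "Platform A" and "Platform B" are
# not real products and the numbers are not real assessments.
SCORES: Dict[str, Dict[str, int]] = {
    "Platform A": {"fit_for_team_skills": 4, "supports_use_cases": 3, "closes_capability_gaps": 5},
    "Platform B": {"fit_for_team_skills": 2, "supports_use_cases": 5, "closes_capability_gaps": 4},
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of criterion scores; every weighted criterion must be scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank_platforms(
    all_scores: Dict[str, Dict[str, int]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (platform, weighted score) pairs, highest score first."""
    ranked = [(name, weighted_score(s, weights)) for name, s in all_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_platforms(SCORES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Weighted scoring is only one possible design. Its usefulness depends on agreeing on the criteria and weights before evaluating any vendor, so that every platform is measured against the same yardstick; further criteria, such as the risk of vendor lock-in, can be added as extra weighted rows.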
