ML platforms and how to assess their capabilities

A machine learning platform increases your ML solution’s chances of making the cut from the lab into production. But how do you find out which ML platform best suits you? With the help of an evaluation framework.


by Bhaumik Pandya

As illustrated in this article, it is wise to employ an ML platform to increase the chances that your ML solution makes the cut from the lab into production. To find the platform that best meets your organizational requirements, you need a framework that helps assess the capabilities of ML platforms. One example is the Accenture assessment framework, which encompasses the end-to-end ML lifecycle.

To understand this framework, it is first necessary to define the functional capabilities an ML platform should cover to enable continuous development, integration, deployment, and monitoring of an ML solution.

What functional capabilities should an ML platform have?

To quote Ron Schmelzer: "There's no such thing as the machine learning platform". In a world where lines are drawn through an already complicated software and service environment, functional capabilities set a baseline for what is necessary to bring an ML solution to production. The result is a capability map (Figure 1) that covers the end-to-end ML lifecycle and comprises five functional areas: Data Ingestion & Storage, Experimentation Zone, Continuous Integration, Industrialization Zone, and Data Presentation.

Figure 1: The functional areas and components that an ML platform should ideally cover to enable an end-to-end ML lifecycle.

Each functional area (example indicated in purple) consists of components (example indicated in pink) that fulfil the distinct requirements imposed on that functional area. Projects and use case implementations usually start by ingesting the collected data (Data Ingestion & Storage), followed by experimenting with and developing solution algorithms (Experimentation Zone). The ML models created in the process are then tested, integrated, and deployed to production (Continuous Integration). In addition, the models are validated and continuously monitored (Industrialization Zone). AI applications and ML solutions are consumed through different types of endpoints, such as REST APIs, batch services, and dashboards (Data Presentation).
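
The typical order in which a project passes through the five functional areas can be written down as a small data structure. The following Python sketch is purely illustrative: the names `FunctionalArea` and `next_area` are hypothetical and not part of the framework itself.

```python
from enum import Enum
from typing import Optional


class FunctionalArea(Enum):
    """The five functional areas, in the order a project usually passes through them."""
    DATA_INGESTION_AND_STORAGE = "Data Ingestion & Storage"
    EXPERIMENTATION_ZONE = "Experimentation Zone"
    CONTINUOUS_INTEGRATION = "Continuous Integration"
    INDUSTRIALIZATION_ZONE = "Industrialization Zone"
    DATA_PRESENTATION = "Data Presentation"


def next_area(area: FunctionalArea) -> Optional[FunctionalArea]:
    """Return the area that typically follows `area`, or None after the last one."""
    areas = list(FunctionalArea)  # Enum members iterate in definition order
    index = areas.index(area)
    return areas[index + 1] if index + 1 < len(areas) else None


# Walk through the lifecycle from the first area to the last.
area: Optional[FunctionalArea] = FunctionalArea.DATA_INGESTION_AND_STORAGE
while area is not None:
    print(area.value)
    area = next_area(area)
```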

In particular, the individual functional areas entail the following:

  • Data Ingestion & Storage
    Data is a first-class citizen in ML, and so is the availability of high-quality data, because training and serving models is a fundamental capability of an ML platform. A data pipeline provides the data required for model training. To let the data pipeline connect to different data sources (batch and streaming, structured and unstructured), an ML platform ideally provides various connectors. The Data Ingestion & Storage area also provides functionality for data quality testing, data transformation, and data versioning.

  • Experimentation Zone
    The development of ML models is an iterative process in which the first step is to experiment with different algorithms and configurations (e.g., different hyperparameters) to achieve the best model result. To make this process traceable and, where needed, automated, an ML platform should support functionality to compare, share, and reuse experiments and to collaborate on model development. It should therefore provide a component that centrally manages this experimental development and stores, tracks, and evaluates the model versions along with their associated artifacts and metadata. Support for visualization options (e.g., for metrics) is advantageous.

  • Continuous Integration
    Models are usually trained using programming languages such as Python (or sometimes R), in an environment that differs from the production environment in which they are later used. This poses two challenges:

    1. It must be possible to persist models.
    2. Models must be portable between different environments.

    This requires a uniform model format such as ONNX or PMML for storing models (including artifacts) and their dependencies. It must also be possible to link the stored models to the parameters and metrics of their training runs. These functionalities are often covered by a model store that includes a model registry. Just as relevant is a feature store, which facilitates the storage, reusability, and provision of features during both model training and model serving (e.g., generating predictions in a production environment). Feature store and model store are crucial for the continuous automation, integration, and provision of models across the entire ML lifecycle, but they do not replace the entire ML CI/CD process. The ML platform therefore has to support building a pipeline, automating processes, and packaging and deploying models to a suitable environment. A crucial factor here is the ability to integrate external tools for orchestrating and automating CI/CD workflows.

  • Industrialization Zone
    For the long-term success of an ML solution, it is crucial that the performance of ML models in production does not decrease over time and that each model is at least monitored and validated against a baseline model. Ideally, the models are continuously retrained. An ML platform should therefore provide model monitoring to track relevant model KPIs and to trigger automatic model retraining, either on a predefined schedule or via certain triggers, ideally based on the monitored KPIs. In addition, the ML platform should either provide its own visualization functionality or support a simple connection to monitoring tools (e.g., Prometheus, Grafana).

  • Data Presentation
    To serve the results or predictions from the model to consuming applications, an ML platform should provide model serving functionality. This includes topics such as model orchestration and the testing of models (e.g., A/B tests). Another essential functionality for data presentation is model insights, i.e., the ability to share findings from the model training phase with different stakeholders. Visualization options in the ML platform, e.g., in the form of a dashboard or an integration with visualization tools such as Tableau or Power BI, are very helpful.
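
The KPI-based retraining trigger described under Industrialization Zone can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular platform implements it: the class `RetrainingPolicy`, its fields, and the example threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RetrainingPolicy:
    """Hypothetical policy: retrain when a monitored KPI falls too far below the baseline."""
    baseline_score: float   # KPI of the baseline model (assumed: higher is better)
    tolerated_drop: float   # largest acceptable absolute drop below the baseline

    def should_retrain(self, recent_scores: Sequence[float]) -> bool:
        """Return True if the mean of the recent KPI values violates the policy."""
        if not recent_scores:
            return False  # nothing monitored yet, so no KPI-based trigger
        mean_score = sum(recent_scores) / len(recent_scores)
        return mean_score < self.baseline_score - self.tolerated_drop


# Illustrative values only.
policy = RetrainingPolicy(baseline_score=0.90, tolerated_drop=0.05)
print(policy.should_retrain([0.91, 0.89, 0.90]))  # model still close to the baseline
print(policy.should_retrain([0.84, 0.82, 0.80]))  # performance has degraded
```

In practice, such a check would run against KPIs collected by the platform's monitoring component, and a positive result would start the retraining pipeline instead of printing a value.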

How do you evaluate your ML platform?

Having introduced the five functional areas of an ML platform and the components that fulfil their requirements, let us define the assessment indicators that enable the evaluation of different ML platforms.

Scoring of a component
The ML platforms are assessed based on the coverage and maturity scores of each of their components. While the coverage score indicates the availability of the component on a certain ML platform, the maturity score indicates the component’s capabilities, robustness, and readiness for production.

Scoring of a functional area
As shown in Figure 2, the overall coverage and maturity scores of a functional area are the averages of the corresponding scores of the components that make up the functional area:

Figure 2: Scoring of component and functional area with respect to coverage and maturity.


The coverage score of a functional area provides information about the percentage of covered components, whereas the maturity score of a functional area indicates the overall capability and readiness of the underlying components.
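
The aggregation from component scores to functional-area scores can be sketched as follows. The sketch rests on assumptions that the article does not specify: coverage is treated as a yes/no value per component, maturity is rated on a hypothetical 1 to 5 scale, and maturity is averaged over the covered components only. The names `ComponentScore` and `score_functional_area` and the example ratings are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComponentScore:
    name: str
    covered: bool          # coverage: is the component available on the platform?
    maturity: float = 0.0  # assumed 1-5 scale; ignored when the component is not covered


def score_functional_area(components: List[ComponentScore]) -> Dict[str, float]:
    """Aggregate component scores into the two scores of one functional area."""
    if not components:
        raise ValueError("a functional area needs at least one component")
    covered = [c for c in components if c.covered]
    # Coverage: share of components that are available at all.
    coverage = len(covered) / len(components)
    # Assumption: maturity is averaged over the covered components only.
    maturity = sum(c.maturity for c in covered) / len(covered) if covered else 0.0
    return {"coverage": coverage, "maturity": maturity}


# Hypothetical ratings for the Continuous Integration area.
continuous_integration = [
    ComponentScore("model store", covered=True, maturity=4.0),
    ComponentScore("feature store", covered=False),
    ComponentScore("CI/CD pipeline", covered=True, maturity=3.0),
]
result = score_functional_area(continuous_integration)
print(result)
```

With these example ratings, two of three components are covered, so the coverage is roughly 67%, and the maturity of the covered components averages 3.5.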

Why do we need two distinct assessment indicators: coverage and maturity?

Let us imagine a platform ‘A’ that covers only a few components but with high maturity, and another platform ‘B’ that covers every component but with low maturity. Without a distinction between coverage and maturity, both platforms might receive the same score, although they differ greatly from an operational perspective: with platform ‘A’ you would need additional tools to enable the full ML lifecycle, whereas with platform ‘B’ you would lack the capability to fully industrialize your workflow.
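
A small numeric example makes the point concrete. The numbers below are invented for illustration, the 1 to 5 maturity scale is an assumption, and `blended_score` is a hypothetical single-number alternative that simply counts missing components as 0.

```python
from typing import Dict, List, Optional

# Hypothetical maturity ratings on an assumed 1-5 scale; None marks a missing component.
platform_a: List[Optional[float]] = [4.0, 4.0, None, None]  # few components, high maturity
platform_b: List[Optional[float]] = [2.0, 2.0, 2.0, 2.0]    # all components, low maturity


def blended_score(scores: List[Optional[float]]) -> float:
    """One combined number: missing components simply count as 0."""
    return sum(s or 0.0 for s in scores) / len(scores)


def separate_scores(scores: List[Optional[float]]) -> Dict[str, float]:
    """Coverage and maturity reported as two distinct indicators."""
    covered = [s for s in scores if s is not None]
    coverage = len(covered) / len(scores)
    maturity = sum(covered) / len(covered) if covered else 0.0
    return {"coverage": coverage, "maturity": maturity}


for name, scores in (("A", platform_a), ("B", platform_b)):
    print(name, blended_score(scores), separate_scores(scores))
```

Both platforms receive a blended score of 2.0, whereas the separate indicators show 50% coverage at maturity 4.0 for platform A and 100% coverage at maturity 2.0 for platform B.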

Conclusion

To find the ML platform that best fulfils your requirements, you need to evaluate the components (and thereby the overarching functional areas) of different platforms. For the evaluation, a component’s coverage score (availability of the component) is just as important as its maturity score (capabilities, robustness, and readiness for production of the component).
