MLOps (Machine-Learning DevOps)

06 November 2020

Deploying machine-learning models takes about 25% of data scientists' time, while 75% of models never go beyond the experimental phase, which creates a significant productivity bottleneck. Automated deployment and retraining processes shorten the time to market and reduce operating costs. This is the subject of machine-learning operations, or MLOps for short. MLOps extends the DevOps approach to cover the machine-learning modelling life cycle.

Models degrade over time because the input data changes, so MLOps introduces continuous retraining, model monitoring and performance evaluation. It keeps a version history not only of code, but also of data and models. Versioning data lets data scientists keep track of where their data came from, and versioning models lets them efficiently track model quality throughout development. Note that MLOps is not to be confused with AIOps, which merely augments the DevOps framework with AI techniques.
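One simple way to picture versioning of data and models alongside code is to derive a version identifier from the content itself. The sketch below is only an illustration using Python's standard library; the function names `content_version` and `lineage_record`, the 12-character identifier length and the sample payloads are assumptions made for this example, not part of any particular MLOps tool.

```python
import hashlib
import json


def content_version(payload: bytes) -> str:
    """Derive a short version id from the bytes themselves (hypothetical helper)."""
    return hashlib.sha256(payload).hexdigest()[:12]


def lineage_record(code_version: str, data: bytes, model: bytes) -> dict:
    """Tie one code version to the exact data and model it was used with."""
    return {
        "code_version": code_version,
        "data_version": content_version(data),
        "model_version": content_version(model),
    }


# Illustrative payloads: the second data set has one additional row.
record_v1 = lineage_record("a1b2c3d", b"age,income\n34,52000\n", b"weights-v1")
record_v2 = lineage_record(
    "a1b2c3d", b"age,income\n34,52000\n41,61000\n", b"weights-v1"
)

print(json.dumps(record_v1, indent=2))
print(json.dumps(record_v2, indent=2))
```

Because the identifier is computed from the content, the same data always maps to the same version, and any change to the data yields a new one even when the code version stays the same.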

Data scientists' pain points

Data scientists will tell you that their productivity suffers badly from having to keep track manually of

machine-learning models,
code versions,
hyperparameters,
metrics,
ideas tried, and whether they worked or failed.

They will have trouble pinpointing the best model they trained two weeks earlier in order to reproduce it, run it on full production data, or rerun it with a more thorough parameter sweep.
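To make this bookkeeping concrete, here is a minimal sketch of an experiment log that records the items listed above and retrieves the best run for a given metric. It uses only the Python standard library; the class names `Run` and `ExperimentLog`, the metric name `val_accuracy` and all sample values are hypothetical. In practice, a tracking tool would take over this role.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Run:
    """One training run: what was tried and how it turned out."""
    run_id: str
    code_version: str
    hyperparameters: Dict[str, float]
    metrics: Dict[str, float]
    note: str = ""  # idea tried, and whether it worked or failed


class ExperimentLog:
    """In-memory log of runs (illustrative only, not a real tracking server)."""

    def __init__(self) -> None:
        self._runs: List[Run] = []

    def record(self, run: Run) -> None:
        self._runs.append(run)

    def best(self, metric: str, higher_is_better: bool = True) -> Run:
        """Return the run with the best value for the given metric."""
        candidates = [r for r in self._runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run reports metric {metric!r}")

        def score(run: Run) -> float:
            return run.metrics[metric]

        if higher_is_better:
            return max(candidates, key=score)
        return min(candidates, key=score)

    def to_json(self) -> str:
        """Serialise all runs so the log can be stored next to the code."""
        return json.dumps([asdict(r) for r in self._runs], indent=2)


log = ExperimentLog()
log.record(Run("run-001", "a1b2c3d", {"learning_rate": 0.1, "max_depth": 3},
               {"val_accuracy": 0.81}, "baseline"))
log.record(Run("run-002", "a1b2c3d", {"learning_rate": 0.05, "max_depth": 5},
               {"val_accuracy": 0.86}, "deeper trees: worked"))
log.record(Run("run-003", "e4f5a6b", {"learning_rate": 0.05, "max_depth": 9},
               {"val_accuracy": 0.84}, "even deeper: overfits"))

best_run = log.best("val_accuracy")
print(best_run.run_id, best_run.code_version, best_run.hyperparameters)
```

With the code version and hyperparameters stored next to the metrics, the best run can be reproduced or used as the starting point for a wider parameter sweep.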

From DevOps to MLOps

MLOps extends the DevOps integration stage with data & model validation, and it extends the delivery stage to handle the added complexity of ML deployment. The team should be able to train a model and make it ready for production use in a streamlined fashion, thereby significantly improving performance & agility. Once in production, models need to be updated on a regular basis as new data comes in, and these updates and changes need to be traceable. Continuous training of ML models has to encompass the whole model life cycle:

Model training pipeline (TFX, MLflow, Pachyderm, Kubeflow)
Model Registry (e.g. on platforms such as Databricks or Google)
Model Serving (Deployment)
Model Monitoring
CI/CD orchestration

Instead of simply training offline and then deploying into production, a multistep pipeline is put in place to retrain and deploy models automatically. A new training cycle can be triggered on a schedule, when new data arrives, when training and live data diverge, or when model performance degrades. Summary statistics are tracked, and when values deviate from expectations, notifications are sent out or models are rolled back.
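The four triggers described above can be sketched as a single decision function. This is a simplified illustration, not a production drift detector: the `PipelineState` fields, the threshold defaults and the drift measure (shift of the live mean in units of the training standard deviation) are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class PipelineState:
    """Snapshot of what monitoring knows about the deployed model (hypothetical)."""
    last_trained: datetime
    new_rows_since_training: int
    training_feature: Sequence[float]  # one feature as seen during training
    live_feature: Sequence[float]      # the same feature as seen in production
    baseline_accuracy: float
    live_accuracy: float


def drift_score(train: Sequence[float], live: Sequence[float]) -> float:
    """Shift of the live mean, measured in training standard deviations."""
    spread = pstdev(train)
    if spread == 0:
        return 0.0 if mean(live) == mean(train) else float("inf")
    return abs(mean(live) - mean(train)) / spread


def retraining_reasons(
    state: PipelineState,
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed schedule
    min_new_rows: int = 10_000,              # assumed data volume threshold
    max_drift: float = 0.5,                  # assumed drift threshold
    max_accuracy_drop: float = 0.05,         # assumed tolerated degradation
) -> List[str]:
    """Return every trigger that currently fires; an empty list means no retraining."""
    reasons = []
    if now - state.last_trained >= max_age:
        reasons.append("schedule")
    if state.new_rows_since_training >= min_new_rows:
        reasons.append("new_data")
    if drift_score(state.training_feature, state.live_feature) > max_drift:
        reasons.append("data_drift")
    if state.baseline_accuracy - state.live_accuracy > max_accuracy_drop:
        reasons.append("performance_degradation")
    return reasons


state = PipelineState(
    last_trained=datetime(2020, 11, 1),
    new_rows_since_training=500,
    training_feature=[1.0, 2.0, 3.0, 4.0, 5.0],
    live_feature=[4.0, 5.0, 6.0, 7.0, 8.0],
    baseline_accuracy=0.90,
    live_accuracy=0.88,
)
reasons = retraining_reasons(state, now=datetime(2020, 11, 4))
print(reasons)
```

In this example only the drift trigger fires: the model is three days old, few new rows have arrived and accuracy has dropped by only 0.02, but the live feature has shifted well away from the training distribution. The same list of reasons could feed a notification or a rollback decision.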

Summary

The machine-learning product development process is about more than a software package. Beyond unit-testing of methods and functions, it requires validating data, parameters and code together as one system. Machine learning is experimental in nature. The MLOps concept extends DevOps with specific processes that help data scientists avoid getting lost in the model zoo or in a swamp of deployment issues.
