
MLOps and model building pipelines: navigating the new frontier of AI

We've come a long way. We've gone from perceptrons, through Hopfield networks, backpropagation, support vector machines, random forests, unsupervised deep learning, and generative adversarial networks, all the way to large language models. Machine learning (ML) is now more popular and accessible than ever. This has led us to a new era of connectionism(1), a theory stating that intelligence can emerge(2) from simple elements. Deep learning puts this theory into practice and introduces a whole new way of programming.

This means we can now teach computers to do complex tasks that were previously impossible with algorithms alone. And to work effectively on new models, scale them, manage datasets, track training runs, and increase automation and visibility, we need machine learning operations (MLOps)(3).

MLOps blends machine learning with DevOps principles and sits at the intersection of machine learning, data science, and DevOps. It streamlines and automates the development of machine learning models, covering everything from dataset creation and model design to training and deployment. Additionally, MLOps enhances machine learning workflows by incorporating CI/CD pipelines, fostering collaboration, and ensuring reproducibility. It also allows for ongoing monitoring of model performance in production, creating a continuous loop of improving and releasing better versions.


In this article, I’ll describe a traditional machine learning workflow and then compare it to an MLOps-driven approach to show why it's highly relevant in the current AI landscape.  


The traditional machine learning workflow

Traditional machine learning workflows are isolated: they live on the developer's machine or in another single development environment. While these environments can be accessible, like Google Colab notebooks, they are not scalable, nor do they offer features like dataset versioning or automated deployments.

In these traditional workflows, data is gathered from various sources like databases, logs, and external datasets. It then needs to be manually reviewed and cleaned to remove duplicates, correct errors, and fill in missing values. Dataset separation for training, validation, and evaluation is done by hand. In addition, transformations such as normalization or feature extraction have to be done manually. Automation is non-existent in this traditional process, which is not easily reproducible. As a result, collaboration between your ML developers, ops team, and data scientists becomes incredibly challenging.
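In code, these hand-rolled steps often look like the following minimal sketch. It is a pure-Python illustration with hypothetical record fields, not any particular project's pipeline: deduplication, mean imputation, and a manual train/validation/test split.

```python
import random

def clean(records):
    """Drop exact duplicate records and fill missing 'price' with the mean."""
    # Deduplicate by turning each record into a hashable tuple of items.
    deduped = {tuple(sorted(r.items())) for r in records}
    records = [dict(t) for t in deduped]
    # Mean imputation for the (hypothetical) 'price' field.
    prices = [r["price"] for r in records if r["price"] is not None]
    mean_price = sum(prices) / len(prices)
    for r in records:
        if r["price"] is None:
            r["price"] = mean_price
    return records

def split(records, train=0.7, val=0.15, seed=42):
    """Hand-rolled train/validation/test split with a fixed shuffle seed."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * train), int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Every constant here (the split fractions, the imputation strategy, even the seed) lives only in the script, which is exactly why such workflows are hard to reproduce and share.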

During the model training phase, the training process is usually run on local machines or single servers without any automation or optimization frameworks. Instead, the models are trained and tested by launching scripts. Cross-validation can ensure the model generalizes effectively, but this isn't automated. Meanwhile, the deployment process involves exporting the model to a file and moving it to a server manually or with some loading and serving scripts. Monitoring consists of periodically checking logs and dashboards to ensure the system runs as expected. 
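The manual cross-validation mentioned above can be sketched as a plain loop. The "model" here is just a mean predictor standing in for whatever training script would be launched by hand:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def cross_validate(xs, ys, k=5):
    """Fit a mean predictor on each fold; return mean absolute error."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        prediction = sum(ys[i] for i in train_idx) / len(train_idx)
        fold_err = sum(abs(ys[i] - prediction) for i in val_idx) / len(val_idx)
        errors.append(fold_err)
    return sum(errors) / len(errors)
```

In a traditional workflow, someone runs this by hand, eyeballs the number, and decides what to do next; nothing records which data, code, or parameters produced it.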

And old-school ML workflows face even more issues. These include challenges related to data scaling, tracking and repeating experiments, deployments, and keeping up with model performance. On top of all of that, traditional processes result in barriers between ML engineers, data scientists, and operations teams because they lack a common, integrated platform for model development and releases.

In summary, standard machine learning pipelines involve:

  • Manual Integration: Most steps rely heavily on human intervention, which is prone to mistakes.
  • Siloed Workflow: Data scientists, engineers, and IT operations work in silos with minimal collaboration and communication.
  • Limited Reproducibility: Reproducing experiments and training models is difficult due to a lack of version control for data, code, and models.
  • Ad-hoc Monitoring: Model performance and system health monitoring aren't systematic. This makes it challenging to detect and respond to issues quickly.
  • Lack of Scalability: Scaling the process is difficult, as the infrastructure isn't designed for continuous and large-scale operations.

These pipelines have often proven inefficient and error-prone, which has led to the development of new practices to fix these problems. MLOps does this by adding more automation, integration, and collaboration across the whole machine learning lifecycle.

Introduction to MLOps

Machine learning systems are also software systems, and as such, DevOps practices apply to them as well. DevOps is a collection of technical and management practices invented to speed up the release of well-engineered software while ensuring speed, stability, scalability, and security. As a prerequisite, all MLOps projects should be built on these DevOps foundations. On top of them, MLOps brings automation and integration to model building workflows: the whole process, including all the testing and checks, is codified and kept in a centralized repository. The base components of DevOps are:

  • Continuous integration (CI) - the process of automated testing and improving the software quality based on those tests, all of which is run on a build server or platform.
  • Continuous delivery (CD) - an automated method for deploying the software with minimal human intervention.
  • Microservices - a practice of splitting big software projects into small services with distinct functions and few dependencies.
  • Infrastructure as code (IAC) - defining the infrastructure components using code that is checked into a source repository and deployed using automation.
  • Monitoring and instrumentation - crucial for making informed decisions about any system's performance and reliability.
  • Actively managing projects - involves breaking them into small parts. This lets you track progress easily and avoid striving for perfection.

MLOps is a behavior and method. We can measure its success by counting how quickly ML models go into production or measuring the model’s impact on business ROI. However, the most important metric is the model's operational efficiency. The cost, uptime, and staff needed to maintain an MLOps system predict the machine learning project's success or failure.


MLOps Workflow: A Detailed Overview

MLOps has made machine learning pipelines more integrated, automated, and robust. It ensures better collaboration between data scientists, ML engineers, and operations teams while simultaneously enhancing every step of traditional workflows in some way.  

In MLOps, data collection and preprocessing are handled by automated Extract, Transform, Load (ETL) ingestion pipelines. Tools like Data Version Control (DVC) version the datasets, ensuring reproducibility, while data cleaning and transformation scripts are automated and orchestrated using tools like Apache Airflow or Kubeflow Pipelines.
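The extract-transform-load pattern these orchestrators automate can be illustrated with a toy, dependency-ordered pipeline. This is a sketch of the idea, not Airflow's actual API; the data source and steps are hypothetical stand-ins:

```python
def extract():
    # In practice: pull raw records from databases, logs, or external datasets.
    return [" 4.0", "2.0 ", "bad", "6.0"]

def transform(raw):
    # Clean and normalize: strip whitespace, drop unparseable rows.
    values = []
    for item in raw:
        try:
            values.append(float(item.strip()))
        except ValueError:
            continue  # a real pipeline would log and count rejects
    return values

def load(values, store):
    # In practice: write to a feature store or a DVC-versioned dataset.
    store["features"] = values
    return store

def run_pipeline(store=None):
    """Run the steps in dependency order, as a scheduler would."""
    store = {} if store is None else store
    return load(transform(extract()), store)
```

An orchestrator adds what this sketch lacks: scheduling, retries, parallelism, and a recorded run history for every execution.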

MLOps uses workflow orchestration tools to define training pipelines, enabling model training to trigger automatically based on new data or changes in the codebase. Those tools (like MLflow) also include automated model versioning, ensuring that each model can be tracked and reproduced.
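The tracking-and-versioning pattern such tools provide can be sketched as a tiny in-memory registry. This is an illustration of the concept, not MLflow's API: every run records its parameters and metrics under an auto-incremented version, so any model traces back to the exact run that produced it.

```python
class RunRegistry:
    """Toy stand-in for an experiment tracker / model registry."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run; return its version number."""
        run = {"version": len(self.runs) + 1,
               "params": dict(params),
               "metrics": dict(metrics)}
        self.runs.append(run)
        return run["version"]

    def best(self, metric, higher_is_better=True):
        """Return the run with the best value of the given metric."""
        key = lambda r: r["metrics"][metric]
        return max(self.runs, key=key) if higher_is_better else min(self.runs, key=key)
```

A real tracker additionally stores the code revision, the dataset version, and the serialized model artifact alongside each run.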

Another difference between MLOps pipelines and traditional workflows is evaluation, which is done continuously. This process uses a separate dataset to test model performance, compare it with previous versions, and verify improvements. Evaluation metrics are reported automatically and often integrated with dashboards and alerting rules for easy monitoring. Automated checks verify the required criteria to make sure the model is good enough to be released. Deployment itself is automated using a CD system and does not need human intervention.
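Such an automated release gate might look like the following minimal sketch; the metric name and thresholds are illustrative assumptions, not a standard:

```python
def passes_gate(candidate, production, min_accuracy=0.90, max_regression=0.01):
    """Return True if the candidate model may be deployed.

    candidate/production are dicts of evaluation metrics; production is
    None when no model is live yet (first deployment).
    """
    if candidate["accuracy"] < min_accuracy:
        return False  # absolute quality criterion not met
    if production and candidate["accuracy"] < production["accuracy"] - max_regression:
        return False  # regresses against production beyond tolerance
    return True
```

The CD system calls a check like this after evaluation; only a passing candidate proceeds to deployment, with no human in the loop.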

MLOps provides an effective basis for containerized microservices, ensuring consistent environments across development, testing, and production. This is in part because the whole infrastructure is managed using IaC tools like Terraform or AWS CloudFormation. Containerization also makes the models easy to scale. Serving platforms like TensorFlow Serving, KFServing (now KServe), or AWS SageMaker help with this by enabling both vertical and horizontal scaling.

On top of all of this, MLOps offers a few additional features. An automated pipeline handles both real-time inference and scheduled batch job processing efficiently. The monitoring and maintenance of model performance and health can be managed with tools like Prometheus, Elastic, or Grafana. Alerts can be set up for any anomalies, enabling drift detection monitoring to catch drops in model performance caused by statistical changes in the incoming data. In such an event, an automated retraining pipeline can be triggered either when the drift is detected or when a new dataset is gathered.
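A drift check of this kind can be sketched with a simple mean-shift statistic. Real systems typically use richer tests (for example Kolmogorov-Smirnov or the population stability index), and the threshold here is an illustrative assumption:

```python
import math

def mean_std(values):
    """Mean and (population) standard deviation of a sample."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

def drift_detected(train_values, live_values, threshold=3.0):
    """True when the live mean drifts beyond `threshold` training stddevs."""
    train_mean, train_std = mean_std(train_values)
    live_mean, _ = mean_std(live_values)
    if train_std == 0:
        return live_mean != train_mean
    return abs(live_mean - train_mean) / train_std > threshold

def maybe_retrain(train_values, live_values, retrain):
    """Trigger the (hypothetical) retraining pipeline when drift is found."""
    if drift_detected(train_values, live_values):
        retrain()
        return True
    return False
```

In production, `retrain` would kick off the same automated training pipeline described earlier, closing the feedback loop.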

The key characteristics of MLOps-enhanced pipelines are:

  • End-to-End Automation: Automation spans the entire machine learning lifecycle, from data ingestion to model deployment and monitoring.
  • Improved Collaboration: Automated pipelines improve coordination between data scientists, engineers, and operations teams due to standardized processes and shared tools.
  • Scalability: MLOps workflows are designed to scale easily with increasing data and model complexity.
  • Reproducibility: The use of version control and automated processes ensures that models can be reproduced and audited.
  • Continuous Feedback: Continuous monitoring and feedback loops help in maintaining and improving model performance.
  • Infrastructure Management: Infrastructure is managed as code, ensuring consistency and repeatability across environments.
Traditional machine learning workflow vs. MLOps workflow:

  • Integration. Traditional: manual; steps rely heavily on human intervention, which is prone to mistakes. MLOps: automated; steps are integrated with minimal human intervention, reducing errors.
  • Collaboration options. Traditional: limited; data scientists, engineers, and IT operations work in silos. MLOps: improved; standardized processes and tools ensure better collaboration between data scientists, ML engineers, and operations teams.
  • Reproducibility. Traditional: limited; reproducing experiments and training models is challenging. MLOps: enhanced; version control for data, code, and models ensures reproducibility and auditability.
  • Monitoring. Traditional: ad hoc; model performance and system health monitoring aren't systematic. MLOps: continuous; monitoring and feedback loops help maintain and improve model performance.
  • Data gathering and separation. Traditional: manual, including data cleaning and transformation. MLOps: automated; ETL pipelines and orchestration tools manage data collection, cleaning, and transformation.
  • Scalability. Traditional: challenging; the infrastructure isn't designed for continuous and large-scale operations. MLOps: scalable; designed to scale easily with automated deployment and containerization.
  • Deployment. Traditional: manual; deployment involves exporting the model and moving it to a server by hand. MLOps: automated; deployment uses CI/CD systems with minimal human intervention.
  • Infrastructure management. Traditional: managed manually, leading to inconsistency. MLOps: managed as code, ensuring consistency and repeatability across environments.
  • Experiment tracking. Traditional: minimal; tracking and repeating experiments is difficult without automated tools. MLOps: comprehensive; automated experiment tracking and versioning tools ensure each model can be tracked and reproduced.
  • Feedback loops. Traditional: non-existent; lack of continuous feedback makes it hard to improve models in production. MLOps: continuous; feedback and monitoring enable proactive maintenance and model improvement.

Benefits of Implementing MLOps

MLOps delivers several significant benefits(4) while solving the traditional challenges of machine learning projects. One of the primary advantages is scalability: by automating workflows, the system can handle increased demand as projects grow without requiring much manual effort. MLOps also enhances experiment tracking, ensuring that models can be consistently replicated, which is crucial for maintaining accuracy and reliability. Finally, this approach improves model performance monitoring and management after release, ensuring that models continue to perform well and adapt to changes in the data.

MLOps offers more advantages beyond these solutions to traditional challenges. It enables faster deployment cycles, meaning that new models and updates can be rolled out more quickly, keeping businesses agile and responsive. These workflows also support better regulatory compliance and security, improving protections for sensitive data and adherence to industry standards. Finally, by streamlining processes and enhancing efficiency, MLOps improves the ROI of AI initiatives, making it a valuable strategy for any organization looking to leverage the power of machine learning.


Case Studies

  • Amazon scaled its e-commerce machine learning with MLOps by automating tasks and improving tracking. This improved its recommendations and stock management.
  • Airbnb also used MLOps to streamline ML processes, ensuring reproducible models and quick deployment. As a result, the business is now better at predicting preferences and refining prices.
  • Spotify employs MLOps for its recommendation algorithms. This strategy scales models and ensures a consistent user experience across different markets.
  • Capital One turned to MLOps for AI regulation and security. It now manages data in a security compliant way. This move has increased trust in its financial services.

Real-world case studies show the powerful impact of MLOps in various industries.

For instance, one leading e-commerce company faced difficulties in managing and scaling its machine learning models due to rapid growth and an increasing volume of data. By adopting MLOps strategies, the organization automated workflows, streamlined experiment tracking, and ensured consistent model reproducibility. This transformation led to more efficient deployment cycles and improved model performance monitoring, enabling the company to adapt quickly to market changes and enhance customer experiences. 

Another case involves a financial institution that needed to ensure regulatory compliance and security for its AI initiatives. MLOps provided robust solutions for better compliance and secure management of sensitive data, resulting in increased trust and reliability in their services.


Discover the Power of ML in image sorting with Reffine

As we explore the intricacies of MLOps and its transformative impact on AI workflows, it's essential to recognize how AI can revolutionize specific business operations. 

At Reffine, we created an AI Image Sorting Tool tailored to streamline digital showrooms and enhance user experiences. It:

  • Saves time: Automatically sorts images in seconds, reducing manual effort.
  • Boosts productivity: Allows your team to focus on high-value tasks.
  • Helps you scale effortlessly: Manages thousands of images consistently across multiple locations.

Future Trends in MLOps

Future trends in MLOps are shaping the way organizations develop, deploy, and manage machine learning models. Here are some essential concepts that will guide the use of these workflows:

  • Increased Automation: Advances in automated machine learning (AutoML) and automated data preprocessing will reduce the need for manual intervention, allowing data scientists to focus on more strategic tasks.
  • Better Collaboration Tools: As MLOps becomes more common, tools that help data scientists, engineers, and business stakeholders work together will emerge and integrate seamlessly with existing development environments, improving efficiency and communication.
  • Edge MLOps: With the growth of IoT devices and edge computing, MLOps will extend beyond the cloud. This means deploying and managing machine learning models directly on edge devices, which can process data locally and reduce latency.
  • Explainability and Transparency: Driven by regulatory compliance and the demand for ethical AI practices, making models easier to understand will become a priority, and the tools and frameworks that provide insights into how models make decisions will grow more sophisticated.
  • Advanced Monitoring: Future MLOps will bring more advanced techniques for monitoring and maintaining models in production, including real-time monitoring, anomaly detection, and automated retraining to keep models accurate over time.
  • Integration with DevOps: The lines between MLOps and DevOps will continue to blur. Organizations want more integrated development processes, leading them to add MLOps to DevOps pipelines and unify application and model development.
  • Security and Compliance: As machine learning models become common in vital applications, ensuring their security and regulatory compliance will be crucial. Future MLOps frameworks will add strong security measures and compliance checks to protect data and models from breaches and ensure adherence to legal standards.


Conclusion

MLOps is built by applying DevOps principles to machine learning projects and extending them with best practices specific to machine learning, related, for example, to dataset management and model monitoring. It adds automation, scalability, collaboration, and observability. MLOps is already becoming vital for machine learning business applications because it increases ROI and sustainability. Why not reap the benefits?

1. The Curious Case of Connectionism, Berkeley, Istvan S. N., Open Philosophy, 2019, pp. 190–205, doi: 10.1515/opphil-2019-0018. S2CID 201061823

2. Connectionism and the emergence of mind, Flusberg, S. J., & McClelland, J. L., In S. E. F. Chipman (Ed.), 2017, in: The Oxford handbook of cognitive science, Oxford University Press, pp. 69–89  

3. Machine Learning Operations (MLOps): Overview, Definition, and Architecture, D. Kreuzberger, N. Kühl and S. Hirschl, in: IEEE Access, vol. 11, 2023, pp. 31866-31879, doi: 10.1109/ACCESS.2023.3262138

4. Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help?, S. Mäkinen, H. Skogström, E. Laaksonen and T. Mikkonen, 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN), Madrid, Spain, 2021, pp. 109-112, doi: 10.1109/WAIN52551.2021.00024

Other resources:

Practical MLOps, Noah Gift, Alfredo Deza, O'Reilly, 2021

How AI Works: From Sorcery to Science, Ronald T. Kneusel, 2023
