Breaking the Silos Between Data Scientists, Engineers & DevOps with New MLOps Practices

Published on 03/02/2021 03:55 PM

Effectively bringing machine learning to production is one of the biggest challenges data science teams struggle with today. As organizations embark on machine learning initiatives to derive value from their data and become more “AI-driven” or “data-driven”, it’s essential to find a faster and simpler way to productionize machine learning projects so that they can deliver business impact sooner.

The Challenges of Productionizing Machine Learning

The challenge of productionizing ML is real: studies show that over 80% of AI projects get stuck in the development phase, are only partially successful, or end up consuming far more time and resources than projected. The different tasks required to successfully deploy ML initiatives into production take up a serious amount of time, effort, and energy. It takes about 8 to 24 weeks to build a good model if teams have access to good data, yet many organizations spend 18 months or more trying to get their models into production.

While there are several reasons why enterprises fail to complete their data science initiatives or derive value from them, one of the most painful is that data scientists, engineers, and DevOps teams often work in silos.

Data science teams are often cut off from DevOps and engineering teams and typically use their preferred open-source tools in their local environments (IDEs, Jupyter notebooks, etc.) for building models. However, it’s easy to reach a point where these tools can no longer scale with the data in terms of memory, processing, training, and deployment.

For one, traditional machine learning workflows consist of several distinct steps, from extracting and preparing datasets to training, validation, and deployment. The real problem arises when teams need to productionize models, create a data product out of all the component artifacts (code, data, and model), and do so at scale. Models in production have to handle incoming streams of data from various operational databases in real time, prepare that data at scale using distributed frameworks, train the model across many parameter permutations, and incorporate the real-time features needed to serve the model.
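
To make those steps concrete, here is a minimal sketch of the basic batch workflow in Python using scikit-learn; the dataset file, column names, and parameters are purely illustrative:

```python
# A minimal sketch of the classic batch workflow: extract, prepare, train, validate.
# The CSV file and "label" column are hypothetical; a production pipeline would add
# distributed data prep, parameter search, and a deployment step on top of this.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("transactions.csv").dropna()  # extract + minimal preparation
X, y = df.drop(columns=["label"]), df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)  # train

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"validation accuracy: {accuracy:.3f}")  # validate
```

Every additional production requirement listed above (streaming inputs, distributed preparation, real-time feature serving) layers more infrastructure onto this simple loop, which is exactly where the complexity explodes.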

Furthermore, teams need to incorporate a feedback loop and a defined process to version and support the model and its constituent components in a production setting. This would enable teams to track when something breaks or identify model drift, thus triggering maintenance activities and retraining of the model.
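
As a rough illustration of what such a feedback loop can check, the sketch below compares a live feature distribution against its training baseline using a two-sample Kolmogorov-Smirnov test; the threshold and the retraining trigger are illustrative assumptions, not a prescription:

```python
# A generic sketch of a drift check: compare a production feature's distribution
# against the training baseline with a two-sample Kolmogorov-Smirnov test.
# The p-value threshold and the retraining hook are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, p_threshold=0.01):
    """Return True when the live distribution differs significantly from training."""
    _, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold

# Example: a simulated baseline versus a mean-shifted live stream.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.5, 1.0, 1_000)

if feature_drifted(baseline, live):
    print("drift detected: trigger retraining pipeline")  # feedback-loop hook
```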

All this becomes really complex once data scientists need to transition the entire work done in their Jupyter notebooks to a production environment and distribute it at scale. And every time there are changes to the data or the data preparation logic, the customer requires new features, or the model needs to be retrained (due to drift), the entire cycle has to be repeated.

Furthermore, not only do teams need to manage the software code artifacts, but they also have to handle the machine learning models, datasets, parameters, and hyperparameters used by those models. What’s more, these artifacts need to be managed, versioned, and promoted through various stages before being deployed to production, making it harder to achieve repeatability, reliability, quality control, auditability, and versioning throughout the lifecycle.
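
To illustrate what that run-level bookkeeping involves at its simplest, the toy sketch below records a content hash for each artifact alongside the run’s hyperparameters; the file names are hypothetical, and a real team would use a model registry or experiment tracker rather than a flat file:

```python
# A toy sketch of run-level artifact versioning: hash each artifact (code, data,
# model) and record them together so a run can be reproduced and audited later.
# All file names are hypothetical stand-ins for real project artifacts.
import hashlib
import json
import time
from pathlib import Path

def content_hash(path: str) -> str:
    """Short content-based fingerprint of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:12]

run_record = {
    "run_id": time.strftime("%Y%m%d-%H%M%S"),
    "code": content_hash("train.py"),
    "data": content_hash("transactions.csv"),
    "model": content_hash("model.pkl"),
    "params": {"n_estimators": 100, "max_depth": 8},  # hyperparameters used
}

with Path("runs.jsonl").open("a") as f:  # append-only audit trail of runs
    f.write(json.dumps(run_record) + "\n")
```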

Doing this in a traditional engineering environment is extremely difficult, if not impossible, leading many enterprises to abandon their data science initiatives mid-process.

Also, data science and engineering teams are composed of experts with different skill sets, working processes, and backgrounds, and with varying degrees of exposure to (and preferences for) open-source tools. Each team ends up working in a silo, using different tools, methodologies, and practices, which leads to friction, stifled collaboration, and inadequate knowledge-sharing between teams, ultimately increasing the complexity of ML workflows and pipelines.

Solutions for Consideration

There’s a need for advanced technologies and solutions engineered to address all of these challenges and enable teams to build, train, and deploy machine learning models quickly and easily. This reduces friction, facilitates collaboration, and accelerates machine learning production and operationalization.

Aside from this, deploying AI effectively and at scale requires enterprises to adopt agile development practices and a governable, scalable solution that facilitates end-to-end collaboration among teams and automates most of the complexities inherent in deploying models to production.

On one hand, data scientists work with data to extract features and develop/train models that work best with the data to achieve predictive and prescriptive insights. As such, they need tools for data wrangling, preparation, rapid prototyping, data visualization, parallel experimentation, and model training at scale.

On the other, machine learning engineers need tools to build products and solutions that turn these models into a service or application that delivers business value. This requires that both teams collaborate seamlessly and that models in production are efficient, secure, reliable, and scalable.

MLOps emerged in response to these challenges. It’s a methodology that standardizes and automates the data science and machine learning lifecycles while enabling seamless communication and collaboration between teams. MLOps, in its truest form, aims to accomplish the following objectives as efficiently as possible:

  • Streamline and standardize machine learning lifecycles to prepare for increasing regulation and policy
  • Create a truly cyclical and easily reproducible lifecycle for the modern ML model
  • Improve model versioning, monitoring, tracking and management
  • Enable knowledge-sharing, enhance collaboration and reduce friction between teams
  • Reduce the time and difficulty of pushing models into production

Optimizing Your Machine Learning Workflows

Implementing MLOps enables data scientists, ML engineers and DevOps teams to work together and seamlessly scale their processes around model training, data management, and deployment.

To build a seamless ML workflow, you first need to understand the business context and value of the model, the KPIs and success metrics it should achieve, and the expected ROI once it is deployed to production. Since raw model outputs can’t be consumed in and of themselves, you’ll need to figure out a way to integrate the insights generated by the models into components that deliver the results to internal or external consumers.
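
As a generic example of such a component, the sketch below wraps a trained model in a thin HTTP endpoint so downstream consumers can request predictions; the model file and payload shape are assumptions, and Flask is used only for illustration:

```python
# A generic sketch of exposing model predictions over HTTP so other systems
# can consume them. The serialized model file and the request payload shape
# ("features" as a list of feature rows) are illustrative assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # hypothetical serialized model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8080)
```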

However, successful products are built by teams of people across various functional areas within the organization. Teams often create models in one language and then rewrite them when the need arises to deploy on another framework. This introduces significant risk and delays deployment.

To ensure smooth machine learning workflows, we need to embrace tools and techniques that simplify and facilitate the migration of code from exploration and development to production, enabling teams to make that transition faster in a scalable, repeatable, and controllable way. We also need to help disparate teams create a culture of collaboration and overcome the political, cultural, and technological barriers by adopting new ways of working on data science projects.

Speeding Up Model Deployment by Eradicating the Silos

Enterprises need to build a pipeline that consists of modular, reproducible workflows that can be modified and reused for future deployments and leverage an end-to-end orchestration tool that streamlines the entire pipeline.

This is where robust, open-source technologies like MLRun (Iguazio's open-source pipeline orchestration framework) come in. MLRun enables DevOps teams to work with the tools they’re most comfortable with and seamlessly move tested code into a distributed compute environment, or a complete machine learning pipeline, with a single command.
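
As a rough sketch of what that single-command promotion can look like in code (the file name, handler, and parameters here are illustrative, and exact APIs vary by MLRun version):

```python
# A rough sketch of promoting existing training code with MLRun: wrap it as an
# MLRun function, then run it either locally or as a Kubernetes job on the
# cluster. The trainer.py file, its train() handler, and the params are
# illustrative assumptions.
import mlrun

fn = mlrun.code_to_function(
    name="trainer",
    filename="trainer.py",  # hypothetical script containing a train() handler
    kind="job",             # run as a containerized job on the cluster
    image="mlrun/mlrun",
)

# The same code runs on a laptop (local=True) or on the cluster (local=False),
# with parameters, results, and artifacts tracked either way.
run = fn.run(handler="train", params={"n_estimators": 100}, local=True)
print(run.outputs)
```

Because each run is tracked, the results and artifacts it logs feed directly into the versioning and monitoring practices described earlier.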

MLRun integrates with the Nuclio serverless functions technology and Kubeflow Pipelines to facilitate robust and effective ML pipelines, automating the MLOps process and bringing CI/CD practices to data science projects.

The Iguazio Data Science Platform 

The Iguazio Data Science Platform brings all of these technologies together, enabling enterprises to take machine learning projects to production at scale and in real time, in an open and managed environment, while cutting time to production and the cost of AI infrastructure.

 
