Processes enable the people
Originally posted on Kaskada’s “Machine Learning Insights” blog here.
MLOps can empower us as data scientists to bring more of our models to production faster. In part 1 we covered the ML lifecycle and in part 2 discussed how to select tools to instrument the ML lifecycle. Here in part 3, we’ll talk about how you can change your processes to enable people as you’re beginning to adopt MLOps at your company.
As a data scientist, if you didn’t come from a software engineering background, it’ll be helpful to read up on DevOps when you’re first looking to adopt new MLOps processes on your team. Instead of reinventing the wheel entirely, we can pull forward the ideas that worked and modify them to suit our needs. It will also be helpful to have a common touch point with the software and DevOps engineers on your team.
tl;dr: DevOps was introduced into the software development lifecycle to accelerate shipping applications and to keep them relevant with feedback, by closing the loop between development and operations. A lot of these ideas are transferable, but we’ll need a few adjustments because the ML lifecycle includes three axes: the data, the model, and the code itself.
Process flow before MLOps
Remember the image below from part 1, illustrating the complexity of your current process flow? Your data science team and data engineering team are using different tools in your experimental and production environments. On top of all that tooling, your data science process might look something like this:
- Cleaning and transforming data manually for the problem you’re trying to solve
- Iterating through 30 different features to understand what’s predictive
- Training models and tuning parameters locally
- Validating model performance
- Handing your model off to an ML engineer to code
- Waiting for data engineering to build new online data sources
- Waiting for ML engineers to train and validate the production models
- Troubleshooting sub-par performance of the production models
- Picking the next analysis or model challenge of your quarterly goals list
Today’s process might allow your team to update your ML models in production once a quarter. The promise of MLOps is to empower us as data scientists to bring more of our models to production faster. Let’s take an example of how this might be possible with new processes to enable your teams.
Example process with new MLOps tooling
Now that you’ve introduced new tooling to instrument the lifecycle as discussed in Part 2 of this series, your team can institute new processes that allow for automation, collaboration, measuring performance, and auditing usage of data sources, features and models. With the right processes you can shorten the time-to-production to days instead of quarters, as seen in the figure below.
One thing to notice between the two lifecycles is the reduction in churn: work is no longer handed back and forth multiple times between owners on different teams. A new process for you as a data scientist to kick off the week prioritizing model updates might look like this:
- Reviewing model performance in production
- Reading product team updates for upcoming product features
- Writing product feedback and hypotheses to ask for telemetry for new features
- Assessing new data sources available from the last round of product updates
- Prioritizing new hypotheses to test and production models to update
- Using your feature studio to access production data to engineer new features
- Experimenting with new features to test hypotheses
- Registering new features, hyperparameters and models in a library
- Writing tests to ensure new features stay within parameters
- Serving new feature APIs in staging with a business case to prioritize them on the production schedule
- Validating and monitoring models
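To make the "writing tests to ensure new features stay within parameters" step a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `FeatureBounds` class, the `check_feature` helper, the `days_since_last_login` feature and its limits are hypothetical, not part of any particular feature store or MLOps tool. In practice your own tooling would supply the feature values and the thresholds would come from what you observed during experimentation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureBounds:
    """Expected range for one feature (hypothetical names and limits)."""
    name: str
    minimum: float
    maximum: float
    max_null_fraction: float = 0.0  # share of missing values we tolerate


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_feature(values, bounds):
    """Return a list of violation messages; an empty list means the feature passes."""
    if not values:
        return [f"{bounds.name}: no values to check"]

    violations = []
    present = [v for v in values if not is_missing(v)]

    # Check 1: too many missing values.
    null_fraction = (len(values) - len(present)) / len(values)
    if null_fraction > bounds.max_null_fraction:
        violations.append(
            f"{bounds.name}: {null_fraction:.0%} missing exceeds "
            f"{bounds.max_null_fraction:.0%}"
        )

    # Check 2: any present value outside the expected range.
    out_of_range = [v for v in present if not bounds.minimum <= v <= bounds.maximum]
    if out_of_range:
        violations.append(
            f"{bounds.name}: {len(out_of_range)} value(s) outside "
            f"[{bounds.minimum}, {bounds.maximum}]"
        )
    return violations


if __name__ == "__main__":
    # Hypothetical feature and limits, for illustration only.
    bounds = FeatureBounds("days_since_last_login", 0, 365, max_null_fraction=0.25)
    for message in check_feature([0, 12, 400, None], bounds):
        print(message)
```

A check like this could run in your test suite before a feature is registered, and again on a schedule against production data, so that the same definition of "within parameters" is shared between experimentation and monitoring.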
While the above is an example of what your data science team might do in a given week, the rest of your org will change as well. Other teams will want to adopt new processes that enable their people to succeed. In the above example, data engineers are no longer the sole owners of monitoring the performance of the ML models in production, and they are not responsible for rewriting feature engineering code to bring models to production. Instead they’re utilizing the new feature APIs that the data scientists are building directly.
How will you utilize the platforms and newly gained visibility to take ownership over various parts of the lifecycle? This brings us to a conversation next time about the culture that enables MLOps. Stay tuned for part 4!