Machine Learning (ML) has business potential in almost all industries, and therefore more and more companies are experimenting with its value. Experiments are necessary, but rarely create lasting change and value in a company. It is only when the ML application is successfully put into production and an operational setup is designed around it that insights and automation can unfold day after day.
The transition from experimenting with ML to launching business-critical ML applications is often difficult and can cause nervousness in any IT organization. ML solutions are complex, with more moving parts than a typical software application: data is constantly changing, and a model trained on yesterday's data must be applied to tomorrow's production data. In the pursuit of better models, the model's settings, known as hyperparameters, will also change over time.
From a top-down perspective, an ML project can be divided into four phases: business clarification, development, deployment, and operation. Many companies want to release the developers (the company's data scientists) from the ML model once it goes into production, so that they can do what they do best while the model is handled by a centralized unit, such as an operations or support team in IT or the BI department. This can be a good idea, but it is not entirely simple, and there is a risk that models in production will deliver poor performance and incorrect, unintended results, to the detriment of business value and reputation.
In recent years, the concept of ML Ops has emerged, which frames a set of best practices for the ML model lifecycle management process. Here, principles from traditional software development are adapted to ML and contribute to standardization and streamlining of processes to the extent possible. In this blog post, we will bring ML Ops down to earth and point out five elements that we find improve the chances of successfully launching ML applications and harvesting scalability opportunities where they exist.
1. Keep the lights on
A machine learning (ML) application will often output numbers such as sales forecasts, predictions of which machines should be maintained to avoid breakdowns, or how the gates at a treatment plant should be adjusted to withstand the pressure of an upcoming rainstorm. The most important aspect of an ML system is that it provides valid information to the business and that the solution is up and running.
To ensure that the solution keeps running, work must be done on update strategies, rollbacks, and life signals from the application. If you have a business-critical ML solution, you cannot overestimate the importance of incorporating tests that help ensure an update is not shipped that inadvertently breaks something that previously worked well.
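As a minimal sketch of such a life signal, the deployment process can poll the application's health endpoint before and after an update and roll back if the service stops answering. The URL and endpoint name here are hypothetical, not from the original text:

```python
import urllib.request


def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if the service answers its health endpoint with HTTP 200.

    Any connection failure or HTTP error is treated as "not alive", which
    a deployment script can use as a signal to trigger a rollback.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False
```

In a deployment script, a `False` return after an update would stop the rollout and restore the previous version.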
It is therefore important to see the solution through the end user's eyes and design a range of tests that genuinely check the functionality and results the end user values most. For example, test the integration with the database, the model's predictive ability on a known dataset, or the UI functionality if it matters to the user.
2. Continuous monitoring of validity strengthens credibility
Data forms the basis of any ML model, and as previously described, data is always in motion - whether it's text, images, sound, or tabular data. The fundamental characteristics that define new data can suddenly deviate from what the model was trained on. For example, the COVID-19 pandemic has posed challenges for many production forecasting models, which struggle to automatically handle such a significant disruption to the data. COVID-19 is a clear example that we can all relate to, but often the danger lies in the unseen. An ML model often becomes less and less accurate over time as the underlying data characteristics change.
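One way to catch such unseen shifts is to compare the distribution of an input feature in new production data against the training data. A minimal sketch, using the two-sample Kolmogorov-Smirnov statistic and an illustrative alert threshold:

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical CDFs (two-sample KS)."""
    a, b = sorted(sample_a), sorted(sample_b)
    values = sorted(set(a) | set(b))

    def ecdf(sorted_sample, x):
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in values)


def drifted(train_sample, live_sample, threshold=0.2):
    """Flag drift when the distributions differ more than the threshold.

    The 0.2 threshold is an assumption; in practice it should be tuned
    per feature, or replaced with a proper significance test.
    """
    return ks_statistic(train_sample, live_sample) > threshold
```

A scheduled job can run this per feature and raise an alarm when `drifted` returns `True`, prompting the manual analysis or retraining described below.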
One solution, which is often within reach, is to retrain the model to ensure that it adapts to new data. Whether it is advantageous to automatically retrain a model often depends on the domain, but many times there is a need for a manual analytical view of the model's performance and characteristics to ensure that the retrained model remains valid. For this process, it is helpful to work with a gold dataset, which the business is accountable for, validates, and maintains on an ongoing basis. This provides a common language for performance and thus a more unambiguous understanding of whether the retraining or model update was beneficial or not.
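The gold dataset then becomes the yardstick in a promotion check: the retrained model only replaces the production model if it does not score worse on that benchmark. A sketch under assumed names, with accuracy standing in for whatever metric the business has agreed on:

```python
def score(model, gold_inputs, gold_labels):
    """Accuracy on the business-owned gold dataset (illustrative metric)."""
    hits = sum(model(x) == y for x, y in zip(gold_inputs, gold_labels))
    return hits / len(gold_labels)


def promote_candidate(current, candidate, gold_inputs, gold_labels,
                      min_gain=0.0):
    """Return the model that should serve production traffic.

    The retrained candidate is only promoted if it beats the current
    model by at least min_gain on the gold dataset; otherwise the
    current model stays in place.
    """
    if score(candidate, gold_inputs, gold_labels) >= (
            score(current, gold_inputs, gold_labels) + min_gain):
        return candidate
    return current
```

Because the gold dataset is validated and maintained by the business, both sides share one unambiguous definition of whether the retraining helped.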
Similarly, it is advantageous to critically examine the ML model's predictions on an ongoing basis. Here, one can work with control groups or A/B tests to illuminate what actually happens when the business does not act on the model's results. This reveals self-fulfilling prophecies, where, for example, customers change their behavior because the business contacts them as a result of the ML model's predictions.
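Such a control group can be as simple as randomly holding out a share of the customers the model flags, so their outcomes show what happens without intervention. The setup below is a hypothetical illustration (names and the 10% share are assumptions):

```python
import random


def split_control_group(flagged_customers, control_share=0.1, seed=42):
    """Randomly split model-flagged customers into treated and control.

    Customers in the control group are deliberately NOT contacted, so
    later outcomes reveal whether the model predicted behavior or the
    business's own follow-up caused it (the self-fulfilling prophecy).
    The fixed seed makes the split reproducible for the later analysis.
    """
    rng = random.Random(seed)
    shuffled = list(flagged_customers)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * control_share)
    control, treated = shuffled[:cut], shuffled[cut:]
    return treated, control
```

Comparing, say, churn rates between the two groups after a quarter shows the model's true lift rather than the combined effect of model and intervention.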
3. Find the scaling potential
The development of ML models is highly tailored to the individual application, and the development phase therefore does not scale well across different applications. For example, the HR department's model for flight risk among employees will be only sparsely applicable to inventory forecasting in the sales department. This does not apply, of course, to areas that are more standardized, where a power forecast model developed for one wind farm can be adapted to new wind farms. Fortunately, there are other parts of the ML value chain where scalability is truly achievable.
Often, the deployment pipeline for one ML application can be used directly in the next, as many of the technology requirements recur. For example, the applications must be able to interact with an SQL server and a range of cloud services, install certain Python/R packages, and run in a Docker context to ensure stable execution over time. One of the newer additions is the ability to configure multistage pipelines, which define the Continuous Integration and Continuous Deployment phases from source code. Similarly, the infrastructure on which the solution runs can be specified from code, known as infrastructure-as-code. Together, these create a strong framework for scaling, because deployment pipelines defined in source code can be reused.
Similarly, standardization can be leveraged within operations. An ML application is often seen in the context of other data applications, where, for example, the nightly ML prediction job should run right after the data in the data warehouse is updated. Therefore, triggering and scheduling of the applications are handled centrally, and thus the pipeline that handles the ML application can be reused across applications. Processes for setting up and monitoring application logging, setting up alarms, and creating support tickets, among others, can also be reused.
4. Who does what?
Two ways to organize data science:
Overall, there are two models for organizing data science profiles. In the central, functional model, machine learning and data science profiles are gathered in a shared service function. The advantage of this model is that there is little idle time, as the critical mass is larger and knowledge sharing has optimal conditions. The disadvantage is that data science is not directly tied to a critical business area, making it harder for the business to see how the team can be utilized. Ideas often arise within the data science team itself and may therefore not align with the business's roadmaps.
In the decentralized organizational model, competencies are tied to the business's responsibilities and products; this model is used by many large companies where data insights are central to the business. Here, data science profiles sit alongside data engineering teams, often under the same product managers, and are thus directly linked to a business-critical function. The disadvantage is that knowledge sharing and sparring are more difficult, as the data science teams are typically smaller in this model.
Of course, a mix of the two models can be used, with a greater focus on project-based work, which can introduce overhead when it comes to knowledge sharing and communication.
The right choice depends, of course, on the company's size and its assessment of the potential in data science and machine learning. It also depends on whether the company has committed to a data strategy where data science is an integrated part of the decision-making process, or whether data science is viewed more as a provider of statistics and ad hoc analyses.
Role distribution for deployment:
The responsibility distribution for the development of the ML model is clear. The role distribution for the deployment and operation of ML applications, however, is often under discussion, and no single responsibility model works everywhere. At what point responsibility for the model is transferred from the developer to the next link in the chain depends on the company's ML maturity and on the development team's experience with deployment frameworks and DevOps methods. Until the ML model has been tried in a test environment, it has not really been tested where the battle will be fought. This speaks for keeping developers closely involved in the deployment process, not least because it is difficult for others to diagnose the errors that occur along the way. For example, is it the source code, the integration with external services, an error in the Docker image, or is the hardware the bottleneck? Newer cloud tools make it easier to increase reusability and thereby bring developers closer to the process.
Role distribution for operations
In large organizations, there is often a central unit responsible for operating ML and other data applications. Given the complexity involved, it can seem like an overwhelming task to keep something running in production that is constantly changing and that one has not developed oneself. To address this, it is good to formulate a set of concrete requirements and standards for developers to adhere to before the model is handed over to production. The solution should be set up with logging, alarms, documentation of error types, and a support plan, so it is clear who does what in various scenarios. It is also advantageous to introduce an Acceptance Gate, where the operations team evaluates whether the model's support tools meet the requirements before the operation of the ML model is handed over.
5. Standards, standards, standards
Just as standards help the operations team during handover of the ML model, you can work with recognizability and structure in the deployment phase. ML applications can typically be divided into a few overarching architectural groupings, such as batch, online, and streaming models, and you can maintain deployment templates within each type for new projects to use.
It is also an advantage to choose common tools for, for example, code storage, tracking of ML experiments, storage of ML model objects (such as AzureML), hosting of private Python packages, and so on. It can also be as simple as ensuring a common folder structure and requiring a brief description of how to get started with an ML project, so developers can move across projects and already know the folder structures and file names. An example is Cookiecutter, which can be used to create a template for starting new projects.
Often, ML replaces an existing manual process in a business that was easier to understand and keep running. Therefore, an ML application has the important task of doing better and preferably creating visibility around the improvement. There is enormous potential in ML applications, and unfortunately, neither developing the models nor addressing the transformation around ML models in production is entirely easy.
In this blog post, we have attempted to describe some of the factors that can demonstrate where you can expect reusability and scaling within the ML lifecycle while also showing the way to establish robust ML systems that are validated, thoroughly tested, and help the business reap the potential day after day.