AI has moved beyond the hype and is entering a phase of industrialization. The belief that AI is imperative is widespread across industries. However, only about 5% of global enterprises are ready for the industrialized growth of AI. To reap the real benefits, organizations need to be able to scale AI solutions. This all sounds obvious: everybody talks about AI, and scaling is the name of the game.
But what does scaling AI mean exactly? And what requirements does that pose to an organization’s data and technology stack? How is that different from most legacy infrastructures? And how do you build the technical capabilities to meet these new requirements?
Scaling AI isn’t easy. Getting one or two AI models into production is very different from running an entire enterprise or product on AI. And as AI is scaled, problems can (and often do) scale, too. For example, one financial company lost $20,000 in 10 minutes because one of its machine learning models began to misbehave. With no visibility into the root issue — and no way to even identify which of its models was malfunctioning — the company was left with no choice but to pull the plug. All models were rolled back to much earlier iterations, which severely degraded performance and erased weeks of effort.
Organizations that are serious about AI have started to adopt a new discipline, defined loosely as “MLOps” or Machine Learning Operations. MLOps seeks to establish best practices and tools to facilitate rapid, safe, and efficient development and operationalization of AI. When implemented right, MLOps can significantly shorten time to market. Implementing MLOps requires investing time and resources in three key areas: processes, people, and tools.
Processes: Standardize how you build and operationalize models.
Building the models and algorithms that power AI is a creative process that requires constant iteration and refinement. Data scientists prepare the data, create features, train the model, tune its parameters, and validate that it works. When the model is ready to be deployed, software engineers and IT operationalize it, monitoring the output and performance continually to ensure the model works robustly in production. Finally, a governance team needs to oversee the entire process to ensure that the AI model being built is sound from an ethics and compliance standpoint.
Given the complexity involved here, the first step to making AI scale is standardization: a way to build models in a repeatable fashion and a well-defined process to operationalize them. In this way, creating AI is closely akin to manufacturing: The first widget a company makes is always bespoke; scaling the manufacturing to produce lots of widgets and then optimizing their design continuously is where a repeatable development and manufacturing process becomes essential. But with AI, many companies struggle with this process.
To standardize, organizations should collaboratively define a “recommended” process for AI development and operationalization, and provide tools to support the adoption of that process. For example, the organization can develop a standard set of libraries to validate AI models, thus encouraging consistent testing and validation. Standardization at hand-off points in the AI lifecycle (e.g., from data science to IT) is particularly important, as it allows different teams to work independently and focus on their core competencies without worrying about unexpected, disruptive changes.
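As an illustration of such a shared library, here is a minimal sketch assuming a regression model exposed as a plain `predict` callable and a mean-absolute-error threshold that the teams agree on. The names `validate_model` and `ValidationReport` are hypothetical, not part of any existing package; a real library would likely run more checks than these three.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class ValidationReport:
    """Outcome of the shared checks; `failures` explains why validation stopped."""
    passed: bool
    failures: List[str] = field(default_factory=list)


def validate_model(
    predict: Callable[[Sequence[float]], float],
    inputs: Sequence[Sequence[float]],
    labels: Sequence[float],
    max_mae: float,
) -> ValidationReport:
    """Hypothetical organization-wide validation step run before hand-off to IT."""
    # Check 1: the validation set must be non-empty and aligned with its labels.
    if not labels or len(inputs) != len(labels):
        return ValidationReport(False, ["validation set is empty or misaligned"])

    preds = [predict(x) for x in inputs]

    # Check 2: every prediction must be a finite number.
    if not all(math.isfinite(p) for p in preds):
        return ValidationReport(False, ["non-finite prediction"])

    # Check 3: mean absolute error must stay within the agreed threshold.
    mae = sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)
    if mae > max_mae:
        return ValidationReport(False, [f"MAE {mae:.3f} exceeds threshold {max_mae}"])
    return ValidationReport(True)
```

Because every team calls the same function with the same report format, the hand-off from data science to IT can be gated on `report.passed` instead of on a bespoke test script per model.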
People: Let teams focus on what they’re best at.
AI development used to be the responsibility of an AI “data science” team, but AI at scale can’t be built by a single team: it requires a variety of distinct skill sets, and very few individuals possess all of them. For example, a data scientist creates algorithmic models that can accurately and consistently predict behavior, while an ML engineer optimizes, packages, and integrates research models into products and monitors their quality on an ongoing basis. One individual will seldom fulfill both roles well. Compliance, governance, and risk require an even more distinct set of skills. As AI is scaled, more and more expertise is required.
To successfully scale AI, business leaders should build and empower specialized, dedicated teams that can focus on high-value strategic priorities that only their team can accomplish. Let data scientists do data science; let engineers do the engineering; let IT focus on infrastructure.
Tools: Pick tools that support creativity, speed, and safety.
Finally, we come to tools. Because standardizing the production of AI and ML is a relatively new undertaking, the ecosystem of data science and machine learning tools is highly fragmented: to build a single model, a data scientist works with roughly a dozen different, highly specialized tools and stitches them together. On the other side, IT and governance teams use a completely different set of tools, and these distinct toolchains don’t easily talk to each other. As a result, one-off work is easy, but building a robust, repeatable workflow is difficult.
Front-runners that are able to industrialize AI at scale do three things exceptionally well:
- They set direction and follow course. AI adoption is a journey that is completed step by step. It requires vision, strategy, and a game plan that combines quick wins with a route to scale. Sustained commitment from senior leadership is key to long-term success. You have to go all in, but with smart focus and grit.
- They have a playbook for AI solution development and implementation. AI-powered business innovation follows a typical life cycle: from idea or proof of concept to a tested prototype, then an MVP, and eventually a production-grade solution that is implemented. A proven approach to doing this repeatedly enables organizations to gradually develop and scale new AI opportunity areas and become truly AI-powered.
- They have a technology platform that supports AI at scale. AI solutions pose new requirements for technology architecture. Companies that manage to scale AI have developed standardized platforms that allow rapid development of AI solutions in a robust and sustainable manner.
When picking MLOps tools for your organization, a leader should consider:
Whether it interoperates with the existing ecosystem.
More often than not, there will be some existing AI infrastructure already in place. To reduce friction in adopting a new tool, choose one that will interoperate with the existing ecosystem. On the production side, model services must work with DevOps tools already approved by IT (e.g., tools for logging, monitoring, governance).
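To make the interoperability point concrete, here is a minimal sketch assuming IT’s existing log pipeline ingests JSON lines written to a stream. It uses only Python’s standard `logging` module, so a model service plugs into whatever handler IT already configures rather than inventing its own output format. The logger name `model_service` and the field names are hypothetical.

```python
import json
import logging
from typing import TextIO


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed via `extra={"prediction": {...}}` land on the record.
        prediction = getattr(record, "prediction", None)
        if prediction is not None:
            payload["prediction"] = prediction
        return json.dumps(payload, sort_keys=True)


def make_model_logger(stream: TextIO, name: str = "model_service") -> logging.Logger:
    """Attach the JSON formatter to a stream that IT's collector reads (assumed)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]  # replace, so repeated setup does not duplicate lines
    logger.propagate = False     # avoid double-logging through the root logger
    return logger


def log_prediction(logger: logging.Logger, model_version: str, score: float) -> None:
    """Emit one structured record per served prediction."""
    logger.info(
        "prediction served",
        extra={"prediction": {"model_version": model_version, "score": score}},
    )
```

Swapping the `StreamHandler` for whichever handler IT has approved changes where the records go without touching the model code.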
Whether it’s friendly for data science as well as IT.
Tools to scale AI have three primary user groups: the data scientists who build models, the IT teams who maintain the AI infrastructure and run AI models in production, and the governance teams who oversee the use of models in regulated scenarios.
Of these, data science and IT tend to have opposing needs. To enable data scientists to do their best work, a platform must get out of the way — offering them flexibility to use libraries of their choice and work independently without requiring constant IT or engineering support.
Whether it enables collaboration.
An MLOps tool must make it easy for data scientists to work with engineers and vice versa, and for both of these personas to work with governance and compliance. In AI product development, the speed of collaboration between data science and IT determines speed to market, while collaboration with governance ensures that the product being built is one that should be built at all.
Whether it supports governance.
With AI and ML, governance becomes much more critical than in other applications. AI governance is not limited to security or access control in an application. It is responsible for ensuring that an application is aligned with an organization’s ethical code, that the application is not biased against a protected group, and that decisions made by the AI application can be trusted.
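One narrow, commonly used check that a governance team might automate is the disparate impact ratio: the lowest group’s rate of favorable decisions divided by the highest group’s. The “four-fifths rule” from the US Uniform Guidelines on Employee Selection Procedures flags ratios below 0.8. The sketch below assumes binary decisions and a single group attribute; it is one illustrative metric, not a complete test for bias, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def selection_rates(groups: Sequence[str], decisions: Sequence[bool]) -> Dict[str, float]:
    """Share of favorable (True) decisions within each group."""
    if len(groups) != len(decisions):
        raise ValueError("groups and decisions must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    favorable: Dict[str, int] = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        favorable[group] += int(decision)
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact_ratio(groups: Sequence[str], decisions: Sequence[bool]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(groups, decisions)
    if not rates:
        raise ValueError("no decisions to evaluate")
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def passes_four_fifths(
    groups: Sequence[str], decisions: Sequence[bool], threshold: float = 0.8
) -> bool:
    """False means the model should be flagged for governance review."""
    return disparate_impact_ratio(groups, decisions) >= threshold
```

For example, if group “a” receives favorable decisions 60% of the time and group “b” 30% of the time, the ratio is 0.5 and the check fails.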
Five principles to build a scalable AI platform
At the highest level, scaling AI requires platform capabilities that combine two previously separate domains in data architecture: the operational or transactional side of data and the analytical use of data for analysis or modeling. Below are five principles to build a scalable AI stack.
1. Algorithms as micro-services
The classical way to put a machine learning model into production is to build a pipeline that ingests the latest input data, runs the model with that input, and stores the output in a database. The application(s) using the predictions can subsequently fetch the data from this database. Although there is nothing inherently wrong with this approach, it is not very scalable. Modern ML use cases require predictions on demand, potentially with thousands of requests every second. So the prediction pipeline cannot be scheduled in advance. Moreover, it needs to be able to scale rapidly depending on the workload.
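A minimal sketch of the micro-service alternative, using only Python’s standard library so it runs anywhere: the model sits behind an HTTP endpoint and scores each request on demand instead of waiting for a scheduled batch. Because the handler keeps no state between requests, identical copies can run behind a load balancer and be added or removed as traffic changes. The `/predict` path, the feature names, and the linear `predict` function are hypothetical stand-ins for a trained model, and `http.server` is used only for illustration; the Python documentation advises against it for production, where a dedicated serving framework would take its place.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Tuple

# Hypothetical model: a fixed linear scorer standing in for a trained artifact.
WEIGHTS = {"age": 0.02, "balance": 0.0001}
BIAS = -0.5


def predict(features: Dict[str, float]) -> float:
    """Score one request; missing features default to 0."""
    return BIAS + sum(w * float(features.get(k, 0.0)) for k, w in WEIGHTS.items())


class PredictHandler(BaseHTTPRequestHandler):
    """Stateless handler: every request carries everything needed to score it."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"score": predict(features)}).encode()
        except (ValueError, TypeError, AttributeError):  # bad JSON or payload shape
            self.send_error(400, "invalid request payload")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def start_service(port: int = 0) -> Tuple[ThreadingHTTPServer, int]:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def request_score(port: int, features: Dict[str, float]) -> float:
    """Client side: one on-demand prediction over HTTP."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps(features).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["score"]
```

Calling `start_service()` returns the server and its port; `request_score(port, {"age": 50, "balance": 1000})` then returns a score of about 0.6 computed at request time, with nothing written to or read from an intermediate database.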
2. A factory approach to building and managing algorithms
Producing algorithms as micro-services requires an assembly line with proper quality management and control mechanisms, a kind of Six Sigma methodology for the production of algorithms:
Standardized & automated workflow.
Instead of building a specific workflow for each individual use case, a standardized workflow (assembly line) is needed to enable scalable production of algorithms. Models are not developed as one-offs; they require ongoing maintenance and retraining. Doing this repeatedly for thousands of models therefore requires a fully standardized and automated assembly line that can be replicated for any new AI solution.
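The idea of one assembly line reused across use cases can be sketched as follows, assuming a fixed, hypothetical sequence of five stages (`ingest`, `featurize`, `train`, `validate`, `register`). Each use case supplies its own stage implementations, but the stage order, the completeness check, and the record of completed stages are shared, so every model goes through the same path.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

# Hypothetical fixed stage order shared by every use case.
STAGE_ORDER = ("ingest", "featurize", "train", "validate", "register")


@dataclass(frozen=True)
class AssemblyLine:
    """One standardized workflow; only the stage implementations vary per use case."""

    use_case: str
    stages: Dict[str, Stage]

    def __post_init__(self) -> None:
        # Refuse to build a line that skips any standard stage.
        missing = [name for name in STAGE_ORDER if name not in self.stages]
        if missing:
            raise ValueError(f"{self.use_case}: missing stages {missing}")

    def run(self, context: Optional[Context] = None) -> Context:
        """Run every stage in the fixed order, passing the context along."""
        ctx: Context = {**(context or {}), "use_case": self.use_case}
        completed = []
        for name in STAGE_ORDER:
            ctx = self.stages[name](ctx)
            completed.append(name)
        ctx["completed_stages"] = completed
        return ctx
```

Retraining then means calling `run()` again with fresh inputs, and onboarding a new use case means supplying five stage functions rather than designing a new workflow.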
Performance monitoring & interventions.
Six Sigma aims to reduce variance in the manufacturing process: the upper and lower specification limits must lie at least six standard deviations from the process mean, which, allowing for the conventional 1.5-sigma long-term drift, implies a defect rate of about 3.4 per million. Similarly, model predictions will fall within a band. Monitoring out-of-sample performance, that is, how the model performs in practice rather than on historical data, is critical to keeping algorithms in check.
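Both halves of this analogy can be checked in a few lines. `defects_per_million` reproduces the 3.4-per-million figure as the one-sided normal tail beyond 4.5 standard deviations (six sigma minus the conventional 1.5-sigma drift). `control_limits` and `out_of_control` then apply the same idea to a model, as a minimal sketch assuming you log a prediction error for each scored case and treat a baseline window of errors as the expected band. The three-sigma default follows classic Shewhart control charts and is a tunable assumption, not a recommendation from the text.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def defects_per_million(sigma_level: float, shift: float = 1.5) -> float:
    """One-sided normal tail beyond (sigma_level - shift) standard deviations, per million."""
    z = sigma_level - shift
    return 0.5 * math.erfc(z / math.sqrt(2)) * 1_000_000


def control_limits(baseline_errors: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Band of expected errors: baseline mean plus or minus k sample standard deviations."""
    mu, sd = mean(baseline_errors), stdev(baseline_errors)
    return mu - k * sd, mu + k * sd


def out_of_control(live_errors: Sequence[float], limits: Tuple[float, float]) -> List[int]:
    """Indices of live errors that fall outside the band and warrant intervention."""
    lower, upper = limits
    return [i for i, e in enumerate(live_errors) if not lower <= e <= upper]
```

With the default `shift=1.5`, `defects_per_million(6.0)` comes out at roughly 3.4; with `shift=0` it drops to about 0.001, which is why the drift convention matters when quoting the figure.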
In Europe, new legislation requires organizations to be able to explain algorithmic decisions that affect customers. Responsible AI is becoming a hot topic. As a result, the ability to trace back model versions, including the accompanying training data, is becoming a sine qua non. Just as equipment and manufacturing parts have serial numbers, algorithms require a transparent and traceable production flow. The model assembly line therefore needs to store all relevant artifacts (inputs, outputs, and scraps) in an organized manner.
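A traceable production flow can be sketched as a registry that stamps each model version with a content hash of its training data, much like a serial number. The in-memory `ModelRegistry` below is a hypothetical stand-in for a real model registry or database, and hashing a JSON dump assumes the training data fits in memory as JSON-serializable rows; large datasets would more likely be hashed per file or partition.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


def fingerprint(rows: List[Dict[str, Any]]) -> str:
    """SHA-256 of a canonical JSON dump: identical data always yields the same hash."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: int
    data_hash: str
    params: Dict[str, Any]
    created_at: str

    @property
    def serial(self) -> str:
        """Human-readable serial number tying a version to its training data."""
        return f"{self.name}-v{self.version}-{self.data_hash[:8]}"


class ModelRegistry:
    """Hypothetical in-memory stand-in for a persistent model registry."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []

    def register(
        self, name: str, training_rows: List[Dict[str, Any]], params: Dict[str, Any]
    ) -> ModelRecord:
        version = 1 + sum(1 for r in self._records if r.name == name)
        record = ModelRecord(
            name=name,
            version=version,
            data_hash=fingerprint(training_rows),
            params=dict(params),
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def lineage(self, name: str, version: int) -> ModelRecord:
        """Trace a deployed version back to its parameters and training-data hash."""
        for record in self._records:
            if record.name == name and record.version == version:
                return record
        raise KeyError(f"{name} v{version} is not registered")
```

When a decision has to be explained, `lineage(name, version)` recovers the parameters and the exact training-data fingerprint behind the model that made it.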
3. Multiple data integration routes
The previous two principles are entirely new and inherent to the arrival of AI. The third principle relates to data management and is often a source of struggle in legacy environments.
Innovation with AI requires more flexibility in data integration than most data architectures traditionally allow. Enterprise data warehouses are set up to provide a single source of truth on which data products can be built. The high integrity and efficient re-use of data comes at the cost of rigidity and high upfront investment.
4. Multi-pattern data exchange: from batch to event-based
Another key feature of the data architecture for scalable AI deployment is the support for multiple data exchange patterns, in particular for the use of real-time data. This is where traditional data architecture for IT integration — connecting the application landscape — merges with modern use of data for analytical purposes (providing real-time BI or AI).
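To illustrate the difference between the two patterns, the sketch below applies one scoring function both ways: `run_batch` scores a complete extract at once, while the consumer scores each event as soon as it arrives. Python’s `queue.Queue` stands in for a message broker, and the `score` rule and `amount` field are hypothetical placeholders for a real model and payload.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, float]


def score(event: Event) -> float:
    """Hypothetical rule standing in for a model: flag large amounts."""
    return 1.0 if event.get("amount", 0) > 1000 else 0.0


def run_batch(events: Iterable[Event]) -> List[float]:
    """Batch pattern: score a complete extract at once, e.g. on a nightly schedule."""
    return [score(e) for e in events]


def start_event_consumer(
    inbox: "queue.Queue[Optional[Event]]",
    on_result: Callable[[Event, float], None],
) -> threading.Thread:
    """Event pattern: score each event as soon as it arrives; None signals shutdown."""

    def consume() -> None:
        while True:
            event = inbox.get()
            if event is None:
                break
            on_result(event, score(event))

    worker = threading.Thread(target=consume, daemon=True)
    worker.start()
    return worker
```

Keeping `score` independent of the transport is the design point: the same model logic can serve a nightly batch job and a real-time event stream without being rewritten.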
5. Leverage cloud components for agile development
The fifth principle is to apply agile development methods to your platform infrastructure. Legacy data architecture and infrastructure are characterized by monolithic, vendor-based systems that were meant to function as one-size-fits-all platforms. These infrastructures have been notoriously behind the curve, often becoming outdated or even obsolete before they were fully implemented. Never-ending platform migrations toward an idealized “nirvana” end state became the norm.
In the race to scale AI and realize more business value through predictive technology, leaders are always looking for ways to get ahead of the pack. AI shortcuts like pre-trained models and licensed APIs can be valuable in their own right, but scaling AI for maximum ROI demands that organizations focus on how they operationalize AI. The businesses with the best models or smartest data scientists aren’t necessarily the ones who are going to come out on top; success will go to the companies that can implement and scale smartly to unlock the full potential of AI.