How AI Model Testing and Monitoring Protect Retailers’ Bottom Lines


According to the World Economic Forum, retail spending on AI is predicted to grow from $4.9B in 2021 to $52B by 2029. Artificial Intelligence and Machine Learning (AI/ML) have the potential for far-reaching impact on retailers and brands. High-quality ML systems have been proven to deliver real ROI in use cases such as marketing propensity, search, recommendations, forecasting, customer experience, supply chain management, fraud detection and prevention, and more.

Here are some ways retailers are already using AI and ML today: 

Marketing Propensity

  • Track how propensity for products and brands changes over time (e.g., seasonality, drift), both overall and for important segments
  • Understand root causes of propensity drift and seasonality to inform retraining, feature engineering and stakeholder collaboration
  • Automatically test, evaluate, explain and debug large numbers of models to save time and improve performance
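
One way to put a number on propensity drift is to compare the distribution of model scores between two periods. The sketch below uses the Population Stability Index (PSI), a common drift measure; it is an assumption chosen for illustration, not a method named in this article. The function names and the sample scores are hypothetical, and the scores are assumed to be non-empty lists of probabilities between 0 and 1.

```python
import math

def score_shares(scores, bins=10, floor=1e-6):
    """Share of scores falling in each equal-width bin over [0, 1]."""
    counts = [0] * bins
    for s in scores:
        # Clamp so a score of exactly 1.0 (or slightly out of range) lands in a valid bin.
        idx = min(max(int(s * bins), 0), bins - 1)
        counts[idx] += 1
    # Floor each share so empty bins do not cause log(0) or division by zero.
    return [max(c / len(scores), floor) for c in counts]

def population_stability_index(baseline, current, bins=10):
    """PSI between a baseline sample and a current sample of scores."""
    base = score_shares(baseline, bins)
    curr = score_shares(current, bins)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))

# Made-up propensity scores for one product in two periods (illustrative only).
spring = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41, 0.48, 0.55]
summer = [0.45, 0.52, 0.58, 0.63, 0.70, 0.74, 0.81, 0.90]

print(population_stability_index(spring, spring))  # identical samples give 0.0
print(population_stability_index(spring, summer))  # larger values mean a bigger shift
```

Running the same calculation per customer segment, rather than only on the full population, shows whether a shift is broad or concentrated in one group.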

Forecasting

  • Track forecasting errors (e.g., MAPE) overall and at a segment level across model runs
  • Report and debug error drift by understanding segment-level errors and automatically determining how much each feature contributes to drift
  • Inform retraining strategies with advanced evaluation, explainability, and root cause analysis (RCA), and automatically test retrained models
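
To make the MAPE tracking above concrete, here is a minimal sketch that computes mean absolute percentage error overall and per segment. The function names, the segment labels, and the numbers are hypothetical. Skipping rows where the actual value is zero is one assumed convention among several, since MAPE is undefined there.

```python
from collections import defaultdict

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def mape_by_segment(rows):
    """rows: iterable of (segment, actual, forecast) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for segment, actual, forecast in rows:
        grouped[segment][0].append(actual)
        grouped[segment][1].append(forecast)
    return {seg: mape(a, f) for seg, (a, f) in grouped.items()}

# Made-up weekly unit sales and forecasts (illustrative only).
rows = [
    ("grills", 100, 90),
    ("grills", 200, 220),
    ("patio", 50, 75),
]

per_segment = mape_by_segment(rows)
print({seg: round(v, 1) for seg, v in per_segment.items()})
# {'grills': 10.0, 'patio': 50.0}
print(round(mape([r[1] for r in rows], [r[2] for r in rows]), 1))  # overall MAPE
```

Storing these figures for each model run makes it possible to see when the error in a single segment starts to drift even though the overall number looks stable.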

Customer Experience and Churn

  • Monitor and debug overall and segment-level scores, as well as accuracy drift caused by changes in customer behavior, competition, etc.
  • Analyze and improve accuracy for customer segments defined by usage, profitability, tenure, etc.
  • Ensure models do not display significant unfair bias
  • Explain drivers of churn predictions to inform customer actions

Fraud Detection and Prevention

  • Monitor and understand the root causes of false positives and negatives. Accurately explain them to stakeholders (investigators, CSRs, etc.) and reduce errors
  • Systematically iterate on model accuracy and adversarial response with monitoring, RCA, improved retraining strategies, and automated testing of retrained models
  • Analyze and improve performance by segment (e.g., source of data, products, customer type, etc.)
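
As a rough illustration of monitoring false positives and negatives by segment, the sketch below computes a false positive rate and a false negative rate for each segment from labeled outcomes. The function name, the segment labels, and the records are hypothetical, and the labels are assumed to be confirmed after investigation.

```python
from collections import defaultdict

def error_rates_by_segment(rows):
    """rows: iterable of (segment, is_fraud, flagged) with boolean labels."""
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "fp": 0, "fn": 0})
    for segment, is_fraud, flagged in rows:
        c = counts[segment]
        if is_fraud:
            c["pos"] += 1
            if not flagged:
                c["fn"] += 1  # fraud the model missed
        else:
            c["neg"] += 1
            if flagged:
                c["fp"] += 1  # legitimate transaction flagged as fraud
    return {
        seg: {
            # None marks a rate that cannot be computed for this segment.
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for seg, c in counts.items()
    }

# Made-up transactions: (segment, confirmed fraud?, flagged by model?).
rows = [
    ("online", True, True),
    ("online", True, False),
    ("online", False, False),
    ("online", False, True),
    ("in_store", True, True),
    ("in_store", False, False),
]

print(error_rates_by_segment(rows))
# {'online': {'fpr': 0.5, 'fnr': 0.5}, 'in_store': {'fpr': 0.0, 'fnr': 0.0}}
```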

But these AI models only deliver quality results when the models themselves are of high quality. There are two challenges that retailers face in getting models to be effective at driving business results. The first challenge is developing a quality model in the first place. According to Gartner, 54 percent of Machine Learning (ML) models never make it into production, primarily because data scientists can’t demonstrate their effectiveness.

Better quality assurance (QA) practices can address these challenges. Think about where software deployment was in the 1990s – inconsistent quality led to slow adoption in enterprises. The technology wasn’t yet trustworthy. When automated software testing and monitoring solutions emerged, enterprise software quality greatly improved, and companies adopted software rapidly, leading to a great technological revolution.

ML is at a similar inflection point. Over the past year, QA tools have emerged that are making a big impact on model quality, helping companies accelerate their AI deployments. Specifically, automated test harnesses are helping ML teams ensure model quality during development, so that retailers can have confidence that the models will perform well. That leads to the second challenge: ensuring that models keep performing well in live use. There is good news on that front, too: automated monitoring tools are helping to ensure that models continue to perform as expected in production. Key benefits of automated testing and monitoring include:

Driving performance and quality

Monitor, debug, and test models throughout the lifecycle to achieve better performance.

Improving AI trustworthiness

Use powerful explainability tools to gain new visibility into model behavior, improving outcomes and stakeholder collaboration.

Scalable support for ML portfolios

Integrate and scale essential monitoring and testing capabilities in the AI stack, across a large volume of models and model types, while saving time and costs for ML teams.

Ensuring that model performance drives business performance: a use case example

In one case, a major retailer was using a machine learning model to provide optimized responses to search requests on its website. For example, the model ensured that the company offered an optimized list of barbecue grills when a site visitor typed “barbecue grill” into the search box. The data science team developed a new search optimization model that, overall, had higher accuracy: it served the recommendations that users were most likely to click on for more information. However, when the team looked closely at the performance of the model across the key business segments, they noticed that the new model actually underperformed the prior model at maximizing revenue conversion for the most profitable segment of customers. When 20% of your customers produce 80% of your profits, a model that underperforms on the smaller segment that actually drives your overall profits is not the right model to promote to production. Because the retailer could look closely at model performance by segment and at its relationship to the overall business goals, it was able to ensure that the truly higher-performing model stayed in production.
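
The kind of check described in this example can be sketched as a simple promotion gate: a candidate model is promoted only if it does not underperform the current model on the segments the business cares most about. This is a hypothetical illustration, not the retailer's actual process. The function name, the segment names, and the conversion figures are all invented.

```python
def promotion_check(baseline, candidate, priority_segments, tolerance=0.0):
    """Compare per-segment metrics (higher is better) for two models.

    Returns (ok_to_promote, regressions), where regressions maps each
    priority segment that got worse to its (baseline, candidate) values.
    """
    regressions = {
        seg: (baseline[seg], candidate[seg])
        for seg in priority_segments
        if candidate[seg] < baseline[seg] - tolerance
    }
    return (not regressions), regressions

# Invented revenue conversion rates per customer segment (illustrative only).
current_model = {"high_value": 0.042, "occasional": 0.018}
new_model = {"high_value": 0.037, "occasional": 0.024}

ok, regressions = promotion_check(current_model, new_model, ["high_value"])
print(ok)           # False
print(regressions)  # {'high_value': (0.042, 0.037)}
```

Here the new model improves on the occasional-shopper segment but is blocked because it regresses on the segment marked as a priority, which mirrors the situation the retailer found.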

While AI can provide tremendous benefits to retailers, companies need to be careful in how and what they deploy. Testing and monitoring play a key role in ensuring that retailers achieve the results that they intended, and achieve a high return on their AI investments.

Will Uppington is the co-founder and CEO at TruEra.