How Big Data Turns the Tide on Fraud and Shrink

Businesses, including retailers, are losing 5 percent of revenues to fraud every year, according to the Association of Certified Fraud Examiners (ACFE) 2014 Global Fraud Study. This is a staggering number: applied to the estimated 2013 gross world product, it translates to nearly $3.7 trillion in projected global fraud losses.

Many apparel retailers are taking action to combat fraud by employing new technologies that monitor high-risk retail transactions using big data tools, proven real-world statistical modeling, and predictive analytics. This is helping reshape retail loss prevention operations to deliver a better customer shopping experience while effectively protecting company bottom lines.

Technology enhances the collection and monitoring of data
Retailers collect data from many sources, including store sales transactions, store video, traffic counters, alarms, merchandise movement, loyalty programs, ecommerce click paths and more. A large retailer collects millions of transactions and hundreds of millions of line items per day; on top of that comes 30 to 60 GB of video per store, per day. For a 1,000-store retailer, the video alone could total as much as 22 petabytes per year (the equivalent of 23,068,672 gigabytes).
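
The storage estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes a 365-day year with every store recording daily. Under those assumptions, the 22-petabyte figure corresponds to the upper end of the 30 to 60 GB range, and the gigabyte equivalent uses the binary convention of 1,024 x 1,024 gigabytes per petabyte.

```python
# Hypothetical helper names; the only inputs taken from the article are
# 1,000 stores and 30 to 60 GB of video per store per day.
GB_PER_PB_DECIMAL = 1_000_000    # decimal convention: 1 PB = 10^6 GB
GB_PER_PB_BINARY = 1024 * 1024   # binary convention: 1 PB = 2^20 GB


def yearly_video_gb(stores, gb_per_store_per_day, days=365):
    """Total video volume in gigabytes for one year."""
    return stores * gb_per_store_per_day * days


low_gb = yearly_video_gb(1000, 30)
high_gb = yearly_video_gb(1000, 60)

print(f"Low estimate:  {low_gb / GB_PER_PB_DECIMAL:.2f} PB")
print(f"High estimate: {high_gb / GB_PER_PB_DECIMAL:.2f} PB")
print(f"22 PB in binary gigabytes: {22 * GB_PER_PB_BINARY:,}")
```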

Conventional systems like exception-based reporting and data-mining systems uncover direct relationships that occurred in the past on a single identifier. But big data analytical tools take analysis to a new level by detecting the connections among seemingly unrelated identifiers to reveal larger underlying groups of transactions and individuals. For example, The Retail Equation's Verify-3® return authorization solution acts to prevent fraud and abuse in real time during the return process, not the day after, while simultaneously linking all related information to the individual. This type of response is not possible with conventional systems because they simply cannot process the complex analytics and deliver accurate answers fast enough to authorize a transaction in process.

Many companies and solution providers have approached the data size and analytics problems by investing in bigger, faster hardware while still confining the work to a small number of machines. This works up to a point; however, the massive amounts of data building up in the system and the complex analytical methods required to unearth the information are more resource-intensive than most companies are willing to devote to the loss prevention function.

Big data-oriented companies achieve high processing speeds by using specialized tools such as the Hadoop platform to split data into thousands of chunks and distribute the load across a very large number of machines. The Retail Equation (TRE) has developed custom software that operates in this environment to tackle very tough analytical challenges. Data warehouse appliances such as IBM's PureData (formerly Netezza) add throughput and run on a similar parallel processing architecture, which decreases processing time by more than 90 percent versus traditional hardware/software deployments. A query that may take five hours to process using SAS on a conventional single-server system takes only minutes on a Hadoop or PureData parallel processing architecture. In the field, a return authorization completes in milliseconds.
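
The split-and-distribute idea can be illustrated on a single machine. The following is a minimal sketch, not TRE's software or Hadoop itself: it uses a Python thread pool as a stand-in for a cluster, and the record layout (customer ID, return amount in cents) and all function names are assumptions made for the example. Each chunk is summarized independently (the "map" step) and the partial results are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Cut the full record list into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def summarize_chunk(chunk):
    """Map step: total return value per customer within one chunk."""
    totals = Counter()
    for customer_id, amount_cents in chunk:
        totals[customer_id] += amount_cents
    return totals


def merge_summaries(partials):
    """Reduce step: add the per-chunk totals together."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds values per key
    return combined


def total_returns_by_customer(records, chunk_size=2, workers=4):
    """Process chunks concurrently, then merge the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    return merge_summaries(partials)


# Hypothetical sample data: (customer ID, return amount in cents).
sample_returns = [("A", 1000), ("B", 500), ("A", 250), ("C", 100), ("B", 50)]
print(dict(total_returns_by_customer(sample_returns)))
```

Because each chunk is summarized without reference to any other, the same pattern extends to many machines; in a real deployment the pool would be replaced by a cluster scheduler.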

Knowing precisely what to look for
Predictive algorithms and machine learning techniques rely on big data tools to improve the shopping experience while reducing return fraud and shrink. Companies can process the data from all the transactions in the chain and identify suspicious behavior indicative of any form of return fraud or abuse, including renting/wardrobing, returning stolen merchandise, receipt fraud, price arbitrage, price switching, double dipping, organized retail crime (ORC), check fraud and tender fraud. When an individual attempts to make a return, systems perform calculations and, in a fraction of a second, predict the likelihood that the return is fraudulent.

While the vast majority of consumers (about 99 percent) are approved, shoppers whose actions are highly suspicious are warned or denied. This is an important point to notice: The system allows and supports generous return policies so profitable consumers enjoy a fast and pleasant return process, including those who make many, many returns. In fact, the most valuable consumers tend to have a very high number of returns, which is why it is best not to rely solely on simple return-velocity calculations, but rather use big data to identify fraudulent patterns of behavior in real time.

Complex queries also can be used to identify organized retail crime (ORC) rings and fraudulent returners by linking seemingly independent events. The following diagram shows a cluster of suspicious purchase and return events.

At the center of each dandelion-like cluster is one person (or one ID). The thin lines connecting the clusters show a "hidden" connection, such as a gift card being passed among conspirators. On their own, the clusters may appear to be legitimate (high-volume buyers often return many items), but the value of the returns exceeds the value of purchases, and the connections indicate probable fraud by a group or an individual using multiple identities. Software-as-a-Service applications like Verify-3 can halt the group's returns immediately, and special reports can provide investigators with the information necessary to pursue a case. The big data tools allow companies to maintain and update linked identities across more than a billion linkages each day.
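
One common way to find such hidden connections is to treat people and shared identifiers as nodes in a graph and group everything that is connected. The sketch below uses a union-find structure for this. It is an illustration of the general technique, not a description of how Verify-3 works; the event format, the sample IDs and the flagging rule (a linked group of more than one identity whose returns exceed its purchases) are all assumptions made for the example.

```python
from collections import defaultdict


class UnionFind:
    """Minimal union-find (disjoint set) over arbitrary hashable nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            # Path halving keeps later lookups short.
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def link_identities(events):
    """Group person IDs that share any identifier, directly or indirectly.

    events: iterable of (person_id, identifier), e.g. ("p1", "giftcard:111").
    """
    uf = UnionFind()
    for person_id, identifier in events:
        uf.union(("person", person_id), ("id", identifier))
    groups = defaultdict(set)
    for kind, value in list(uf.parent):
        if kind == "person":
            groups[uf.find((kind, value))].add(value)
    return sorted(sorted(members) for members in groups.values())


def flag_groups(groups, purchases, returns):
    """Flag linked groups whose total returns exceed total purchases."""
    flagged = []
    for members in groups:
        bought = sum(purchases.get(m, 0) for m in members)
        returned = sum(returns.get(m, 0) for m in members)
        if len(members) > 1 and returned > bought:
            flagged.append(members)
    return flagged


# Hypothetical events: two gift cards passed among three identities.
sample_events = [
    ("p1", "giftcard:111"), ("p2", "giftcard:111"),
    ("p2", "giftcard:222"), ("p3", "giftcard:222"),
    ("p4", "giftcard:999"),
]
print(link_identities(sample_events))
```

In this sample, p1 and p3 never share an identifier directly; they end up in the same group only through p2, which is the kind of indirect link the thin lines in the diagram represent.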

The amount of return-related fraud is staggering: a $9 billion to $16 billion problem, according to the 2013 Consumer Returns in the Retail Industry report released in January 2014 by the NRF and The Retail Equation. Many of these losses are preventable using the technology available today. Fraudsters depend on system delays and lapses in judgment by the cashiers and associates on the front lines. However, when big data analytics replace subjective decisions, fraud and shrink diminish substantially, reducing return rates by an average of 8.2 percent and shrink by 12.95 percent. A $2-billion retailer would realize about $15 million in savings per year, and retailers see a steep decline in return rates as soon as the system goes live.

David Speights, Ph.D., is the chief data scientist at The Retail Equation, a firm that optimizes retailers' revenue and margin by shaping behavior in every customer transaction.