Solving Big Retail Challenges with Big Data

By Nicole Giannopoulos - 09/10/2013
New models are emerging in retail in response to disruptions in the marketplace caused by the consumerization of technology and other mega-forces. One of the biggest shifts has been in the area of big data. Giant retailer Sears Holdings has emerged as a leader in this area, embedding big data technology and skillsets into the overall enterprise.

Even in today’s data-obsessed analytics environment, collecting and keeping every tiny granule of retail data down to the level of each individual transaction and Web click might seem like overkill. But Sears Holdings has discovered that data storage costs are now low enough and analytical tools have grown powerful enough that this approach to big data has shortened the time needed for analytics projects by 60-70%, while also improving promotion conversions, lowering inventory levels and boosting sales.

“Traditional enterprises in various industries, including retail, have a huge opportunity to leverage the value of big data tools,” says Aashish Chandra, divisional vice president of Sears Holdings Corporation and GM of MetaScale. “The tools can help enterprises to become more efficient, transform to become more relevant to customers and provide new revenue streams for those who have the foresight to utilize it.”

The Big Data Journey
Sears Holdings is a large enterprise by any measure, with massive amounts of data and challenges that are typical of any large enterprise: difficulty meeting production schedules and service level agreements (SLAs), multiple copies of data with no single version of the truth, ETL (extract, transform, load) complexity and high cost of the software needed to manage it, data latency, enterprise data warehouses unable to handle the load, mainframe workload over capacity, and escalating costs.

The retailer wanted to build an agile enterprise that would be nimble, quick and operate at the speed of business. The transformation from legacy and proprietary technologies to a cloud-based, open-source big data platform allows for nimbler systems that support the business at materially lower costs. To do this, Sears worked with several different solutions before it selected Hadoop.

“That’s what triggered us to actually look into a different solution,” says Chandra. “And Hadoop wasn’t the first. We weren’t so lucky to come across Hadoop on the first go. We made mistakes on our way and eventually looked into Hadoop, less than three years ago, with a small proof of concept, including just eight nodes. Before we knew it we found it to be the answer to a lot of questions that we were unable to solve. That’s the reason we started to look deeper into Hadoop.”

“We have learned a great deal about many things,” continues Chandra. “From a Hadoop standpoint, our footprint has more than 1,100 nodes in our environment today, across multiple clusters such as production, integration and backup.”

Transforming the Enterprise
Sears credits its use of the Hadoop open-source framework for providing database solutions that can affordably store data in one place, apply tools to it and allow the retailer to consume it in an easy way. For Sears, Hadoop is the new mainframe, and it is treated and governed like one.

Although Hadoop has been around and used for analyzing unstructured data for some time, the retailer has become a leader in leveraging the platform for structured data in traditional enterprises to eliminate ETL complexity, data latency, costs and much more.

“Hadoop and the big data tools have the potential to be a game changer in mainstream enterprises,” says Chandra. The solution massively reduces data latency. With traditional ETL techniques, it can take hours or even days between the time the data is created and the time it can be utilized. With modern big data tools, latency can be reduced to minutes or seconds, which transforms the way ETL is thought about.

The idea is to use Hadoop as the single centralized data repository rather than as a rental space, and rely on dimensional modeling to turn it into an integration platform with low latency. While the concept of using a data hub isn’t new, what is different is to have this data hub under the complete control of the Hadoop ecosystem. “A retailer’s data strategy should include Hadoop framework in its overarching ecosystem,” says Chandra. “Hadoop has a place in that ecosystem; as a retailer, we need to use the right tools in the right place and for the right reasons.”

“We had multiple copies of data and no single version of the truth. We had really complex ETL as in traditional enterprises. Cost of software and cost of management for ETL were extremely high and the projects would take really long because of the time it took to set up ETL data sources. The data latency was so high that the time of producing the data to the time of actually leveraging that information would take days, even weeks in some cases. The data was stale by the time it came back,” he continues.

Improved Performance
A typical mainframe, ETL or data warehouse batch has a string of jobs with input, steps of computation and output. Sears has developed techniques to look at the highly CPU intensive part of that string of jobs and break it out. That data is then brought onto Hadoop, processed and put back into the mainframes or data warehouses.

“The output is exactly the same with no disruption to business, but behind the scenes we have cut the processing times manifold (10, 20, or even 50 times faster) while reducing the CPU consumption on mainframe or data warehouses significantly. Consequently, reducing our licensing costs as well,” notes Chandra.

The retailer not only simplifies and modernizes the code to make it easy to maintain, but it also improves the performance and significantly reduces costs by reducing workload on mainframes or data warehouses.
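The offload pattern described above can be illustrated with a small sketch. It is not Sears' implementation: the batch steps, the sample records and the function names are all hypothetical, and a local thread pool stands in for a Hadoop cluster. The point it shows is that the CPU-intensive step can be partitioned and processed elsewhere while the batch output stays the same.

```python
from concurrent.futures import ThreadPoolExecutor


def extract():
    """Hypothetical batch input: (store_id, sale_amount) records."""
    return [(store, amount) for store in range(4) for amount in (10, 20, 30)]


def heavy_aggregate(records):
    """The CPU-intensive step of the batch: total sales per store."""
    totals = {}
    for store, amount in records:
        totals[store] = totals.get(store, 0) + amount
    return totals


def load(totals):
    """Final step: hand results back to the downstream system in a stable order."""
    return sorted(totals.items())


def run_batch_in_place():
    """Original batch: every step runs on the same system."""
    return load(heavy_aggregate(extract()))


def offloaded_aggregate(records, workers=2):
    """Stand-in for the offloaded step: partition, process in parallel, merge.

    Assumption: a thread pool plays the role of the cluster here.
    """
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(heavy_aggregate, partitions))
    merged = {}
    for partial in partials:
        for store, total in partial.items():
            merged[store] = merged.get(store, 0) + total
    return merged


def run_batch_offloaded():
    """Same batch, with only the heavy step broken out and run elsewhere."""
    return load(offloaded_aggregate(extract()))


if __name__ == "__main__":
    # Both variants should produce identical output.
    print(run_batch_in_place() == run_batch_offloaded())
```

Only the heavy step changes; the extract and load steps, and therefore what downstream consumers see, are untouched.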

“We have successfully eliminated one mainframe machine from our footprint and are well on our way to removing another,” says Chandra. “We have saved millions in licensing costs from moving the workload off of ETL and the data warehouse. Eliminating mainframes for other large enterprises could save up to tens of millions of dollars.”

On the Leading Edge of Big Data
Sears Holdings operates under the mission to serve, delight and engage its members while they shop their way. With more than two years of broad and practical experience managing large Hadoop clusters and big data tools, the retailer found itself a new way to execute this mission. As a leading integrated retailer with an extensive customer loyalty rewards program and retail infrastructure that includes more than 2,500 stores, Sears Holdings was on the leading edge of big data management and analytics.

Leveraging its Fortune 100 heritage and experience with big data tools, Sears Holdings launched a venture with MetaScale, a big data start-up and wholly owned subsidiary, to help enterprises unlock the potential of their data. MetaScale offers end-to-end services for Hadoop and big data, including design and build, hosting, performance tuning, training and managed services.

As GM of MetaScale, Chandra heads up big data and legacy modernization. He explains Sears’ journey to establishing the venture: “When Sears began its big data journey several years ago, it took quite a while to figure out how to use Hadoop in a large enterprise. There were no easy answers or anyone to show us the way. For instance, we had to ask: How do you ingest data from strange things like a mainframe information management system or DB2 database software and Oracle tables? How do you put data back again afterwards if you need to? How do you make Hadoop secure in a large enterprise? How do you rewrite the legacy code and to what language? Those skills are not easy to learn. It took us over a year to figure them out. MetaScale was formed to help other companies do that, only much more quickly.”

“One cannot underestimate the learning curve; it’s not easy to learn big data skills,” he notes.

Reinventing Retail
“To meet the shopping expectations of our technology-savvy consumer with instantaneous access to information, traditional enterprises need to transform themselves so they can conduct real-time data analysis to derive critical insights and intelligence for better and fact-based business decisions,” says Chandra. “Fortunately, since Apache Hadoop and most of the big data tools are open source, enterprises can launch a big data POC with minimal investment and set up a Hadoop cluster on commodity hardware to get started.”

Even the way Sears handles archiving has changed. The retailer was so inundated with data that it could only store a small portion of it. Now, the retailer grabs as much data as possible and has the ability to hold onto that data forever with Hadoop Distributed File System (HDFS).

More importantly, it can now run queries on that data to analyze and report on it. “The data warehouses would be too expensive to store the data and there is no longer a need to put the data away in archiving systems or tapes,” notes Chandra.

Rather than developing a hypothesis in advance, as would be required in a traditional data warehousing environment, big data technologies enable organizations to load and analyze the data first, understand where it leads, and then clean it up to make it suitable for ongoing production work.

This ability to defer time-consuming data preparation significantly shortens the time needed to make data-driven and fact-based decisions. With the use of big data, retailers can make insight-based decisions on inventory, dynamic pricing or personalized promotions at the speed of profitability.
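The load-first approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not the retailer's code: the event format, the field names (`sku`, `price_cents`, `qty`) and the rule that a missing quantity means 1 are all hypothetical. Raw records are kept exactly as they arrived, and structure is applied only when a question is asked of them.

```python
import json

# Hypothetical raw events, stored as-is with no upfront cleaning.
RAW_EVENTS = [
    '{"sku": "A1", "price_cents": 1999, "qty": 2}',
    '{"sku": "B7", "price_cents": 500}',  # qty missing
    "not valid json",  # malformed record stays in the raw store
    '{"sku": "A1", "price_cents": "1999", "qty": "1"}',  # numbers sent as strings
]


def read_with_schema(raw_lines):
    """Apply structure at read time; skip records that do not parse."""
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        yield {
            "sku": event["sku"],
            "price_cents": int(event["price_cents"]),
            # Assumption for this sketch: a missing quantity means 1.
            "qty": int(event.get("qty", 1)),
        }


def revenue_by_sku(raw_lines):
    """One possible analysis over the raw data: revenue in cents per SKU."""
    totals = {}
    for event in read_with_schema(raw_lines):
        revenue = event["price_cents"] * event["qty"]
        totals[event["sku"]] = totals.get(event["sku"], 0) + revenue
    return totals


if __name__ == "__main__":
    print(revenue_by_sku(RAW_EVENTS))
```

Because the cleaning rules live in the read step, they can be changed later without reloading the stored data.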

Business in Real-Time
“The business is moving to a real-time world, wanting information now, so that better and fact-based business decisions could be made with crucial insight,” says Chandra. “To enable the business, we needed to move at the speed of business so that business can stay competitive and for that batch-processing wasn’t a viable option anymore.”

Today, retailers must create innovation and become first movers who invest in smart experiments on the leading edge. They can’t wait for innovation to happen and jump on the bandwagon with everyone else. Investment in retail tech labs, such as MetaScale, should occur on the periphery of the enterprise, aimed at highly targeted functions, operating independently as a separate division. This will allow for a productive culture of creativity and deliver results by failing forward faster.