How e-retailers like Amazon use data to project sales and set prices.

If you’ve ever wondered how online retailers can afford to sell off their inventory at rock-bottom prices in the days following Thanksgiving, the answer now lies primarily in the data those retailers collect every time you shop.

Retailers have long relied on historical sales data to help create discounting strategies. If sales are not on track to meet or beat the previous year’s numbers, retailers will often offload slow-moving inventory by dropping the price, either to directly undercut competitors’ advertised rates or appeal to buyers who might balk at the cost of the latest product update but purchase a heavily discounted last-season model. However, with the rise of data science — an interdisciplinary field that has exploded in popularity in the past few years — retailers are taking a more predictive approach.

Half of all holiday shoppers plan to buy online this season according to Deloitte, and even 66 percent of those who don’t will look for items online before heading to a brick-and-mortar store. All of those clicks and page views aren’t going unnoticed by companies like Amazon, which is now on track to become the largest clothing retailer in the U.S. During Cyber Monday 2015, Amazon sold 23 million items discounted at various levels worldwide, from a $25 reduction on its Fire TV stick to 40 percent off a Bissell carpet cleaner. The prices of the thousands of individual products discounted that day weren’t assigned randomly; they were algorithmically derived to drive the highest number of sales.

Data scientists are the driving force behind pricing in such cases. The primary duty of these data analytics experts, who currently hold America’s best job according to Glassdoor, is to build, test, and train data models that spit out predictions about which customers are most likely to purchase which items at what price and when. The complexity of such tasks is why data scientists command impressive salaries and have been affectionately dubbed the “unicorns” of analytics. But the truth is decidedly less glamorous.

As an online shopper navigates through Amazon, the retailer logs nearly every interaction he or she has with the site: the items viewed or purchased, when those items were viewed or purchased, whether or not the user has a Prime membership, how the user accessed the site and from what type of device, and more. In some cases, the data resulting from daily online customer behavior spans hundreds of millions of rows on a data table and is virtually undecipherable without the assistance of an algorithm.


It’s the job of a data scientist to identify the data that’s relevant to the task at hand, which in the case of a massive sales event like Black Friday or Cyber Monday would include a combination of macroeconomic indicators, historical purchase behavior, item popularity, and profit margins. That data is cleaned to remove inaccurate or irrelevant information and fed to the appropriate algorithm, which identifies relationships between different features and returns predictions about the likelihood that a given set of customers would purchase a particular item at a particular price during a short-term sale.
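
For readers curious what that clean-then-predict step might look like in practice, here is a minimal sketch in Python. It is not Amazon's method: the rows, the two features (discount and Prime membership), and the choice of a simple logistic regression are all assumptions made for illustration, and every number in it is invented.

```python
import math

# Hypothetical interaction log: (discount_pct, is_prime_member, purchased).
# Every value here is made up for illustration only.
raw_rows = [
    (0, 0, 0), (0, 1, 0), (10, 0, 0), (10, 1, 1),
    (20, 0, 0), (20, 1, 1), (30, 0, 1), (30, 1, 1),
    (40, 0, 1), (40, 1, 1), (None, 1, 1), (250, 0, 0),
]

def clean(rows):
    """Drop rows whose discount is missing or outside 0-100 percent."""
    return [r for r in rows if r[0] is not None and 0 <= r[0] <= 100]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def features(discount, prime):
    """Feature vector: bias term, discount as a fraction, Prime flag."""
    return [1.0, discount / 100.0, float(prime)]

def fit_logistic(rows, lr=0.1, epochs=5000):
    """Fit weights [bias, discount, prime] by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for discount, prime, bought in rows:
            x = features(discount, prime)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - bought
            for i in range(3):
                grad[i] += err * x[i]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def purchase_probability(w, discount, prime):
    """Predicted likelihood of purchase at a given discount."""
    x = features(discount, prime)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

rows = clean(raw_rows)
weights = fit_logistic(rows)
for d in (10, 30):
    print(d, round(purchase_probability(weights, d, prime=1), 2))
```

A production system would draw on far more features and far more rows, but the shape is the same: discard bad records, learn the relationship between features and purchases, then ask the model how likely a purchase is at a candidate price.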

So how does a retailer like Amazon select and price the thousands of items it ultimately discounts? Items are ranked for appeal based on how consumers have engaged with those products in the past, where they fit into current purchasing trends, available inventory, and whether similar items have sold well during previous Black Fridays. The discount amount itself is determined based on predicted sales coupled with the likelihood customers will purchase at a certain discount and how much additional volume is required to offset the revenue reduction from that discount. Data scientists dramatically increase the accuracy of these predictions by using models that continuously adapt to, or learn from, the available streams of incoming data through a process called machine learning.
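
The volume-offset part of that calculation is simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes the goal is only to keep revenue flat; it ignores margins, fulfillment costs, and the other inputs a real pricing model would weigh.

```python
def breakeven_volume_uplift(discount):
    """Fractional increase in units sold needed to keep revenue flat.

    `discount` is a fraction of list price, e.g. 0.40 for 40 percent off.
    Revenue is price times units, so cutting price to (1 - discount)
    requires units to grow by a factor of 1 / (1 - discount).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be at least 0 and less than 1")
    return discount / (1 - discount)

def expected_revenue(list_price, baseline_units, discount, predicted_uplift):
    """Revenue at a discounted price given a predicted fractional volume uplift."""
    return list_price * (1 - discount) * baseline_units * (1 + predicted_uplift)

if __name__ == "__main__":
    for d in (0.10, 0.25, 0.40):
        print(f"{d:.0%} off needs {breakeven_volume_uplift(d):.1%} more units")
```

At a 40 percent discount, like the carpet cleaner mentioned earlier, a retailer would need to sell roughly 67 percent more units just to match the revenue it would have earned at full price. A discount makes sense on revenue grounds only when the model predicts an uplift above that threshold.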

Amazon has already announced its plans to hire 120,000 seasonal workers this year, a 20 percent increase over last year. The retailer has said it expects 25 percent revenue growth this holiday quarter, a prediction that is similarly algorithmically derived. The cost of hiring seasonal workers to fulfill a surplus of Cyber Monday orders, for example, could very well be a feature a data scientist would use when building a data model to determine pricing strategy.

Even beyond Black Friday and Cyber Monday, data science is powering dynamic pricing on retailers’ websites. The data models being used to optimize pricing, which can be adjusted to compensate for the significant behavioral changes of consumers during the holidays, are active throughout the year. Statistics vary, but McKinsey & Company has estimated that retailers using big data analytics could boost their operating margins by up to 60 percent. So it’s little wonder that retail giants with the resources to do so are powering their pricing with data science.


DataScience provides tools, infrastructure and expertise for data scientists.