Big data, small data: size doesn’t matter, value does

Here is the why & the how…

By: Sinan Gurman

“The ability to decide which data to heed, which to ignore, and how to organize and communicate information will be among the most important traits of business executives in this century”.

You may think this is old news… because it is old news, but it is still very true.

The how of capturing business value from data continues to change and evolve faster than ever before. And your ability to adapt and lead in such a fast-paced, data-intensive environment makes all the difference between winning and losing business.

What makes data valuable?

It is not the size of your data that matters. From my experience, there are 5 key value levers behind sales, marketing and supply chain data:

1. Quality (accuracy & completeness)

First things first, data is never 100% accurate, and as the old adage goes, “garbage in, garbage out”. So, the big question we should ask ourselves is: “Is the data accurate enough for us to make better business decisions?” Remedies to improve data quality start at the data collection source in-market and may vary by channel, by retailer, and even by category. We should be very careful about data quality implications while developing analytics solutions and generating insights.

For example, when regular and diet versions of the same soda brand are purchased together, point-of-sale (POS) cashiers might scan only one of the products twice. Since both products are equally priced, this is more convenient for the cashier. This results in a POS data inaccuracy at the universal product code (UPC) level for these products. Unfortunately, we don’t have 2-3 years to wait for accurate historical UPC-level sales data to build demand forecasting models while cashiers are trained to correctly scan all UPC items.

To address this in the short term, we may choose to train our models at the combined brand-pack level (regular and diet combined) and allocate the results to each UPC based on the weights of historical shipments instead of historical POS at the retailers. Intelligent algorithms such as conditional interruption patterns and fuzzy logic are two widely used approaches for resolving data quality issues. There are, of course, many other methods – too many to go into in this article – and you should be prepared to develop new ones with every new data source.
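As a rough illustration of that allocation step only (a minimal sketch, not the exact method described above; the column names, shipment figures and forecast value below are all hypothetical), a combined brand-pack forecast can be split back to UPC level using historical shipment shares:

```python
import pandas as pd

# Hypothetical historical shipments for each UPC within the brand pack
shipments = pd.DataFrame({
    "upc": ["0001-regular", "0002-diet"],
    "units_shipped": [620_000, 580_000],
})

# Forecast produced at the combined (regular + diet) brand-pack level
brand_pack_forecast_units = 45_000  # e.g. next week's forecast

# Allocate the combined forecast to each UPC by its historical shipment share
shipments["weight"] = shipments["units_shipped"] / shipments["units_shipped"].sum()
shipments["forecast_units"] = shipments["weight"] * brand_pack_forecast_units

print(shipments[["upc", "weight", "forecast_units"]])
```

The key design choice is the weighting source: shipments are used here precisely because the retailer POS history is distorted by the double-scanning behavior.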

The big question: “is the data accurate enough for us to make better decisions?”

2. Granularity: more granularity = more value

a) With increased granularity, our ability to find common ground across data dimensions, and therefore map and integrate with other data sources also increases. And with integrated data across more customer touchpoints, we can drive more insightful business decisions and actions. As Brian Kalms mentioned in his recent article ‘Unlocking the value of data for retailers’, “Mapping data structures and process flows may not sound strategic or even interesting, but it is one of the fundamental principles underpinning successful data driven organizations.”

b) Increased data granularity also increases the overall analytical value. That said, we should always first ask the question: what level of granularity do we really need to answer a particular business question? Some situations and analytics require more granularity than others. For example, if you have a demand pattern interruption algorithm that can detect ‘out-of-shelf’ (OOSh) instances at store/hour/UPC level, you obviously need at least hourly, or even transaction-level granularity of your sales data as an input. If you only have daily aggregated data, you cannot analyze intraday OOSh occurrences by the hour of the day. You also cannot recover lost sales by optimizing delivery frequencies, in-store replenishment quantities/times, or shelf allocation and assortment.
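To make the granularity point concrete, here is a simplified, rule-based sketch of OOSh flagging in Python/pandas. It is not the demand pattern interruption algorithm referenced above; it assumes hypothetical columns (store, upc, hour, units_sold, and expected_units from some demand model) and simply flags runs of zero-sales hours where demand was expected – something that is impossible with daily aggregated data.

```python
import pandas as pd

def _run_length(flags: pd.Series) -> pd.Series:
    # Length of the current run of consecutive True values, reset at each False
    groups = (~flags).cumsum()
    return flags.groupby(groups).cumsum()

def flag_possible_oosh(hourly_sales: pd.DataFrame,
                       min_expected: float = 2.0,
                       consecutive_hours: int = 3) -> pd.DataFrame:
    """Flag store/UPC/hour rows that look like out-of-shelf (OOSh):
    demand is expected, but observed sales are zero for several hours in a row.
    Assumed columns: store, upc, hour, units_sold, expected_units."""
    df = hourly_sales.sort_values(["store", "upc", "hour"]).copy()
    df["zero_but_expected"] = (df["units_sold"] == 0) & (df["expected_units"] >= min_expected)

    # Count consecutive flagged hours within each store/UPC series
    df["zero_streak"] = (
        df.groupby(["store", "upc"], group_keys=False)["zero_but_expected"].apply(_run_length)
    )
    return df[df["zero_streak"] >= consecutive_hours]
```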

Imagine having integrated, accurate inventory and sales data, knowledge of supply chain and in-store labor constraints, and an advanced neural-net demand forecasting capability on top. With that capability, you could ensure the necessary product will be in the back room, and you would have a predictive OOSh solution worth millions of dollars in incremental value.

Not many retailers or consumer packaged goods (CPG) companies have cracked the complexity of this yet. One of the solution providers that I’ve had the opportunity to work very closely with in this area is Data Ventures. They’ve already solved each piece of the puzzle (OOSh, neural-net demand forecasting etc.), proven value in the market and are now on track to stitch all of this together to solve a complex multi-billion dollar problem for the industry, end-to-end.

“Focus on high market coverage data sources for basic reporting and more granular data sources for root cause analysis, decision-making, and optimization.”

3. Market coverage

What market and sales coverage do we get from a particular data source? More coverage is obviously better for visibility and standard reporting. That said, most data sources that provide high coverage also tend to be highly aggregated (e.g. syndicated across stores and time), meaning the critical details needed for strategic decision making are lost.

When there is access to more granular data, such as direct retailer POS feeds for CPGs (even if they only cover a handful of retail accounts or smaller geographical regions), there is a strong business case to obtain and harness that data based on the incremental analytical value it can provide. Most leading CPGs have been investing heavily in these data sources over the years.

A good rule of thumb is to focus on high market coverage data sources for basic reporting (e.g. syndicated POS data) and more granular data sources for root cause analysis, decision-making, and optimization (e.g. direct POS feed from a retailer).

4. Is the data personally identifiable?

This is probably the most valuable aspect of data for enabling 1:1 segmentation, tailored communications and engagement with customers. Unfortunately, considering the vast universe of all available data, personally identifiable data typically has low market coverage for most retailers and CPGs. An organization’s ability to link (integrate) external, personally identifiable information, such as social media comments or sentiment, to internal customer transactions and loyalty databases is also limited in practice.

The amount of personally identifiable social media data that was being linked back to internal transaction databases by leading corporations used to be in the range of 2-5% or less over the last 5 years. We now see some leading companies working around a 10-20% linkage range, and I expect this number to grow significantly, further enriching the customer insights in the future.

Historical transactions of an individual or a household are extremely valuable from an analytical perspective: in addition to the typical product, location and time dimensions at the point of sale, they provide a fourth data dimension. With granular or syndicated POS data sources that are not personally identifiable, we have to statistically model incrementality, factoring in cannibalization and pantry-loading effects. Without these models, an accurate promotional return on investment (ROI) cannot be calculated.

On the other hand, with personally identifiable historical transaction data we can precisely measure cannibalization from one product to another and pantry loading within the household. This allows us to calculate true incrementality and ROI for all trading partners. There is a significant difference in the confidence level of business decisions made with these two approaches: statistical modeling vs. precise measurement.
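As a toy numerical illustration of what “precisely measured” incrementality could look like (all figures, adjustments and field names below are hypothetical, not a prescribed methodology), household-level data lets you subtract observed cannibalization and pantry loading directly before computing promotional ROI:

```python
# Hypothetical promotion results measured from household-level transactions
promo_period_units = 1_200   # units of the promoted UPC bought by exposed households
baseline_units = 800         # what those households would normally have bought
cannibalized_units = 150     # units shifted away from the brand's other UPCs
pantry_loaded_units = 100    # units pulled forward that would have been bought later anyway

incremental_units = (promo_period_units - baseline_units
                     - cannibalized_units - pantry_loaded_units)

unit_margin = 0.60           # margin per incremental unit
promo_cost = 60.0            # total promotion cost (discounts, funding, execution)

incremental_profit = incremental_units * unit_margin
roi = (incremental_profit - promo_cost) / promo_cost
print(f"Incremental units: {incremental_units}, ROI: {roi:.0%}")  # 150 units, 50%
```

With syndicated, non-identifiable data, the cannibalization and pantry-loading terms above would have to be statistically estimated rather than measured, which is exactly where the confidence gap comes from.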

“Consumer panels and focus groups can generate personally identifiable data, depending on their subscription levels. But their market coverage is extremely low.”

Consumer panels and focus groups can generate personally identifiable data, depending on their subscription levels. But their market coverage is extremely low – to a point that users should always question if the panel or focus group is a fair representation of the market in question.

One company I’ve been observing and interacting with closely in this area is InfoScout. They are an up-and-coming firm, starting to shake up the consumer panel industry with their innovative use of gamification, mobility and social media. We typically prioritize the use of consumer panel and focus group data to create new hypotheses, and recommend validating these hypotheses with statistically robust A/B tests before rolling out at larger scale.

Finally, while working with personally identifiable data, compliance with relevant laws and regulations is crucial. Not doing so may result in hundreds of millions in fines and job losses. For example, the General Data Protection Regulation (GDPR), effective May 2018, will have significant impacts on the use of personal data for EU businesses. There are even implications for US companies that handle the personal data of customers in the EU, regardless of where the company itself is based. How ready are you for GDPR?

5. Access frequency

This is the fifth key lever that contributes to the value of data. The starting questions are again: “How frequently do we really need access to data to make those business decisions? Do we need real-time data? And if so, can we execute in real-time based on what the data is telling us?”

For example, let’s consider the use case of sending a free shipping coupon to a busy mom who abandoned an online cart earlier in the day. She was also not able to purchase the products she wanted in store by closing time because of her hectic schedule. Wouldn’t it be super-convenient (and make her feel valued as a customer) if we were to email her a free shipping coupon alongside a personalized note containing the shopping cart she abandoned, first thing the next morning? That’s powerful customer experience.

To provide this type of customer experience, you need near real-time integrated data between your online sales and your bricks-and-mortar loyalty-card transactions. This is a very advanced level of customer engagement capability, even for most of the leading players in the industry… But it’s not impossible!
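A highly simplified sketch of the decision logic behind that trigger might look like the following (all identifiers, field names and data shapes are hypothetical; the hard part in practice is the near real-time integration of the two feeds, not this check):

```python
from datetime import datetime

# Hypothetical abandoned online cart pulled from the e-commerce platform
abandoned_cart = {
    "loyalty_id": "L-102938",
    "email": "customer@example.com",
    "items": {"UPC-001", "UPC-002"},
    "abandoned_at": datetime(2024, 5, 6, 11, 42),
}

# Same-day in-store loyalty-card transactions, fed in near real time
store_purchases_today = [
    {"loyalty_id": "L-102938", "items": {"UPC-003"}},
]

def should_send_free_shipping(cart: dict, purchases: list) -> bool:
    """Send the coupon only if none of the abandoned items were bought in store."""
    bought_in_store = set()
    for purchase in purchases:
        if purchase["loyalty_id"] == cart["loyalty_id"]:
            bought_in_store |= purchase["items"]
    return not (cart["items"] & bought_in_store)

if should_send_free_shipping(abandoned_cart, store_purchases_today):
    # In practice, this would be queued for the next morning's personalized email send
    print(f"Queue free-shipping coupon for {abandoned_cart['email']}: {sorted(abandoned_cart['items'])}")
```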

Having said that, for most strategic, foundational and even tactical decisions, real-time data access may not be required… and the importance of this point is often overlooked.

Most key functional areas in sales, marketing and supply chain, such as assortment, in-store replenishment planning, pricing, store and shopper segmentation, and demand forecasting require the real-time application of analytics to simulate different scenarios, but they can still work very effectively with data that is a day, a week or even over a month old.

So, now we’ve covered the 5 key data value levers, but how do you drive successful data and analytics initiatives to create value in your organization? Here are my 10 guidelines…

  1. Start small, think big, move fast.
  2. Engage cross-functional teams across business, IT and other supporting functions, including management, front-line employees and customers.
  3. Understand each stakeholder’s needs and personal characteristics while designing data and analytics solutions. This includes the end users of your data solutions, your corporate customers, as well as the consumers of the products and services you provide.
  4. Focus on the culture of business value and outcomes, not only actions and deliverables.
  5. Deliver tangible value and reduce the risk of your initiative with shorter sprints. Embrace the value of faster iterations and learning.
  6. Focus on building differentiating talent inside your organization for data and analytics solutions. Everything non-strategic or non-differentiating can be done by others. Even some strategic functions can be done by vendors while you are scaling up and learning.
  7. Effective governance, security and compliance should be in place… but the cultural mindset should be on developing high-quality, compliant solutions rather than stopping or slowing them down for non-compliance. Taking small, calculated risks with effective mitigation plans can make a big difference.
  8. Create a flexible technology environment, including a sandbox to design, test and roll-out solutions efficiently and effectively. Keep a close eye on and continuously test with new technologies that can disrupt your industry.
  9. Data and analytics initiatives have to continuously improve by design. Focus on closely testing, monitoring and improving adoption on an ongoing basis.
  10. Develop and monitor KPIs across all points above. Ensure all stakeholders’ performance is evaluated in line with the objectives of the initiative.

As the title suggests, when it comes to data, it’s not size that matters. It’s the value you create from it that does. And that value has the potential to vastly improve your customers’ experience with your brand, cementing loyalty for the long run and encouraging your trading partners to share even more valuable data with you. We can help you set the bar for your industry.

Get your data strategy right, or you will be left behind.