How can you use your historical data to drive your strategy and planning? Is it possible to forecast the outcomes of your marketing activities? Can data science really predict the future of your company? Ahmed Hammad is back on the Mowgli blog to give us more insights into how you should be using predictive analysis to save time and money on your marketing strategies.
In this interview, we discuss use cases for predictive analysis and best practices to get your company started with data analytics projects. Let’s dive into it!
What is predictive analysis? Why is it useful?
Predictive Analysis is a category of data analytics aimed at making predictions about future outcomes based on historical data. By predicting what will happen in the future, we can prepare ourselves for future developments and behaviours or forecast the effects of different strategies and decisions.
In data-driven marketing strategies, predictive analytics is used to determine what is likely to happen in the future based on historical user behaviour and past campaign data. For example, these models can be used for segmentation based on past engagement, to identify upsell opportunities, or even to determine the chance of a prospect actually purchasing your product or service.
On the other hand, it can also help determine how well a strategy is performing. Let’s take the example of Covid-19 and how it impacted the way companies sell their products and services. Many businesses decided to invest in their online strategies, and many saw positive results with increasing online sales. However, can we really say that they are actually selling more?
To answer this question with certainty, they cannot simply compare their current sales with the revenue they had just before the lockdown. They need to compare them with what would have happened if the lockdown had never taken place. That’s where machine learning, and more specifically predictive analysis, comes in: you can take historical data, on revenue for example, and project it into the future to predict what would have happened if business had been running as usual, without Covid-19 and a lockdown. Only then can you really compare the two results and determine with accuracy whether the online strategy is performing better.
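The comparison described above can be sketched in a few lines of Python. This is only an illustration: the revenue figures are invented, and a straight-line trend stands in for whatever forecasting model a real project would use. The function names `fit_linear_trend` and `project` are hypothetical helpers, not part of any library.

```python
def fit_linear_trend(values):
    """Least-squares fit of value = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return intercept, slope


def project(intercept, slope, start, horizon):
    """Extend the fitted trend over `horizon` periods starting at period `start`."""
    return [intercept + slope * t for t in range(start, start + horizon)]


# Invented monthly revenue (in K€) before the lockdown.
pre_lockdown = [100, 102, 104, 106, 108, 110]
# Invented revenue actually observed during the lockdown.
observed = [115, 118, 121]

# "Business as usual" baseline: what the pre-lockdown trend would have produced.
intercept, slope = fit_linear_trend(pre_lockdown)
baseline = project(intercept, slope, start=len(pre_lockdown), horizon=len(observed))

# The effect of the new strategy is measured against the baseline,
# not against the last month before the lockdown.
uplift = [obs - base for obs, base in zip(observed, baseline)]
print(baseline, uplift)
```

The point of the design is that `uplift` is computed against the projected baseline. Comparing `observed` with the last pre-lockdown month instead would overstate the gain, because part of the increase was already implied by the existing trend.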
Can you give us some other examples of how predictive analytics is used in a business environment?
Let’s say that we have a dataset from company X describing sales of various products. Company X has many shops. The dataset contains about 25,000 monthly historical sales records, along with information about the sales price and the shops’ locations. Our goal is to make a six-month forecast of the volume of a given product sold by a given store.
In addition to the data provided, we will include additional variables that could help predict our outcome, such as special days like holidays, volume sold in the entire industry, seasons, trends from social media platforms, and so on.
After some additional data transformation, we will use all the data except for the last six months to train a Machine Learning model. The model will learn historical patterns from the data and the relationships between variables. We will then test the predictive ability of the model on the last six months of data. If we are satisfied with the results, we can then produce a forecast for the next six months.
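As a rough sketch of the train-and-test procedure just described, the snippet below holds out the last six months of a series and scores a forecast against them. The monthly volumes are invented, and the "model" is a seasonal naive forecast (repeat the value from twelve months earlier), used here only as a simple stand-in for a real Machine Learning model.

```python
def split_last_n(series, n_test):
    """Chronological split: everything except the last n_test points is training data."""
    return series[:-n_test], series[-n_test:]


def seasonal_naive_forecast(train, horizon, season_length=12):
    """Stand-in model: each forecast repeats the value from one season earlier."""
    history = list(train)
    forecast = []
    for _ in range(horizon):
        value = history[-season_length]
        forecast.append(value)
        history.append(value)
    return forecast


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented monthly sold volume for one product in one store (three years).
monthly_volume = [
    120, 115, 130, 140, 150, 170, 180, 175, 160, 145, 135, 200,
    126, 120, 137, 146, 158, 177, 189, 183, 167, 152, 141, 210,
    131, 127, 142, 155, 165, 186, 197, 192, 176, 158, 149, 221,
]

# Train on everything except the last six months, then test on those six months.
train, test = split_last_n(monthly_volume, 6)
predicted = seasonal_naive_forecast(train, horizon=len(test))
mae = mean_absolute_error(test, predicted)
print(predicted, mae)

# If the test error is acceptable, forecast the next six months from the full history.
next_six_months = seasonal_naive_forecast(monthly_volume, horizon=6)
```

The split is chronological rather than random on purpose: shuffling the months would let the model see the future during training and make the test error look better than it really is.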
Can you tell us about a predictive analysis project you have worked on?
Yes, several, from predicting the probability of floods to forecasting sales for a company.
For one of my clients, I had to develop an automated system to predict the effect of concurrent Instagram ads every month. The company wanted to continually monitor and evaluate many Instagram ad strategies, month after month, for roughly a year. The goal was to predict short-term effects on sales and attribute the observed increase or decrease in sales to one of the strategies. As you can imagine, this is not only about forecasting future sales, but also about disentangling the effect of one strategy from another and defining causal relationships between an ad and the volume of sales.
In this case, we were able to determine the right strategy to increase revenue in the short term and keep it steady over the medium term. This was a big win for the company!
What are the main challenges of such a project?
Predicting the future is not an easy task. It is challenging to make detailed, long-term forecasts. Imagine that we would like to forecast the daily number of clients for the next year! Another challenge is data availability. We need a lot of historical data to use the current state-of-the-art models.
Uncertainty is another challenge. The example above represents a basic setting. But in some cases, we might want to include and quantify the uncertainty in our prediction. We would like the model to output a full probability distribution over the range of possible outcomes. In most Machine Learning models, you get back a single real number as the expected outcome. In the classification case, as we have seen for sentiment analysis, most of the models are probabilistic by definition. But this is not the case for regression models, which typically return a point estimate. Today, however, there are ways to output a full probability distribution using Probabilistic Regression Models.
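To make the difference between a point estimate and a full distribution concrete, here is a deliberately simple sketch. It fits a straight line and then assumes the residuals are Gaussian with constant variance, which gives a Normal predictive distribution around the point estimate. Both that assumption and the data are illustrative only; dedicated probabilistic regression models are more flexible, and this sketch also ignores the uncertainty in the fitted coefficients themselves.

```python
from statistics import NormalDist


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    return y_mean - slope * x_mean, slope


def predictive_distribution(xs, ys, x_new):
    """Normal distribution centred on the point estimate for x_new.

    Simplifying assumption: residuals are Gaussian with constant variance,
    estimated from the training residuals.
    """
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return NormalDist(mu=intercept + slope * x_new, sigma=sigma)


# Invented data: ad spend (K€) and resulting sales (K€).
ad_spend = [1, 2, 3, 4, 5, 6]
sales = [12, 13, 17, 18, 22, 23]

dist = predictive_distribution(ad_spend, sales, x_new=7)
point_estimate = dist.mean                           # what a plain regression returns
low, high = dist.inv_cdf(0.05), dist.inv_cdf(0.95)   # 90% prediction interval
print(point_estimate, low, high)
```

With a distribution in hand, the business can ask questions a single number cannot answer, such as how likely it is that sales fall below a given threshold (`dist.cdf(threshold)`).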
Another challenging case is streaming data. In this situation, data arrives sequentially and at high speed, and the analysis must take place in real time. Classic examples are stock price data or weather data. In these cases, we need to use Online Learning Algorithms, which dynamically adapt to new patterns in the data.
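A minimal sketch of the online learning idea, assuming a linear model updated by stochastic gradient descent: the model sees one observation at a time, adjusts its parameters, and never stores the full history. The class name `OnlineLinearRegressor` and the synthetic stream are hypothetical, chosen only for illustration.

```python
class OnlineLinearRegressor:
    """Linear model y ~ w * x + b, updated one observation at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """One gradient step on the squared error of a single observation."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


def stream(n_steps=5000):
    """Synthetic stream following y = 2x + 1, with x cycling through 0.0 .. 0.9."""
    for step in range(n_steps):
        x = (step % 10) / 10
        yield x, 2 * x + 1


model = OnlineLinearRegressor()
for x, y in stream():
    # Each new data point is used once to adjust the model, then discarded.
    model.update(x, y)

print(model.w, model.b)
```

Because every update only needs the latest observation, the same loop keeps working if the underlying pattern changes: the parameters simply drift towards the new relationship as fresh data arrives.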
What is the average cost of a project?
The cost depends on how often the forecasting needs to be done. The prediction models can be run once a year, twice a year, every month or even every hour, depending on the objectives of the company.
To give you an idea, projects usually range from 6K€ to 25K€.
In our last discussion, you mentioned that companies should improve their processes of data gathering and storage before launching a data analysis project. What are your recommendations to do so?
In 90% of cases, companies have a lot of data, but it is scattered and disorganised. Most of the time, the biggest problem is data quality and data storage. For example, I had a client with unique data on farms all over Southeast Asia, but the data was scattered everywhere. It took me two weeks to fix all the GPS locations, which was a lot of work.
In other cases, the data is perfectly structured, with seven years of data stored nicely and easily accessible, but the quality of the information itself, the single datum, is a mess. For example, I once worked with sensor data from a weather station where the sensor was very unreliable. Plants had overgrown the weather station and covered the sensor itself, so the data was not usable.
In the case of purely digital data, the most common problem is accessing the data. It can get complicated to access the data from a Facebook group, a business account, or an Instagram account. The regulations have become stricter over the years as these companies try to ensure data privacy. But obviously, it is your data, and you can request and access it. The advantage of this third-party data is that it’s really well organized because it’s not the business that’s managing it. Someone else is doing a great job at handling it, and you can access it and use it almost immediately.
Don’t get me wrong: in any data science project, the data you receive is never perfect, meaning it is not already in shape for analysis. It’s part of my job to restructure the data so that the model can understand it. However, as a data scientist, you’re a bit reluctant to work on cleaning the data. It’s the business’s job to keep it clean, so that we can then use it in a project.