Multiple causal influence network

I do not know if this question belongs here, but, would like to pick y’all’s brains. The problem is to find the change in demand (in terms of units sold) for different products within a category if one of the products in that category is removed/goes out of stock. One example would be to find how the demand of, say, dark roast and light roast of coffee are impacted if the medium roast is not available. I imagine this would be like predicting/modeling the sales of both dark and light roast dependent on the medium roast simultaneously or perhaps some kind of network analysis? I am more interested in getting coefficients.
I do not know how to put this in a DAG cause every variable impacts the other.

Let me know if anyone has handled similar problems or has suggestions.

Thanks

This sounds a bit like A/B-Testing :grinning:. But most likely they are better alternatives, which i dont know about.

This also sounds like a switchpoint detection model

https://www.pymc.io/projects/docs/en/latest/learn/core_notebooks/pymc_overview.html?highlight=coal#case-study-2-coal-mining-disasters

This sounds like a multinomial demand with stockout substitution problem.

I looked into a similar problem and I briefly skimmed some lit on the topic but most of the lit I found was quite dense and I don’t have any of the references saved. Marshall Fisher’s papers are a good starting point.

If I remember correctly, the idea is that you have some total number of shoppers, N, and you assume there’s some fixed percent of total shoppers that gets distributed to each product, and a percentage of no purchasers. So if 10% of shoppers purchase light roast, 25% purchase medium roast, 15% purchase dark roast, and 50% don’t make a purchase, you’d define an array of

p=[0.1, 0.25, 0.15, 0.5]

You can simulate demand under unlimited inventory from this as

D \sim \text{Multinomial}(N, p)

Then, in cases you have stockouts, you’d use assume substitution matrix exists that models the replacement behavior

S =\begin{bmatrix} 0 & .25 & .05 & .7\\ .15 & 0 & .1 & .75\\ .02 & .18 & 0 & .8\\ 0 & 0 & 0 & 0\\ \end{bmatrix}

Where each row represents the substitution rate from one product to another. So the third row indicates that if dark roast (product 3) is out of stock, 2% will substitute with light roast, 18% with medium roast, and 80% will choose not to purchase.

I have some code to simulate stockout based demand for your problem, I’m sure you could use this to come up with either simple approaches to test hypotheses, or come up with a model to identify all of the demand rates and substitution rates.

Here’s the code

import numpy as np 
import pymc as pm

# this is just a trick to prevent a matrix of weights for a multinomial distribution from summing to 0
ADJ = 0.000001

def create_substitution_matrix(n_products):
    '''Create a random substitution matrix. These are unknown coefficients that we want to identify
    '''
    subst = np.random.uniform(size=(n_products+1, n_products+1))
    np.fill_diagonal(subst, 0)
    subst = subst / subst.sum(1)[:,None]
    return subst

np.random.seed(99)

# time periods to simulate
periods= 365

# For simulating total demand time series
eps = 500
alpha = 15000

# Ratio of total demand distributed to each product, [light roast, med roast, dark roast, no purchase]
p = np.array([0.1, 0.25, 0.15, 0.5])

# simulate a matrix where a 1 indicates if a time period for product of index m has a stockout
stockouts = np.random.binomial(1, 0.1, size=(periods, 4))*np.array([1,1,1,0])

# simulate a matrix of substitution coefficients
subst = create_substitution_matrix(n_products=3)

# time series of total demand
N = alpha + np.random.normal(0, eps, periods).cumsum().astype(int)

# simulate demand under unlimited stocking conditions
dist = pm.Multinomial.dist(N, p)
Demand = pm.draw(dist)


# simulate observed outcome
missed_rentals = Demand*stockouts
# indicator matrix for in stock products each time period
I = (stockouts^1)
S = subst*I[:,None]
dist2 = pm.Multinomial.dist(missed_rentals, S+ADJ)
substitutions = pm.draw(dist2).sum(1)

observed_rentals = ((Demand*I) + substitutions)

If I have time I’ll try to build a model around this also!

2 Likes

Typically I would try to skirt around this by modeling the upstream driver of demand as a nonlinear cross effect - eg if you discount the price of medium of, sales of light go down and sales of medium goes up as it cannabilizes other sales - but if you keep dropping the price far enough, sales of medium will plateau after stock out and the other things that would have driven medium may instead spill over to light.

Might help to draw a picture.

1 Like

I always disliked that pymc called it switchpoint, everyone else calls it a “changepoint”, no idea where the bayesians found that word.

It might actually be not a Bayesian thing but in PyMC we say that because of Theano called its switch. Now that you mention it changepoint does sound better