Policing Example DAG

bwalters · July 12, 2022, 1:56am

The policing example at the end of Chapter 3 confused me a bit. Is conditioning on the stop wrong only because it may cause sampling bias? I may not be fully grasping the concept of a collider.

bwalters · July 12, 2022, 2:12am

Since minority has a direct effect on force in the DAG, why would conditioning on stop affect the ability to isolate this effect?

ChadDelany · July 12, 2022, 4:30pm

source: Causal Inference The Mixtape - 3 Directed Acyclic Graphs

It helps me to look at the pathways from D to Y. According to this DAG, there are four:

D → Y
D ← X → Y
D → M → Y
D → M ← U → Y

(Fryer controls for X and closes that backdoor through his research design.)

Start with the causal pathway D → M → Y. If we simply regressed Y on D, the estimate would capture the total effect of D, that is, the discrimination in both the stop and the use of force. The junction D → M ← U is a collider, so as long as M is left alone it blocks the backdoor pathway D → M ← U → Y.
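To make the bookkeeping concrete, here is a rough sketch in plain Python that enumerates the D-to-Y pathways and checks which ones are open for a given conditioning set. The edge list is my reading of the four pathways listed above, and the helper names (`descendants`, `simple_paths`, `is_open`) are made up for this illustration, not taken from the book or any library.

```python
# Edges as (parent, child), read off the four pathways listed in the post.
EDGES = [("D", "Y"), ("X", "D"), ("X", "Y"), ("D", "M"),
         ("M", "Y"), ("U", "M"), ("U", "Y")]


def descendants(node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in EDGES:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, end, path=None):
    """Paths from start to end, ignoring arrow direction, never revisiting a node."""
    path = path or [start]
    if path[-1] == end:
        return [tuple(path)]
    paths = []
    for a, b in EDGES:
        if path[-1] in (a, b):
            neighbor = b if a == path[-1] else a
            if neighbor not in path:
                paths.extend(simple_paths(start, end, path + [neighbor]))
    return paths


def is_open(path, conditioned):
    """Apply the usual blocking rules to every interior node of the path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            if not ({node} | descendants(node)) & conditioned:
                return False
        elif node in conditioned:
            # A chain or fork node blocks once it is conditioned on.
            return False
    return True


all_paths = simple_paths("D", "Y")
for conditioned in [set(), {"X"}, {"X", "M"}, {"X", "M", "U"}]:
    open_paths = sorted(p for p in all_paths if is_open(p, conditioned))
    print(sorted(conditioned), [" - ".join(p) for p in open_paths])
```

Under these assumptions, conditioning on X alone leaves D → Y and D → M → Y open, adding M swaps D → M → Y for the collider path through U, and only adding U as well isolates D → Y.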

Conditioning on M (the stop) opens up this pathway. The only way to condition on the stop and still keep the backdoor closed would be to condition on both M (the stop) and U (suspicion). But U is unobserved and therefore cannot be conditioned on. So, according to this DAG, conditioning on the stop without also conditioning on suspicion introduces a spurious correlation between D and Y that biases any estimate of the causal effect. Suspicion affects both M and Y and is left unaccounted for: M ← U → Y. Conditioning on the stop reduces the problem to just the M - U - Y triangle.
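A small simulation can show the same thing numerically. This is only a sketch under assumptions I made up for illustration: X is left out (treated as already handled), all coefficients and the stop threshold are arbitrary, the direct effect of D on Y is hard-coded to 1.0, and `ols` is a tiny helper built on NumPy rather than anything from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

d = rng.binomial(1, 0.5, n)      # D: minority status
u = rng.normal(size=n)           # U: suspicion (unobserved in real data)
# M: a stop happens when suspicion plus a bump for minorities crosses a threshold.
m = (0.8 * d + u + rng.normal(size=n) > 1.0).astype(float)
# Y: use of force, with the direct effect of D hard-coded to 1.0.
y = 1.0 * d + 0.5 * m + 2.0 * u + rng.normal(size=n)


def ols(outcome, *regressors):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


stopped = m == 1
total_effect = ols(y, d)[1]                               # D -> Y plus D -> M -> Y
stops_only = ols(y[stopped], d[stopped])[1]               # conditions on the collider M
stops_and_u = ols(y[stopped], d[stopped], u[stopped])[1]  # conditions on M and U

print(f"all encounters, y ~ d:      {total_effect:.3f}")
print(f"stops only, y ~ d:          {stops_only:.3f}")
print(f"stops only, y ~ d + u:      {stops_and_u:.3f}")
```

In this setup the stops-only estimate should fall well below the hard-coded 1.0, because among stopped people minorities have lower average suspicion. Adding U brings it back, which is exactly the adjustment that is impossible with real data.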

ChadDelany · July 12, 2022, 4:35pm

Conceptually, conditioning on the stop to study the use of force ignores the sampling bias introduced by discrimination in the stop itself.

That is really well illustrated by the coding examples. You can hard-code the bias and then demonstrate how these techniques can produce the wrong answer.

ChadDelany · July 12, 2022, 4:47pm

I suppose you could argue whether or not the variable U is valid in this DAG. But since the point of the study is, in some way, to quantify U or understand its quality, ignoring or excluding it would seem to make the DAG incomplete.

I suppose another question to ask is: what is the difference between Discrimination and Suspicion? Are those variables independent of each other? Do they need to be? I suppose the whole point of this DAG is that there is inherent, unobservable Suspicion feeding into both the Stop and the Use of Force. Because it is not observable, it cannot be controlled for, and neither can any discrimination embedded in Suspicion. It is unknown, and it therefore disrupts any attempt to measure the causal effect along M → Y.

bwalters · July 12, 2022, 9:19pm

Thanks for the response. It's starting to make more sense. However, when I try to run some simulations, I am not able to recover the parameter values.

import pandas as pd
from scipy import stats

def collider_and_confounder(size):
    """Create a collider and confounder example"""

    # Is confounder additive or multiplicative
    unit_normal = stats.norm(0, 1)
    d = unit_normal.rvs(size)
    z = unit_normal.rvs(size)
    # For some reason need to add the coefficient here
    x = 8.9*d + 2.34*z

    y = 3.567*d + 1.234*z + 2.456*x + 21.123

    collider_and_confounder_df = pd.DataFrame({"x": x, "d": d, "z": z, "y": y})
    return collider_and_confounder_df

collider_and_confounder_df = collider_and_confounder(10000)

If I am not mistaken, x is a collider in this instance through d → x ← z → y.

So, if I condition on x and z, I should have no open paths, correct?

RavinKumar · July 15, 2022, 2:01am

Thanks for sharing @bwalters. Could you share the regression you’re using? Then we can get a sense of what may be going wrong.

bwalters · July 16, 2022, 5:51pm

This is the regression:

import statsmodels.formula.api as smf
mod = smf.ols(formula='y ~ d + z + x', data=collider_and_confounder_df).fit()

And this is the output (mod.params):

Intercept    21.123000
d             0.224747
z             0.355250
x             2.831534
dtype: float64

jahloy · July 18, 2022, 6:10pm

Perfect multicollinearity is at play in your example. Because x is determined entirely by d and z, the three regressors are linearly dependent, so the individual coefficients are not identified and the fit you got is just one of infinitely many equally good solutions. If you substitute x = 8.9d + 2.34z into your population model, what you’re effectively trying to estimate using regression is the equation y = 25.4254d + 6.98104z + 21.123. By performing the same substitution with your fitted equation, you’ll see that that’s what you’ve ended up with.
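The substitution is quick to check by hand, or with a few lines of Python. The numbers below are copied from the simulation code and the regression output earlier in the thread; the variable names are just labels I picked.

```python
# Population model from the simulation code in the thread.
b_d, b_z, b_x = 3.567, 1.234, 2.456
x_on_d, x_on_z = 8.9, 2.34   # x = 8.9*d + 2.34*z, with no noise term

# Fitted coefficients reported in the thread.
fit_d, fit_z, fit_x = 0.224747, 0.355250, 2.831534

# Substitute x into each equation and collect the terms in d and z.
true_reduced = (b_d + b_x * x_on_d, b_z + b_x * x_on_z)
fitted_reduced = (fit_d + fit_x * x_on_d, fit_z + fit_x * x_on_z)

print("population:", true_reduced)
print("fitted:    ", fitted_reduced)
```

Both pairs come out at roughly 25.4254 for d and 6.98104 for z, so the regression recovered the combined equation even though the individual coefficients look wrong.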

If you include some disturbance in your equation for x (draws from the standard normal should work), you should be able to recover the parameters you’ve specified. You can then explore collider bias by estimating y ~ d to get the total effect of d; y ~ z to get the total effect of z; y ~ d + z to get both total effects at once; y ~ d + x to open up the path d → x ← z → y; y ~ z + x to open up the path z → x ← d → y; and finally, y ~ d + x + z.
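Here is a minimal sketch of that suggestion. It reuses the coefficients from the original simulation and adds a standard normal disturbance to x. To keep it self-contained it uses a small NumPy least-squares helper (`ols`, a name invented here) instead of statsmodels, and the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 10_000

d = rng.normal(size=size)
z = rng.normal(size=size)
x = 8.9 * d + 2.34 * z + rng.normal(size=size)   # disturbance added to x
y = 3.567 * d + 1.234 * z + 2.456 * x + 21.123


def ols(outcome, *regressors):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


full = ols(y, d, z, x)     # intercept, d, z, x: the specified parameters
d_only = ols(y, d)[1]      # total effect of d
d_and_z = ols(y, d, z)     # total effects of d and z
d_and_x = ols(y, d, x)     # conditions on x without z: opens d -> x <- z -> y

print("y ~ d + z + x:", np.round(full, 3))
print("y ~ d:        ", round(d_only, 3))
print("y ~ d + z:    ", np.round(d_and_z, 3))
print("y ~ d + x:    ", np.round(d_and_x, 3))
```

With the disturbance in place, the full regression should return the specified parameters, while y ~ d + x should give an estimate for d that is far from 3.567, since z is left out and x is a collider between d and z.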

RavinKumar · July 19, 2022, 1:47am

Nice catch @jahloy! Thanks for helping out
