Hi All,
Let’s say we have Y <- X -> D -> Y. We are trying to find counterfactuals for the observations. Scott says that the matching can be done on X (propensity scores, nearest neighbors, etc.). But is it okay to match on pre-treatment Y in addition to X?
Example: We are trying to assess the impact of an online ad campaign. We have data on what the users have historically purchased as well as their demographics. We take the demographic info for the users who clicked on the ad and find users with matching demographics who did not click on the ad. This makes sense. However, does it make sense to also match on their purchase habits prior to the ad campaign (i.e., pre-treatment Y)? Ideally, we would like to compare users who had similar purchase habits prior to the campaign.
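To make the question concrete, here is a minimal sketch of the two matching setups I have in mind, on made-up data. Everything in it is an assumption for illustration: the variable names (`age`, `pre_spend`, `clicked`, `post_spend`), the simulated effect size, and the choice of 1-nearest-neighbor matching with replacement on standardized covariates rather than propensity scores. It doesn't answer the bias question; it only shows what "match on X" versus "match on X and pre-treatment Y" would look like.

```python
import numpy as np


def match_controls(features, treated):
    """Return (treated indices, index of the nearest control for each).

    1-nearest-neighbor matching with replacement, using Euclidean
    distance on standardized features.
    """
    features = np.asarray(features, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Standardize so that no single covariate dominates the distance.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std
    treated_idx = np.flatnonzero(treated)
    control_idx = np.flatnonzero(~treated)
    # Pairwise squared distances, shape (n_treated, n_control).
    diff = z[treated_idx][:, None, :] - z[control_idx][None, :, :]
    dist = (diff ** 2).sum(axis=2)
    return treated_idx, control_idx[dist.argmin(axis=1)]


def att_estimate(features, treated, outcome):
    """Mean outcome difference between treated units and matched controls."""
    t_idx, c_idx = match_controls(features, treated)
    outcome = np.asarray(outcome, dtype=float)
    return float((outcome[t_idx] - outcome[c_idx]).mean())


# Made-up data following Y <- X -> D -> Y (all numbers are hypothetical).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 10, n)                                # X
pre_spend = 0.5 * age + rng.normal(100, 20, n)             # pre-treatment Y
clicked = rng.random(n) < 1 / (1 + np.exp(-(age - 40) / 10))  # D
post_spend = pre_spend + 5.0 * clicked + rng.normal(0, 5, n)  # Y

# Setup 1: match on X only.
att_x = att_estimate(age[:, None], clicked, post_spend)
# Setup 2: match on X and pre-treatment Y.
att_x_prey = att_estimate(np.column_stack([age, pre_spend]), clicked, post_spend)

print(f"match on X only:      {att_x:.2f}")
print(f"match on X and pre-Y: {att_x_prey:.2f}")
```

In this toy setup the click depends only on `age`, so it can't show whether matching on pre-treatment Y would introduce bias in a more realistic setting, which is exactly what I'm asking about.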
- I couldn’t find any literature online that discusses this, so if you are aware of anything, please let me know.
- I understand that matching on X accounts for a lot of the discrepancies between the exposed and control groups. But does matching on pre-exposure Y induce any kind of bias, or is it a step in the right direction?
Looking forward to your comments and responses.