Matching on pre-treatment outcome variable

Hi All,

Let’s say we have Y <- X -> D -> Y. We are trying to find counterfactuals for the observations. Scott says that matching can be done on X (propensity scores, nearest neighbors, etc.). But is it okay to match on pre-treatment Y in addition to X?

Example: We are trying to assess the impact of an online ad campaign. We have data on what users have historically purchased, as well as their demographics. We take the demographic info for the users who clicked on the ad and find users with matching demographics who did not click on it. This makes sense. However, does it also make sense to match on their purchase habits prior to the ad campaign (i.e., pre-treatment Y)? Ideally, we would like to compare users who had similar purchase habits before the campaign.

  • I couldn’t find any literature online that discusses this, so if you are aware of anything, please let me know.
  • I understand that matching on X accounts for many of the differences between the exposed and control groups. But does matching on pre-exposure Y introduce any bias, or is it a step in the right direction?
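
To make the question concrete, here is a minimal sketch of what I mean by matching on X plus pre-treatment Y. It uses nearest-neighbor matching with replacement on standardized columns. The function names, the two columns (age and pre-campaign spend), and the numbers are all hypothetical, and this is not meant as the definitive way to do it.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def match_nearest(treated, controls):
    """For each treated row, return the index of the closest control row
    (Euclidean distance on standardized columns, matching with replacement)."""
    scaled = standardize(treated + controls)
    t, c = scaled[:len(treated)], scaled[len(treated):]
    return [min(range(len(c)), key=lambda j: math.dist(ti, c[j])) for ti in t]

# Hypothetical columns: [age, pre-campaign spend], where the second
# column plays the role of pre-treatment Y.
clicked = [[25, 100.0], [40, 20.0]]
not_clicked = [[26, 15.0], [24, 95.0], [41, 25.0], [39, 110.0]]

matches = match_nearest(clicked, not_clicked)
print(matches)  # index into not_clicked for each clicked user
```

With the spend column included, each clicker is paired with a non-clicker of similar age and similar prior spend, rather than just similar age.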

Looking forward to your comments and responses.

This is hard stuff.

Taking the log of the pre- and post-treatment outcomes and then modeling the difference would make the results more interpretable, I think. In that case you are modeling percentage changes rather than absolute changes, which makes the estimate somewhat more robust to differences in pre-treatment levels.
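
A quick sketch of why the log difference helps, assuming spends are strictly positive (zero purchases would need separate handling, which I am not covering here). The helper name and the numbers are made up for illustration.

```python
import math

def log_delta(pre, post):
    """Return log(post) - log(pre); assumes both values are strictly positive."""
    return math.log(post) - math.log(pre)

# Hypothetical spends: both users grow by 10%, from very different baselines.
small = log_delta(20.0, 22.0)     # absolute change of 2
large = log_delta(200.0, 220.0)   # absolute change of 20

# The absolute changes differ by a factor of 10, but the log deltas agree
# (up to floating-point rounding), since both equal log(1.1).
print(small, large)
```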

For the log conversion, use log base 1.1; see [2106.03070] Linear Rescaling to Accurately Interpret Logarithms.
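
Reading "log 1.1" as a base-1.1 logarithm (that reading is an assumption here), the appeal is that a one-unit change in the transformed variable corresponds to a 10% change in the original. A minimal sketch, with a hypothetical helper name and made-up spends:

```python
import math

def log_base(x, base=1.1):
    """Logarithm of x in the given base, via the change-of-base formula."""
    return math.log(x) / math.log(base)

# Hypothetical spends: 100 -> 121 is two successive 10% increases.
delta = log_base(121.0) - log_base(100.0)
print(delta)  # close to 2, i.e. two "10% steps"
```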