I was thinking of getting some hands-on experience in causal inference by applying what we have learned to real-world datasets. There are some good open, anonymized company datasets available on the scikit-uplift package website here. X5_RetailHero in particular looks really interesting; it was also part of a competition held about 2 years ago.
Would anyone be interested in trying this out in the next few weeks before beginning with the new topic?
These datasets come from randomized controlled experiments, so the ATE can be estimated simply as the difference in mean outcomes between the treated and control groups, ATE = E[Y | T=1] - E[Y | T=0]. But we can treat them as observational studies and apply causal inference methods such as matching, sub-classification, DiD etc. to see how close we can come to the experimental ATE. We can also estimate heterogeneous treatment effects at the level of individual users. I was thinking of:
Manually computing the ATE and ATT using matching methods, IPW etc.
Using the dowhy and econml libraries from Microsoft to cross-check the manually computed values.
Comparing the methods to see which one does best.
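To make the "manual" step concrete, here is a rough sketch of the two simplest estimators, the difference in means and a normalized IPW estimate. It runs on simulated data, not on X5_RetailHero: the confounder `x`, the propensity values, and the effect size of 2.0 are all made up for illustration, and the propensity is assumed known, whereas on real data it would have to be estimated (for example with a logistic regression). In a properly randomized dataset the propensity is constant, so the two estimates should roughly agree there.

```python
import numpy as np


def difference_in_means(y, t):
    """Naive ATE estimate: mean outcome of treated minus mean outcome of controls."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=bool)
    return y[t].mean() - y[~t].mean()


def ipw_ate(y, t, propensity):
    """Normalized (Hajek) inverse-probability-weighted ATE estimate."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    e = np.asarray(propensity, dtype=float)
    w1 = t / e                # weights for treated units
    w0 = (1.0 - t) / (1.0 - e)  # weights for control units
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


# Simulated "observational" data (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, size=n)      # binary confounder
e = np.where(x == 1, 0.8, 0.2)        # true propensity, depends on x
t = rng.binomial(1, e)                # confounded treatment assignment
true_ate = 2.0
y = true_ate * t + 3.0 * x + rng.normal(size=n)

naive = difference_in_means(y, t)     # biased upward here because x drives both t and y
ipw = ipw_ate(y, t, e)                # should land close to true_ate

print(f"true ATE: {true_ate:.2f}")
print(f"naive difference in means: {naive:.2f}")
print(f"IPW estimate: {ipw:.2f}")
```

By construction the naive estimate should come out near 2 + 3 * (0.8 - 0.2) = 3.8, while the IPW estimate should be close to 2. The same two functions could later be compared against what dowhy and econml report.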
I do not have a strong opinion on the format. I was thinking of starting a public git repo and putting my code in a folder under my name. Others can refer to it, or fork the repo, create a folder under their own name, and then open a pull request to main. This way all the code is in one place and everyone can refer to it. But please feel free to suggest other approaches that you think would work better.
It would be interesting to discuss the methods others apply (there is always some degree of subjectivity in causal analysis) and then go over the results in the next 2-3 weeks.
Let me know how this sounds. Any suggestions or comments are welcome.