At our organization we use R's lme4 for hierarchical mixed effects models. This requires the whole dataset to be loaded into memory. The data is very large, so we generally have to average over the lowest level of granularity (which is a random effect). There's also a push to adopt Bayesian approaches so we can report credible intervals, since those make the most sense to the business.
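For context, the models are basically lme4-style random-intercept regressions, which (if I have it right) translate to Bambi roughly like the sketch below. The column names y, x, group and the CSV path are just placeholders; Bambi still needs the whole data frame in memory, which is exactly the problem:

```python
import arviz as az
import bambi as bmb
import pandas as pd

df = pd.read_csv("data.csv")                # hypothetical file with y, x, group columns
model = bmb.Model("y ~ x + (1|group)", df)  # same formula style as lme4
idata = model.fit(draws=1000, chains=4)     # NUTS via PyMC under the hood
print(az.hdi(idata, hdi_prob=0.95))         # 95% credible intervals
```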
Given these requirements, I was wondering whether PyMC/Bambi or TensorFlow Probability supports batch processing and can scale well to big data?
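From what I can gather from the PyMC docs, minibatch ADVI (pm.Minibatch plus pm.fit) may be the relevant feature. Below is a rough sketch of what I think that would look like for a toy random-intercept model; the data and variable names are made up, and I'd appreciate confirmation that this is the intended way to scale:

```python
import numpy as np
import pymc as pm

# toy data standing in for the real tables: n_obs rows, n_groups random-effect levels
rng = np.random.default_rng(0)
n_groups, n_obs = 50, 200_000
group_idx = rng.integers(n_groups, size=n_obs)
x = rng.normal(size=n_obs)
y = 1.0 + 0.5 * x + rng.normal(size=n_groups)[group_idx] + rng.normal(0, 0.3, size=n_obs)

with pm.Model():
    # draw a random subset of rows at every gradient step
    y_mb, x_mb, idx_mb = pm.Minibatch(y, x, group_idx, batch_size=1_000)

    mu_a = pm.Normal("mu_a", 0, 1)
    sigma_a = pm.HalfNormal("sigma_a", 1)
    a = pm.Normal("a", mu_a, sigma_a, shape=n_groups)  # random intercepts
    b = pm.Normal("b", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)

    pm.Normal(
        "obs",
        mu=a[idx_mb] + b * x_mb,
        sigma=sigma,
        observed=y_mb,
        total_size=n_obs,  # rescale the minibatch likelihood to the full data size
    )

    approx = pm.fit(20_000, method="advi")  # stochastic variational inference
    idata = approx.sample(1_000)            # posterior draws for credible intervals
```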
I’ve been meaning to try it, but if I understand the idea correctly, I think you can sample from a histogram of your large data. Kind of like a bag of little bootstraps design…
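Roughly what I have in mind, using the plain bag of little bootstraps recipe on a toy statistic (the mean) with made-up data; plugging the actual mixed model in per subsample would be the harder part:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(size=1_000_000)  # stand-in for the big data set
n = len(data)

b = int(n ** 0.6)                     # small subsample size, a few thousand rows here
n_subsamples, n_boot = 10, 100
lower, upper = [], []

for _ in range(n_subsamples):
    sub = rng.choice(data, size=b, replace=False)
    stats = []
    for _ in range(n_boot):
        # multinomial weights emulate resampling n points from the b-point subsample
        w = rng.multinomial(n, np.full(b, 1.0 / b))
        stats.append(np.average(sub, weights=w))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    lower.append(lo)
    upper.append(hi)

# aggregate: average the interval endpoints across subsamples
print("approx 95% interval for the mean:", np.mean(lower), np.mean(upper))
```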