Scalable Bayesian models

Hi all,

At our organization we use R’s lme4 for hierarchical mixed-effects models. It requires the whole dataset to be loaded into memory, and the data is large enough that we generally have to average away the lowest level of granularity (which is a random effect). There’s also a push to adopt Bayesian approaches, since the credible intervals they produce make the most sense to the business.
Given these requirements, I was wondering: do PyMC/Bambi or TensorFlow Probability support batch processing, and do they scale well to big data?

I’m not sure if this is exactly what you were thinking, but I saw this announced in the PyMC experimental repo the other week: pymc-experimental/ at main · pymc-devs/pymc-experimental · GitHub

I’ve been meaning to try it, but if I understand the idea correctly, you can sample from a histogram of your large data instead of from the raw observations. Kind of like a bag of little bootstraps design…
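To make the idea concrete, here is a rough NumPy sketch of compressing a large dataset into a weighted histogram and then fitting against the bin centers with the counts as likelihood weights. The bin count and the simple normal model are my own illustration, not taken from the linked repo:

```python
import numpy as np

# Simulate a "large" 1-D dataset (stand-in for the real data in the thread).
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.5, size=1_000_000)

# Compress a million points into 200 (center, count) pairs.
counts, edges = np.histogram(data, bins=200)
centers = 0.5 * (edges[:-1] + edges[1:])

# Weighted MLE for a normal model: each bin center contributes `count` times.
# A Bayesian fit would use the same trick, weighting the log-likelihood by counts.
mu_hat = np.average(centers, weights=counts)
sigma_hat = np.sqrt(np.average((centers - mu_hat) ** 2, weights=counts))

print(mu_hat, sigma_hat)  # close to the true 3.0 and 1.5
```

The memory cost after compression depends only on the number of bins, not the number of rows, which is what makes the approach attractive for data that won’t fit alongside lme4-style model matrices.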


Just following up here. I wrote up an example of using this technique here:


Hi, maybe this is what you are looking for? pymc.Minibatch — PyMC 4.1.4 documentation