Great video, I agree that it’s totally a good idea to learn both, and I’m excited to do that with this book! I’m still junior, but my experience has been that management usually dictates which method I use (freq vs. Bayes) and I don’t always have the luxury of choice, so knowing more about both is always helpful.

Btw, in the rocket parts example you gave, how did you rationalise your priors? I would have thought that in a situation with small amounts of data, careful choice of priors is important for correct inference.

[copy-pasting my YouTube comment because I realized this is the better home for it]
Loved this video, love the practical approach. One thing you mention that I’ve been trying to understand is around AB tests… there are plenty of blog posts out there espousing the benefits of Bayes in AB testing (especially multiple comparisons) but I’m left wondering why nearly all AB testing is still done using the frequentist approach. In your view, why is Bayes less useful there?

Well, here’s the best part: I could ask my stakeholders what their priors were and include them. This process is called expert elicitation. This was fantastic because it’s what earned their trust in the inference method to begin with!

With the Frequentist methods there was no way to include their expertise, and it was absurd to claim their expertise could not be included in the analysis. That made the analysis easy for them to reject, and in my opinion invalid anyway.
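
To make the elicitation idea concrete, here is a minimal sketch of how an elicited prior can matter when data is scarce. Everything in it is an assumption for illustration: the numbers are made up (not the ones from the video), the helper names are hypothetical, and it assumes a simple Beta-Binomial model for a part failure rate.

```python
def beta_from_mean_and_strength(mean, pseudo_trials):
    """Convert an elicited best guess and its 'worth in trials' into Beta(a, b).

    Hypothetical helper: the expert says "about `mean`, and I'd trust that
    as much as `pseudo_trials` real tests".
    """
    return mean * pseudo_trials, (1.0 - mean) * pseudo_trials


def update(a, b, failures, trials):
    """Conjugate Beta-Binomial update: add failures to a, successes to b."""
    return a + failures, b + (trials - failures)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Made-up data: 1 failure observed in only 5 tests.
failures, trials = 1, 5

# Assumed elicited prior: "roughly 10% failures, worth about 20 tests".
a0, b0 = beta_from_mean_and_strength(0.10, 20)
elicited_post = update(a0, b0, failures, trials)

# Flat Beta(1, 1) prior for comparison.
flat_post = update(1.0, 1.0, failures, trials)

print("posterior mean, elicited prior:", beta_mean(*elicited_post))
print("posterior mean, flat prior:    ", beta_mean(*flat_post))
```

With this little data the two posteriors disagree noticeably, which is why the choice of prior deserves the scrutiny raised in the question above.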

If you’re curious about expert elicitation there’s an upcoming talk on Thursday

Well, in my opinion Bayesian AB tests are very useful. There are some good papers explaining them in the multi-armed bandit framework.

This is not to say Frequentist AB tests are not useful, but they definitely seem to be more popular. To answer why they’re more popular, my guess is because the computation is simpler, and Frequentist methods are more widely taught and accepted. This again doesn’t make them bad or anything like that, just speculating at your question!
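
For anyone who hasn’t seen one, a Bayesian AB comparison can be sketched in a few lines. This is only an illustration under stated assumptions: the conversion counts are invented, the function name is hypothetical, and it assumes flat Beta(1, 1) priors on both conversion rates.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    Each variant's conversion rate gets a Beta posterior; we sample both
    posteriors and count how often B's draw exceeds A's.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Made-up data: 120/1000 conversions for A, 150/1000 for B.
p = prob_b_beats_a(120, 1000, 150, 1000)
print("estimated P(B > A):", p)
```

Thompson sampling, the usual bandit version of this, uses the same posteriors but takes a single draw per arm on each round and serves whichever arm drew the highest rate.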

I’ve recently started a project to build out a pre-post testing framework but we’ve gone with the bootstrap methodology. Kevin Murphy calls this the poor man’s posterior, but it got traction with senior mgmt because the method is kind of intuitive, it gives a sense of uncertainty in the point estimates and it can be scaled well for n URLs…

I’ve not worked out all the details, but we will be bootstrapping confidence intervals and p-values for metric degradation/improvement on 100s of URLs, and it didn’t seem feasible to extract priors from the stakeholders for each. Still some open questions for me about family-wise error rates and the problems of multiple testing… but I think the core of the bootstrap’s appeal is how easy it is to explain and digest.
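
For readers following along, a percentile bootstrap interval for a pre/post difference on a single URL might look like the sketch below. It is not our actual framework: the metric values are invented, the function name is hypothetical, and it assumes the pre and post samples are independent (unpaired) and uses the plain percentile method.

```python
import random
from statistics import fmean


def bootstrap_diff_ci(pre, post, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(post) - mean(pre).

    Resamples each period independently, with replacement, and reads the
    interval off the sorted bootstrap differences.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        pre_star = rng.choices(pre, k=len(pre))
        post_star = rng.choices(post, k=len(post))
        diffs.append(fmean(post_star) - fmean(pre_star))
    diffs.sort()
    lo = diffs[int((alpha / 2.0) * n_boot)]
    hi = diffs[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lo, hi


# Made-up metric values for one URL, before and after a change.
pre = [10.1, 9.8, 10.4, 9.9, 10.2, 10.0, 9.7, 10.3]
post = [12.0, 11.8, 12.3, 11.9, 12.1, 12.2, 11.7, 12.4]

lo, hi = bootstrap_diff_ci(pre, post)
print("95% bootstrap CI for the change:", (lo, hi))
```

Running this per URL gives one interval each. It does nothing about the multiple-testing question, which would still need a separate correction across URLs.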

Needless to say, I’m partial to the attitude you outline above. Statistics is a broad church and there are abuses of any statistical philosophy, so I’m keen to understand a bit better how each approach comes to bear on questions of causal inference.

To paraphrase, the aspiration is the same from all sides, to develop generic methods that work on many different data sets and provide some baseline guarantees that reported results aren’t wrong too often or by too much.

2.2 The Meaning of Frequentism
There is a sense in which essentially everyone should ascribe to frequentism:

FREQUENTIST PRINCIPLE. In repeated practical use of a statistical procedure, the long-run average actual accuracy should not be less than (and ideally should equal) the long-run average reported accuracy.

This version of the frequentist principle is actually a joint frequentist–Bayesian principle. Suppose, for instance, that we decide it is relevant to statistical practice to repeatedly use a particular statistical model and procedure (for instance, a 95% classical confidence interval for a normal mean). This procedure will, in practice, be used on a series of different problems involving a series of different normal means with a corresponding series of data. Hence, in evaluating the procedure, we should simultaneously be averaging over the differing means and data.

This is in contrast to textbook statements of the frequentist principle which tend to focus on fixing the value of, say, the normal mean, and imagining repeatedly drawing data from the given model and utilizing the confidence procedure repeatedly on this data. The word imagining is emphasized, because this is solely a thought experiment. What is done in practice is to use the confidence procedure on a series of different problems, not use the confidence procedure for a series of repetitions of the same problem with different data (which would typically make no sense in practice).
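
The passage’s point about averaging over different problems can be checked with a small simulation. This is my own sketch, not something from the paper: it assumes a known standard deviation, draws each problem’s true mean from an arbitrary uniform range, and uses the standard z-interval for a normal mean.

```python
import random
from statistics import NormalDist, fmean


def long_run_coverage(n_problems=5000, n_obs=20, sigma=1.0, level=0.95, seed=0):
    """Fraction of z-intervals that cover the true mean across many problems.

    Every problem has its own true mean, so the average is taken over both
    the differing means and the data, as the quoted principle describes.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n_obs ** 0.5
    hits = 0
    for _ in range(n_problems):
        mu = rng.uniform(-50.0, 50.0)  # a different normal mean each time
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n_obs))
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / n_problems


coverage = long_run_coverage()
print("long-run actual coverage:", coverage)
```

The reported accuracy here is 95%, and the simulated long-run coverage should land close to that even though no true mean is ever reused.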