Anyone know why the error term switches from u to epsilon?

RavinKumar · June 12, 2022, 4:52pm

In the population model section we’re shown a linear equation that uses u as its error term, but once we get to the conditional expectation function, the error term changes to \epsilon_i (or at least I think it’s an error term). I can’t seem to find the definition of \epsilon.

Does anyone have any insight into the difference, other than \epsilon_i being per X_i while u seems global?

bkktimner · June 13, 2022, 12:47am

Hi, I think \epsilon refers to the error term of the CEF. I found it mentioned in Theorem 3.1.1, via this StackExchange thread.

nfultz · June 13, 2022, 2:20am

2nding @bkktimner

Also, I think \epsilon comes from the definition of the CEF, whereas u comes from one (of many) possible models under consideration.
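For reference, a short derivation of why \epsilon has conditional mean zero by construction, while u does not. It is a sketch assuming only that E(y_i) is finite, and it uses the fact that E[y_i | X_i] is itself a function of X_i. The \beta_0, \beta_1 notation for the linear model is illustrative, not taken from the book.

```latex
\begin{aligned}
\epsilon_i &\equiv y_i - E[y_i \mid X_i] \\
E[\epsilon_i \mid X_i] &= E[y_i \mid X_i] - E\big[\,E[y_i \mid X_i] \mid X_i\big]
  = E[y_i \mid X_i] - E[y_i \mid X_i] = 0 \\[4pt]
y_i &= \beta_0 + \beta_1 X_i + u_i
  \;\Rightarrow\; E[u_i \mid X_i] = E[y_i \mid X_i] - \beta_0 - \beta_1 X_i
\end{aligned}
```

The last line is zero for every X_i only when the CEF really is linear, which is why E(u | x) = 0 has to be assumed rather than derived.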

RavinKumar · June 14, 2022, 3:17pm

Thanks @bkktimner and @nfultz

From a data perspective, what’s the practical difference then? Both terms are just capturing the unmodeled error, whether that comes from the linear model or from the variation not captured by a mean estimate. Is that simplification a good general understanding?

nfultz · June 14, 2022, 3:49pm

To expand a little:

E(\epsilon | x) = 0 vs E(u | x) = 0

The \epsilon version follows from how the CEF is defined and is thus always true for any pair of RVs (as long as E(y) exists).

The u version is an extra assumption on u that we need in order to identify regression models.

We would prefer the former but are usually stuck with the latter.
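To make the contrast concrete, here is a small simulation sketch in Python (NumPy). It assumes a hypothetical data-generating process where the true CEF is E(y | x) = x^2, with x taking the values 0 to 3. \epsilon is computed as the deviation from the within-x sample mean (a sample stand-in for the CEF), and u as the residual from a straight-line fit y = a + bx. The helper `conditional_means` is a name made up for this example.

```python
import numpy as np


def conditional_means(values, x):
    """Average of `values` within each distinct value of x (hypothetical helper)."""
    return {int(k): float(values[x == k].mean()) for k in np.unique(x)}


rng = np.random.default_rng(0)
n = 10_000
x = rng.integers(0, 4, size=n)            # x takes the values 0, 1, 2, 3
y = x**2 + rng.normal(0.0, 1.0, size=n)   # assumed DGP: true CEF is E[y|x] = x^2

# epsilon: deviation from the sample CEF, i.e. the mean of y within each x cell
cef = np.array([y[x == k].mean() for k in range(4)])
eps = y - cef[x]

# u: residual from a (misspecified) linear model y = a + b*x fit by least squares
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ coef

eps_means = conditional_means(eps, x)
u_means = conditional_means(u, x)
print("E[eps | x]:", {k: round(v, 3) for k, v in eps_means.items()})
print("E[u | x]:  ", {k: round(v, 3) for k, v in u_means.items()})
```

The per-x averages of \epsilon are zero up to floating-point rounding. In this setup the per-x averages of u should come out near +1, -1, -1, +1 (the gap between x^2 and its best linear approximation -1 + 3x), even though u averages to zero overall and is uncorrelated with x.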

RavinKumar · June 16, 2022, 11:26pm

When you say “we’d prefer the former but are stuck with the latter,” why is it that we’re stuck with the latter?

And for all intents and purposes, they’re both modeling the same thing, right? Unaccounted-for noise that isn’t captured by the conditional expectation?

nfultz · June 21, 2022, 3:42am

They’re only equivalent when your (assumed) model is actually true: the functional form is correct, nothing is missing, etc. There’s no way to be absolutely sure you set up the model correctly, and there’s no way to directly observe \epsilon either.

Even in cases where you have a strong reason to believe in the model, things can go awry.

E.g., imagine testing Hooke’s law (F = kx + u) for the length of springs, but over many trials you stretch the spring too much, so later trials come out a little different because the “springiness” has worn out a little. The “true” model that “Nature” “uses”, E(F|x) + \epsilon, to “make springs a certain length” has a bunch of higher-order crap in it that we’ve ignored at our own peril.

It could be “bad”. Or it could be “fine”. Depends if you are making Slinkies or rocket parts.
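A rough sketch of the spring example, with loudly hypothetical numbers: the “true” force includes a cubic term (a stand-in for the ignored higher-order effects, not a physical model of wear), and we fit plain Hooke’s law F = kx + u by least squares through the origin. Binning the residuals by extension shows E(u | x) drifting away from zero, which is exactly the assumption that fails under misspecification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
k_true, c = 10.0, 0.8                 # hypothetical spring constant and cubic term
x = rng.uniform(0.0, 2.0, size=n)     # extension of the spring
# "Nature's" model (assumed for illustration): linear plus a higher-order term
F = k_true * x + c * x**3 + rng.normal(0.0, 0.5, size=n)

# Fit Hooke's law F = k*x + u (no intercept) by least squares
k_hat = float(x @ F / (x @ x))
u = F - k_hat * x

# Average residual within four bins of x; under a correct model all would be near 0
edges = np.linspace(0.0, 2.0, 5)
bin_ids = np.digitize(x, edges[1:-1])  # bin index 0..3
u_by_bin = [float(u[bin_ids == b].mean()) for b in range(4)]
print(f"k_hat = {k_hat:.2f}")
print("mean residual by extension bin:", [round(m, 2) for m in u_by_bin])
```

The fitted k also absorbs part of the cubic term (it lands near 12 here rather than 10). Whether residual means on the order of 1 (in units of F, against noise with standard deviation 0.5) matter is the Slinkies-versus-rocket-parts question.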

RavinKumar · June 22, 2022, 2:31am

got it, thank you @nfultz
