With DiD there’s no discussion of how close to or far from the treated observations the control observations should be; the key assumption is parallel trends. Is there any literature on whether we should match observations for the control set, and how, if at all, that affects bias? I’d expect that choosing control observations similar to the treated ones should give less biased results, since similar observations are more likely to share parallel trends.
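A minimal simulated sketch of the matching intuition, assuming a single hypothetical covariate `x` that drives both treatment take-up and the untreated trend, so trends are parallel only conditional on `x`. It compares a naive two-period DiD with one that first pairs each treated unit with its nearest control on `x`. This is plain 1-nearest-neighbour matching with replacement, chosen for illustration, not any specific published estimator; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
true_att = 2.0  # assumed treatment effect for the simulation

# Hypothetical data: covariate x drives both treatment take-up and the untreated trend
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-2 * x))  # higher x -> more likely treated
y_pre = x + rng.normal(size=n)                       # level differences are fine for DiD
# Untreated change depends on x, so trends are parallel only conditional on x
y_post = y_pre + 1.0 + 1.5 * x + true_att * treated + rng.normal(size=n)
dy = y_post - y_pre  # first difference for each unit


def naive_did(dy, treated):
    """Two-period DiD using every untreated unit as a control."""
    return dy[treated].mean() - dy[~treated].mean()


def matched_did(dy, treated, x):
    """DiD after 1-nearest-neighbour matching on x (controls reused with replacement)."""
    x_c, dy_c = x[~treated], dy[~treated]
    # Distance from every treated unit to every control; keep the closest control
    nearest = np.abs(x[treated][:, None] - x_c[None, :]).argmin(axis=1)
    return (dy[treated] - dy_c[nearest]).mean()


naive = naive_did(dy, treated)
matched = matched_did(dy, treated, x)
print(f"naive DiD: {naive:.2f}  matched DiD: {matched:.2f}  true ATT: {true_att}")
```

Under this design the naive estimate absorbs the trend difference driven by `x` and lands well above `true_att`, while the matched estimate lands near it. If `x` did not affect the trend, the naive DiD would already be unbiased and matching would mainly change precision rather than bias.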
Callaway and Sant’Anna discuss in their paper that, with their method, if multiple groups receive treatment at different times, you can use all not-yet-treated groups as controls. This reduces the uncertainty of the ATT estimates for periods in which several not-yet-treated groups remain. As those groups become treated, the pool of controls shrinks and the uncertainty of the ATT estimates grows.
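A simplified sketch of Sant’Anna’s not-yet-treated idea, assuming a simulated balanced panel with four hypothetical cohorts (first treated in periods 3 to 6), no never-treated group, no covariates, and a long difference from base period g - 1. It is not the authors’ full estimator (which also handles covariates, inference, and aggregation); it only shows how the control pool for each group-time ATT(g, t) shrinks as later cohorts become treated.

```python
import numpy as np

rng = np.random.default_rng(1)
periods = np.arange(1, 7)            # t = 1..6
cohorts = np.array([3, 4, 5, 6])     # hypothetical first-treatment periods
n_per_cohort = 200
true_effect = 1.0                    # constant effect, assumed for the simulation

G = np.repeat(cohorts, n_per_cohort)  # first treatment period of each unit
unit_fe = rng.normal(size=G.size)
time_fe = 0.5 * periods               # common trend, so parallel trends holds here
# Y[i, t - 1] is the outcome of unit i in period t
Y = (unit_fe[:, None] + time_fe[None, :]
     + true_effect * (periods[None, :] >= G[:, None])
     + rng.normal(size=(G.size, periods.size)))


def att_gt(Y, G, g, t):
    """ATT(g, t) with not-yet-treated controls (first treated after t); base period g - 1."""
    base, cur = g - 2, t - 1          # column indices of periods g - 1 and t
    treat = G == g
    control = G > t                   # units not yet treated by period t
    if not control.any():
        return np.nan, 0              # no comparison units left
    diff = Y[:, cur] - Y[:, base]
    return diff[treat].mean() - diff[control].mean(), int(control.sum())


results = {}
for g in cohorts:
    for t in periods[periods >= g]:
        results[(int(g), int(t))] = att_gt(Y, G, g, t)

for (g, t), (est, n_ctrl) in results.items():
    print(f"ATT(g={g}, t={t}) = {est:.2f} using {n_ctrl} not-yet-treated controls")
```

The number of controls drops from 600 for ATT(3, 3) to 200 for ATT(3, 5) and to zero by period 6. Since the variance of the control-group mean scales roughly with one over the number of controls, the later ATT(g, t) estimates become noisier and eventually cannot be computed with this comparison group.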
Also, I’m not sure this is what you’re asking, but in section 9.4.1 Cunningham discusses using parallel leads to justify the parallel trends assumption, and argues that parallel leads do not justify parallel trends. That is technically correct, but from an applied standpoint parallel leads do seem to support the parallel trends assumption, at least in a “hand-waving” way, and so lend more credibility to the ATT estimate.
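A minimal simulated sketch of Cunningham’s point, with made-up numbers: both groups share the same trend before treatment, so the leads come out near zero. The hypothetical `shock` parameter is an assumed confounder that hits only treated units starting at period 0, which breaks parallel trends after treatment without leaving any trace in the leads.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
periods = np.arange(-3, 3)   # event time; treatment starts at period 0
true_att = 1.0
shock = 0.8                  # hypothetical confounding shock on treated units from period 0


def simulate(treated):
    # Same trend (0.3 per period) for both groups, so parallel leads hold by construction
    base = 0.3 * periods[None, :] + rng.normal(size=(n, 1))
    post = (periods >= 0)[None, :]
    extra = treated * (true_att + shock) * post  # effect plus shock, treated group only
    return base + extra + rng.normal(size=(n, periods.size))


y_t, y_c = simulate(True), simulate(False)
ref = np.where(periods == -1)[0][0]  # reference period -1

# Event-study style coefficients: DiD of each period relative to period -1
coef = (y_t.mean(0) - y_t[:, ref].mean()) - (y_c.mean(0) - y_c[:, ref].mean())
for p, b in zip(periods, coef):
    print(f"period {int(p):+d}: {b:.2f}")
```

The leads (periods -3 and -2) sit near zero, yet the post-period coefficients center on `true_att + shock` rather than `true_att`. Flat leads are consistent with parallel trends but cannot rule out a violation that starts exactly when treatment does.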