Enanpad 2024

17-09-2024

It is very common these days to hear someone say “*correlation does not mean causality*.”

In essence, that is true.

*The killer struck during daylight. Had the sun not been out that day, the victim would have been safe.* There is a correlation, but clearly no causation.

Sometimes, there is causality even when we do not observe correlation.

*The sailor is adjusting the rudder on a windy day to align the boat with the wind, but the boat is not changing direction.* (Source: The Mixtape)

**Note**

**In this example, the sailor is endogenously adjusting the course to balance the unobserved wind.**
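The sailor example can be mimicked with a small simulation. This is only a sketch under assumed numbers: the variable names, the perfect offsetting of the wind, and the noise scale are all illustrative choices, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

wind = rng.normal(size=n)      # unobserved wind force
rudder = -wind                 # assumption: the sailor offsets the wind exactly
# The rudder has a causal effect of +1 on the heading by construction.
heading = wind + rudder + 0.1 * rng.normal(size=n)

corr = np.corrcoef(rudder, heading)[0, 1]
print(f"corr(rudder, heading) = {corr:.3f}")
```

The rudder moves the boat by construction, yet its correlation with the heading is close to zero because the unobserved wind cancels it out.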

- I will discuss some issues in using plain OLS models in Finance Research (mainly with panel data).

I will avoid the word “endogeneity” as much as possible.

- This word refers to the violation of the Conditional Mean Independence (CMI) assumption, meaning that \(x\) and \(\mu\) are correlated.

- I will also avoid the word “identification” because identification does not guarantee causality and vice versa (Kahn and Whited 2017).

- The discussion is mainly based on Atanasov and Black (2016)

Imagine that you want to investigate the effect of Governance on Q

- You may have more covariates explaining Q (omitted from slides)

\(Q_{i} = \alpha + \beta \times Gov_{i} + Controls + error\)

All the issues in the next slides make it impossible to infer that **changing Gov will CAUSE a change in Q**.

That is, we cannot infer causality.

Perhaps it is \(Q\) that causes \(Gov\).

OLS-based methods cannot tell the difference between these two betas:

\(Q_{i} = \alpha + \beta × Gov_{i} + Controls + \epsilon\)

\(Gov_{i} = \alpha + \beta × Q_{i} + Controls + \epsilon\)

If one beta is significant, the other will most likely be significant too.

You need a sound theory (you can also experiment with lags, but that might not be enough)!
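A minimal simulation of the reverse-causality point, assuming a data-generating process in which \(Q\) causes \(Gov\) and there are no controls. The coefficient 0.5, the sample size, and the helper `slope_and_t` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Assumed data-generating process: Q causes Gov, not the other way round.
q = rng.normal(size=n)
gov = 0.5 * q + rng.normal(size=n)

def slope_and_t(y, x):
    """Slope and t-statistic from a simple OLS regression of y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], coef[1] / np.sqrt(cov[1, 1])

b_q_on_gov, t_q_on_gov = slope_and_t(q, gov)
b_gov_on_q, t_gov_on_q = slope_and_t(gov, q)
print(f"Q on Gov: beta={b_q_on_gov:.3f}, t={t_q_on_gov:.1f}")
print(f"Gov on Q: beta={b_gov_on_q:.3f}, t={t_gov_on_q:.1f}")
```

Both regressions return a strongly significant beta. In a simple regression without controls the two t-statistics are in fact identical, so significance alone says nothing about the direction of the effect.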

Perhaps \(Gov\) and \(Q\) are determined simultaneously.

That is, there is a third variable causing both.

An OLS regression will provide a biased estimate of the effect.

Also, the sign might be wrong.

Imagine that you do not include an important “true” predictor of \(Q\).

Let’s say the long model is: \(Q_{i} = \alpha_{long} + \beta_{long} \times Gov_{i} + \delta \times omitted + error\)

But you estimate the short model: \(Q_{i} = \alpha_{short} + \beta_{short} \times Gov_{i} + error\)

\(\beta_{short}\) will be:

\(\beta_{short} = \beta_{long}\) + bias

\(\beta_{short} = \beta_{long}\) + relationship between omitted and included (\(Gov\)) × effect of omitted in long (\(\delta\))

- Where the relationship between omitted and included (\(Gov\)) is: \(Omitted = \alpha + \phi \times Gov_{i} + \mu\)

Thus, OVB is: \(\beta_{short} - \beta_{long} = \phi \times \delta\)
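The OVB formula can be checked numerically. The sketch below uses assumed values (\(\beta_{long} = 0.5\), \(\delta = 2.0\), \(\phi = 0.8\)) and a small hypothetical helper `ols`; none of these numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

gov = rng.normal(size=n)
omitted = 0.8 * gov + rng.normal(size=n)                   # assumed phi = 0.8
q = 1.0 + 0.5 * gov + 2.0 * omitted + rng.normal(size=n)   # assumed beta_long = 0.5, delta = 2.0

def ols(y, *xs):
    """OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

beta_long, delta = ols(q, gov, omitted)[1:]   # long regression
beta_short = ols(q, gov)[1]                   # short regression (omitted left out)
phi = ols(omitted, gov)[1]                    # auxiliary regression

print(f"beta_short - beta_long = {beta_short - beta_long:.4f}")
print(f"phi * delta            = {phi * delta:.4f}")
```

With estimated coefficients the two printed numbers coincide, because the decomposition is an algebraic property of OLS and not only a large-sample result.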

**Bad controls** are variables that are **also an outcome of the treatment** (i.e., \(Gov\)) being studied.

A **bad control** could very well be a **dependent variable** of \(Gov\) as well.

**Good controls** are variables that **you can think of as being fixed** at the time of the treatment.

- \(Q_{i} = \alpha + \beta \times Gov_{i} + Controls + \epsilon\)

Suppose you also have a variable that is a consequence of good governance (e.g., a Novo Mercado dummy). Should you include it in the model?

No. In this case, the coefficient of interest no longer has a causal interpretation.
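The answer above can be illustrated with a simulation. It is a sketch under an assumed data-generating process: the bad control is a continuous variable (not a dummy) that is an outcome of \(Gov\) and shares an unobserved cause with \(Q\). All names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

gov = rng.normal(size=n)
unobserved = rng.normal(size=n)                        # affects both the bad control and Q
bad_control = gov + unobserved + rng.normal(size=n)    # an outcome of Gov
q = 0.5 * gov + unobserved + rng.normal(size=n)        # assumed true effect of Gov: 0.5

def ols(y, *xs):
    """OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_without = ols(q, gov)[1]
b_with = ols(q, gov, bad_control)[1]
print(f"beta without bad control: {b_without:.3f}")
print(f"beta with bad control:    {b_with:.3f}")
```

In this setup the regression without the bad control recovers roughly 0.5, while adding it pushes the coefficient on \(Gov\) toward zero.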

**Warning**

**It is not hard to come up with stories of why a control is a bad control.**

**Collider bias** occurs when an independent variable and outcome each influence a **third variable and that variable or collider is included in the regression**.

- In a way, a collider is a bad control.
- While a **bad control makes you underestimate the effect**, a **collider creates spurious correlation** between the X and the Y.

In the analysis below

- \(Q_{i} = \alpha + \beta × Gov_{i} + Controls + \epsilon\)

Including, for instance, CEO Reputation (assuming that both \(Q\) and \(Gov\) influence CEO Reputation) creates a false correlation between \(Gov\) and \(Q\).

- You may estimate a significant association even when there is none.
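A small simulation of the collider case, assuming \(Gov\) and \(Q\) are truly unrelated and both feed into a hypothetical CEO reputation score. The coefficients and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

gov = rng.normal(size=n)
q = rng.normal(size=n)                           # assumption: Gov has NO effect on Q
ceo_reputation = gov + q + rng.normal(size=n)    # collider: caused by both Gov and Q

def ols(y, *xs):
    """OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_without = ols(q, gov)[1]
b_with = ols(q, gov, ceo_reputation)[1]
print(f"beta without collider: {b_without:.3f}")
print(f"beta with collider:    {b_with:.3f}")
```

Without the collider the coefficient on \(Gov\) is close to zero, as it should be. Conditioning on the collider produces a clearly non-zero (here negative) coefficient even though no causal link exists.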

\(Q_{i} = \alpha + \beta × Gov_{i} + Controls + \epsilon\)

Even if we could perfectly measure \(Gov\) and all relevant covariates, we would not know for sure the functional form through which each influences \(Q\).

- Functional form: linear? Quadratic? Log-log? Semi-log?

Misspecifying the functional form of the x’s is similar to OVB: the missing terms (e.g., a squared term) end up in the error.

Perhaps some individuals are signaling the existence of an X without truly having it:

- For instance: firms signaling they have good governance without it

This is similar to the OVB because you cannot observe the full story.

Some constructs (e.g. \(Gov\)) are complex and sometimes have conflicting mechanisms.

We usually don’t know for sure what “good” governance is, for instance.

It is common to use imperfect proxies that may fit the underlying concept poorly.

“Classical” random measurement error in **x’s** will bias the **coefficient toward zero**.

- \(x^{*} = x + \sigma_{2}\)
- Imagine that \(x^{*}\) is a bunch of noise. It would not explain anything.
- Thus, your results are biased toward zero.
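The attenuation described above can be seen in a simulation. The true slope of 2.0 and the noise variance equal to the variance of \(x\) are assumptions chosen so that the expected attenuated slope is about half the true one.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

x = rng.normal(size=n)                   # true regressor (variance 1)
x_star = x + rng.normal(size=n)          # observed regressor with classical noise (variance 1)
y = 2.0 * x + rng.normal(size=n)         # assumed true slope: 2.0

def slope(y, x):
    """Slope from a simple OLS regression of y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

b_true_x = slope(y, x)
b_noisy_x = slope(y, x_star)
print(f"slope using true x:  {b_true_x:.3f}")
print(f"slope using noisy x: {b_noisy_x:.3f}")
```

With equal signal and noise variances, the slope on the noisy regressor shrinks to roughly half of the true slope.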

“Classical” random measurement error in the **Y** will inflate standard errors but **will not lead to biased coefficients.**

- \(y^{*} = y + \sigma_{1}\)
- If you estimate \(y^{*} = f(x)\), you have \(y + \sigma_{1} = x + \epsilon\)
- \(y = x + u\)
- where \(u = \epsilon - \sigma_{1}\)
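A sketch of the same idea for noise in the dependent variable, again under assumed values (true slope 2.0, noise standard deviation 2.0). The helper `slope_and_se` is hypothetical and uses the usual homoskedastic OLS standard error.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)           # assumed true slope: 2.0
y_star = y + 2.0 * rng.normal(size=n)      # observed y with classical noise

def slope_and_se(y, x):
    """Slope and its homoskedastic standard error from a simple OLS regression."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se

b_clean, se_clean = slope_and_se(y, x)
b_noisy, se_noisy = slope_and_se(y_star, x)
print(f"clean y: slope={b_clean:.3f}, se={se_clean:.4f}")
print(f"noisy y: slope={b_noisy:.3f}, se={se_noisy:.4f}")
```

The slope stays near 2.0 in both regressions, while the standard error is visibly larger when the noisy outcome is used.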

Maybe the causal effect of \(Gov\) on \(Q\) depends on observed and unobserved firm characteristics:

- Let’s assume that firms seek to maximize \(Q\).
- Different firms have different optimal \(Gov\).
- Firms know their optimal \(Gov\).
- If we observed all factors that affect \(Q\), each firm would be at its own optimum and OLS regression would give a non-significant coefficient.

Because we do not observe all of these factors, we may find a positive or negative relationship.

Neither is the true causal relationship.

Observation bias is analogous to the Hawthorne effect, in which observed subjects behave differently because they are observed.

Firms that change \(Gov\) may behave differently because their managers or employees think the change in \(Gov\) matters, when in fact it has no direct effect.

Suppose you run a regression with two types of companies:

- High gov (let’s say they are the treated group)
- Low gov (let’s say they are the control group)

Without any matching method, these companies are likely not comparable.

*Apples compared to oranges*

Thus, the estimated beta will contain selection bias, which can be **either positive or negative**.
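Selection bias can be made concrete with simulated potential outcomes. The sketch assumes an unobserved firm quality that raises both \(Q\) and the chance of adopting high governance, and a constant true effect of 0.5. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000

quality = rng.normal(size=n)                     # unobserved firm quality
high_gov = (quality + rng.normal(size=n)) > 0    # better firms tend to end up in the high-Gov group
q0 = quality + rng.normal(size=n)                # Q each firm would have with low Gov
q = q0 + 0.5 * high_gov                          # assumed true effect of high Gov: 0.5

# Naive comparison of group means (what a regression on a Gov dummy estimates).
naive_diff = q[high_gov].mean() - q[~high_gov].mean()
# Difference in untreated outcomes between the groups: the selection bias.
selection_bias = q0[high_gov].mean() - q0[~high_gov].mean()

print(f"naive difference: {naive_diff:.3f}")
print(f"true effect:      0.500")
print(f"selection bias:   {selection_bias:.3f}")
```

Here the naive difference equals the true effect plus the selection bias, and the bias is positive because better firms are over-represented in the high-governance group. If weaker firms were the ones adopting high governance, the bias would be negative.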

Self-selection is a type of selection bias. Usually, firms decide which level of governance they adopt.

It is as if they “self-select” into the treatment.

- Units decide whether they receive the treatment or not

There are reasons why firms adopt high governance

- If these reasons are observable, you need to control for them. If they are unobservable, you have a problem.

**Important**

**More data is not necessarily a solution; you need a sound empirical design.**

**QUESTIONS?**

**Henrique C. Martins**

**henrique.martins@fgv.br · Do not use without permission**