First differences
In most applications, the main reason for collecting panel data is to allow for the unobserved effect, \(c_i\), to be correlated with the explanatory variables.
For example, in the crime equation, we want to allow the unmeasured city factors in \(c_i\) that affect the crime rate also to be correlated with the unemployment rate.
It turns out that this is simple to allow: because \(c_i\) is constant over time, we can difference the data across the two years.
More precisely, for a cross-sectional observation \(i\), write the two years as:
\[y_{i,1} = \beta_0 + \beta_1 x_{i,1} + c_i + \mu_{i,1}, \quad t=1\]
\[y_{i,2} = (\beta_0 + \delta_0) + \beta_1 x_{i,2} + c_i + \mu_{i,2}, \quad t=2\]
If we subtract the first equation from the second, \(\beta_0\) and \(c_i\) drop out and we obtain
\[(y_{i,2} - y_{i,1}) = \delta_0 + \beta_1 (x_{i,2} - x_{i,1}) + (\mu_{i,2}-\mu_{i,1})\]
\[\Delta y_{i} = \delta_0 + \beta_1 \Delta x_{i} + \Delta \mu_{i}\]
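As a minimal sketch of why differencing helps, the simulation below builds a two-period panel in which \(c_i\) is correlated with \(x\) by construction, then compares pooled OLS on levels with OLS on the differenced equation. The sample size, parameter values, and the strength of the correlation between \(x\) and \(c_i\) are all hypothetical choices made for illustration, not values from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
n = 5000                                # number of cross-sectional units (hypothetical)
beta0, delta0, beta1 = 1.0, 0.5, 2.0    # hypothetical true parameters

c = rng.normal(size=n)                  # unobserved effect c_i
x1 = 0.8 * c + rng.normal(size=n)       # x is correlated with c_i by construction
x2 = 0.8 * c + rng.normal(size=n)
u1 = rng.normal(size=n)                 # idiosyncratic errors
u2 = rng.normal(size=n)

y1 = beta0 + beta1 * x1 + c + u1              # equation for t = 1
y2 = (beta0 + delta0) + beta1 * x2 + c + u2   # equation for t = 2


def ols(y, X):
    """Return OLS coefficients of y on X (X must already contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Pooled OLS on levels: c_i is left in the error term, so the slope is biased.
y = np.concatenate([y1, y2])
x = np.concatenate([x1, x2])
d2 = np.concatenate([np.zeros(n), np.ones(n)])   # dummy for the second period
pooled = ols(y, np.column_stack([np.ones(2 * n), d2, x]))

# First differences: c_i cancels, and the intercept estimates delta0.
dy = y2 - y1
dx = x2 - x1
fd = ols(dy, np.column_stack([np.ones(n), dx]))

print("pooled OLS slope:", pooled[2])
print("FD intercept and slope:", fd)
```

Under these assumptions the pooled slope should overstate \(\beta_1\), while the first-differenced slope should land close to the true value and the intercept close to \(\delta_0\).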
So, rather than subtracting the group mean of each variable, you subtract the lagged observation.
It is not hard to see that, when there are only two time periods (T=2), FE and FD give identical estimates.
FE is more efficient if the disturbances \(\mu_{i,t}\) have little or no serial correlation.
FD is more efficient if the disturbances \(\mu_{i,t}\) follow a random walk.
At the end of the day, you can estimate both.
Empirical researchers usually estimate FD only in specific circumstances, when they are interested in how changes in X affect changes in Y.
Issues such as non-stationarity or trends are often not a concern in panel data.
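The two-period equivalence of FE and FD can be checked numerically. The sketch below simulates a small panel with hypothetical parameters, computes the FE estimator by subtracting each unit's mean from every variable (including the second-period dummy), and computes the FD estimator by regressing the change in y on a constant and the change in x. It is an illustration under these assumptions, not a general-purpose estimator.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed so the sketch is reproducible
n = 200                             # number of units (hypothetical)

c = rng.normal(size=n)                           # unobserved effect c_i
x = 0.8 * c[:, None] + rng.normal(size=(n, 2))   # columns are t = 1 and t = 2
u = rng.normal(size=(n, 2))                      # idiosyncratic errors
d2 = np.tile([0.0, 1.0], (n, 1))                 # second-period dummy
y = 1.0 + 0.5 * d2 + 2.0 * x + c[:, None] + u    # hypothetical true parameters


def demean(a):
    """Subtract each unit's time average (the within transformation)."""
    return a - a.mean(axis=1, keepdims=True)


# FE: regress demeaned y on the demeaned period dummy and demeaned x.
# No constant is needed because demeaning removes it along with c_i.
X_fe = np.column_stack([demean(d2).ravel(), demean(x).ravel()])
fe, *_ = np.linalg.lstsq(X_fe, demean(y).ravel(), rcond=None)

# FD: regress the change in y on a constant and the change in x.
dy = y[:, 1] - y[:, 0]
dx = x[:, 1] - x[:, 0]
X_fd = np.column_stack([np.ones(n), dx])
fd, *_ = np.linalg.lstsq(X_fd, dy, rcond=None)

print("FE (period effect, slope):", fe)
print("FD (intercept, slope):    ", fd)
print("identical up to rounding: ", np.allclose(fe, fd))
```

With T=2 each demeaned observation is just plus or minus half of the first difference, which is why the two sets of coefficients coincide. With more than two periods the estimates generally differ.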