Economics should be open

March 30, 2009

Clustering with panel data and fixed effects

Filed under: Data Insights, Stata | howardchong @ 9:27 pm

I have it on good authority that “The cluster option is robust for within-cluster serial correlation of arbitrary form.” I have also been told that it is fine to include fixed effects at the same level as your clustering.

Question 1: If I have panel data (individual-year) and include individual-level fixed effects, does it make sense to cluster at the individual level?

Question 2: If I have panel data (individual-year) and include state-level fixed effects, does it make sense to cluster at the state level?

The answer to both, according to Michael Anderson, is yes: you can do it.

 

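In Stata, requesting cluster also gives you robust (Huber-White) standard errors; the two are implemented together. I think that the original Moulton clustering correction, which assumes the same error correlation between every pair of observations in a group, is not robust to general serial correlation.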

 

I wrote (and was wrong, apparently):

Thank you very much, Michael.

When you write about

“If, however, observations that are close together (along the time dimension) have a higher correlation than observations that are far apart (along the time dimension), then the fixed effect will not remove this form of serial correlation.”,

my understanding is that clustering will not correct for this, but a serial correlation correction will.
The reason is that, within a cluster, clustering ignores time. Page 3 of Imbens’s notes (http://are.berkeley.edu/courses/ARE213/spring2006/lect4_06jan26.pdf) shows the ZZ’ matrix, which has a 1 for every pair of observations in the same cluster and a 0 otherwise, so clustering would treat the correlation between time 1 and time T the same as the correlation between time 1 and time 2.
Maybe you’re talking about a clustering estimator that uses a different ZZ’ matrix that decays further out from the cluster’s diagonal?
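To make the ZZ’ point concrete, here is a minimal Python sketch contrasting the block-of-ones ZZ’ pattern with a within-cluster correlation that decays with the time gap. The four-period cluster and the decay parameter phi = 0.5 (an AR(1)-style phi^|t-s| pattern) are illustrative assumptions, not taken from Imbens’s notes.

```python
# Sketch: within-cluster correlation patterns implied by two error models.
# Assumptions (illustrative): one cluster observed for T = 4 periods, and a
# hypothetical AR(1)-style decay parameter phi = 0.5.

def cluster_block(groups):
    """Return Z Z' for the cluster-indicator matrix Z: entry (i, j) is 1 when
    observations i and j share a cluster, else 0."""
    n = len(groups)
    return [[1 if groups[i] == groups[j] else 0 for j in range(n)] for i in range(n)]

def decaying_block(T, phi):
    """Within-cluster correlations that shrink with the time gap |t - s|,
    i.e. phi ** |t - s|, as for a stationary AR(1) error."""
    return [[phi ** abs(t - s) for s in range(T)] for t in range(T)]

T = 4
zz = cluster_block(["A"] * T)
ar = decaying_block(T, phi=0.5)

# Equicorrelated (ZZ') pattern: period 1 vs period T looks like period 1 vs period 2.
print(zz[0][1], zz[0][T - 1])   # 1 1
# Decaying pattern: correlation falls off away from the diagonal.
print(ar[0][1], ar[0][T - 1])   # 0.5 0.125
```

A matrix like the second one is what a "decaying" alternative to ZZ’ would look like; the first one treats all within-cluster pairs alike.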

 

And this is how Stata models the clustering:

I just checked, and Stata implements the robust clustered standard error estimation as detailed here:

which is different from the Moulton technique Imbens taught. The robust cluster version probably does control for serial correlation, as you say, if the Huber-White robust standard errors also control for it.
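For reference, here is a minimal Python sketch of the cluster-robust ("sandwich") variance for a one-regressor OLS without an intercept. It assumes the small-sample factor G/(G-1) × (N-1)/(N-K) that, as far as I know, Stata applies after regress with clustering; the function name and toy data are hypothetical. Note that the meat term squares each cluster’s summed score, so all within-cluster cross-products enter with no restriction on their pattern over time.

```python
# Sketch of a cluster-robust (sandwich) variance for OLS through the origin.
# Assumptions: one regressor, no intercept (K = 1), so (X'X)^{-1} is a scalar;
# finite-sample factor (G/(G-1)) * ((N-1)/(N-K)).
from collections import defaultdict

def ols_cluster_se(y, x, groups):
    n, k = len(y), 1
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]

    # "Meat": sum x_i * e_i within each cluster, then square the cluster sum.
    # Within-cluster cross-products are unrestricted; time order plays no role.
    score = defaultdict(float)
    for xi, ei, g in zip(x, resid, groups):
        score[g] += xi * ei
    meat = sum(s * s for s in score.values())

    n_clusters = len(score)
    q = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    var = q * meat / (sxx * sxx)
    return beta, var ** 0.5

# Hypothetical toy data: two clusters of two observations, x = 1 everywhere.
beta, se = ols_cluster_se([1.0, 3.0, 5.0, 7.0], [1.0] * 4, ["a", "a", "b", "b"])
print(beta, se)   # 4.0 2.0
```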

===

This is what I *thought* was true. If I cared to argue the point, I would have to run a Monte Carlo simulation and see.

The basic message is: don’t cluster on the fixed-effect variable. The two are redundant, I think, but they are definitely not the same.

The secondary message is that I am not 100% sure. This isn’t a proof, just a write-up of a sketch of my understanding.

So if you have annual household-level data and use household fixed effects, don’t cluster on the household. If you have (state, year)-level data and use state fixed effects, don’t cluster on the state variable.

Stata (at least Stata 10) will let you cluster on this variable, and you might get smaller standard errors. If so, that is because you did something wrong, not because of clustering “magic”.

Evidence #1

In my notes for ARE213 (Applied Econometrics, taught by Imbens), the example he uses doesn’t include fixed effects. I don’t think his references used fixed effects and clustering together either; I think they were trying to get something out of the cross-sectional variation, so they didn’t use fixed effects.

References

Kloek, T., (1981), “OLS Estimation in a Model where a Microvariable is Explained by Aggregates and Contemporaneous Disturbances are Equicorrelated,” Econometrica, Vol. 49, No. 1, 205-207.

Moulton, B., (1990), “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units,” Review of Economics and Statistics, 334-338.

Moulton, B., and W. Randolph, (1989), “Alternative Tests of the Error Component Model,” Econometrica, Vol. 57, No. 3, 685-693.

 

Evidence #2

Wooldridge (AER 2003) presents a framework where c_g is the “cluster effect”. He then says “Under a full set of ‘fixed effects’ assumptions…, inference is straightforward using standard software.” This is for the case of large-N asymptotics. If you have small N and large T, I still think fixed effects deal with the cluster effect.

Wooldridge, J. M., (2003), “Cluster-Sample Methods in Applied Econometrics,” American Economic Review.
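A quick Python check of the first half of this argument: the within (fixed-effects) transformation removes c_g exactly. The group count, number of periods, and normal draws are hypothetical. This only shows the cluster effect dropping out; it says nothing about any serial correlation left in u_gm, which is the case Michael raised.

```python
# Sketch: the within (fixed-effects) transformation removes a cluster effect c_g.
# Hypothetical simulated numbers: 3 groups x 4 periods, v_gm = c_g + u_gm.
import random

random.seed(0)
groups = [g for g in range(3) for _ in range(4)]
c = {g: random.gauss(0, 5) for g in range(3)}       # cluster effects c_g
u = [random.gauss(0, 1) for _ in groups]             # idiosyncratic u_gm
v = [c[g] + ui for g, ui in zip(groups, u)]          # composite error v_gm

def demean_within(values, groups):
    """Subtract each group's own mean from its observations."""
    totals, counts = {}, {}
    for val, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + val
        counts[g] = counts.get(g, 0) + 1
    return [val - totals[g] / counts[g] for val, g in zip(values, groups)]

v_within = demean_within(v, groups)
u_within = demean_within(u, groups)
# After demeaning, v and u agree up to rounding error: c_g has dropped out.
print(max(abs(a - b) for a, b in zip(v_within, u_within)) < 1e-9)   # True
```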

 

Evidence #3

This is where I explain my intuition. Clustering has to do with the errors within a group being correlated. As Wooldridge (2003) presents it, there is a common component within a cluster and an idiosyncratic component. The clustering variable that is estimated (really a parameter, the intra-cluster correlation) is the relative size of the common component versus the idiosyncratic one. Stata *should* report this variable, but I don’t think it does. You can look at Moulton to see the formula he uses to estimate it; I think he runs OLS first (which is consistent) and then looks at the within-group correlation of the residuals to get the clustering variable. This variable is between -1 and 1 and represents the share of the variation that is common within a group.

So, if you have 2 states and all the errors in CA are high and all the errors in NY are low, then you probably have positive correlation, and this typically will raise your standard errors. Intuitively, this is because your CA data is not as informative as you thought: after you draw the first observation, part of the second observation is already known via the correlation. (And since larger standard errors generally aren’t what we want, a referee who wants to kill a paper can say “cluster”, and chances are the standard errors will go up. Not every referee has negative intentions; some just want to be careful and check that your results aren’t driven by clustering, which is a very fair point.)
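The size of this effect is usually summarized by the Moulton factor. The Python sketch below assumes equal-sized clusters of m observations, for which the variance the usual OLS formula reports is too small by roughly a factor of 1 + (m - 1)·rho_x·rho_u, where rho_x and rho_u are the within-cluster correlations of the regressor and of the error. The numbers in the example are hypothetical.

```python
# Sketch of the Moulton inflation factor for equal-sized clusters.
# Assumptions: every cluster has m observations; rho_x is the within-cluster
# correlation of the regressor (1 for a regressor constant within a cluster,
# like a state-level policy) and rho_u is that of the error term.
import math

def moulton_factor(m, rho_x, rho_u):
    """Approximate ratio of the true sampling variance of the OLS slope to the
    variance the usual i.i.d. formula reports."""
    return 1 + (m - 1) * rho_x * rho_u

# Hypothetical example: 100 people per state, a state-level regressor,
# and a modest intra-state error correlation of 0.05.
f = moulton_factor(m=100, rho_x=1.0, rho_u=0.05)
print(f, math.sqrt(f))   # variance ratio ~5.95, standard-error ratio ~2.44
```

With rho_u below zero the factor drops below one, which is the direction of the Stata FAQ warning about standard errors going down.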

However, Stata here (http://www.stata.com/support/faqs/stat/cluster.html) writes that your standard errors could go DOWN if you have negative intragroup correlation (“negative clustering”). Negative clustering means that if your draws of disturbance terms so far are mostly positive, your next observations should tend to have negative disturbance terms. But the astute reader will notice something about the Wooldridge formulation of the error term:

v_gm = c_g + u_gm, where g indexes the group and m indexes everything else (individuals, time, etc.).

Under this formulation, there is no value of c_g that would induce negative correlation in the error structure.
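Spelled out under the usual error-components assumptions (c_g uncorrelated with u_gm, and the u_gm uncorrelated with each other and homoskedastic within a group), the implied correlation between two errors in the same group can only be non-negative:

```latex
\begin{aligned}
v_{gm} &= c_g + u_{gm}, \qquad \operatorname{Var}(c_g) = \sigma_c^2, \quad \operatorname{Var}(u_{gm}) = \sigma_u^2,\\
\operatorname{Cov}(v_{gm}, v_{gm'}) &= \sigma_c^2 \quad (m \neq m'),\\
\rho = \operatorname{Corr}(v_{gm}, v_{gm'}) &= \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2} \in [0, 1].
\end{aligned}
```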

However, the clustering variable is calculated from (all?) pairwise combinations of observations within the group. Clearly, if I have two observations in a group and one residual is positive and the other negative, there is a negative correlation. I believe that if I have -1, -1, 1, and 1 as my error terms, the pairwise correlation would be negative.
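Checking the -1, -1, 1, 1 example with a minimal Python sketch: of the six within-group pairs, two products are +1 and four are -1, so the average cross-product is -1/3, and dividing by the average squared residual (1) gives a pairwise correlation of -1/3. The estimator here (average within-group cross-product over average squared residual, with residuals assumed centred at zero) is one simple version assumed for illustration, not necessarily the exact formula Moulton or Stata uses.

```python
# Sketch of a pairwise intra-cluster correlation of residuals.
# Assumptions: residuals are centred at zero overall (as OLS residuals with a
# constant are); the estimator is the average within-group cross-product over
# all pairs i < j, divided by the average squared residual.
from itertools import combinations

def pairwise_icc(resid, groups):
    by_group = {}
    for e, g in zip(resid, groups):
        by_group.setdefault(g, []).append(e)
    cross, pairs = 0.0, 0
    for members in by_group.values():
        for a, b in combinations(members, 2):
            cross += a * b
            pairs += 1
    variance = sum(e * e for e in resid) / len(resid)
    return (cross / pairs) / variance

print(pairwise_icc([-1, -1, 1, 1], ["CA"] * 4))   # about -1/3
```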

In fact, this example mirrors what happens when you use fixed effects and clustering together. The fixed effects are set to minimize the sum of squared errors (in OLS, the first step), which is equivalent to demeaning within each fixed-effect group. In the 2-state example above, the CA fixed effect will be positive and the NY fixed effect will be negative, and they will be set so that the mean of the residuals within each state is zero. So, when computing the clustering variable, one will see roughly equal positive and negative estimated disturbance terms, which is (waving hands) the condition for negative correlation within a clustering group.
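The hand-waving can be made exact for a simple pairwise estimator (average within-group cross-product divided by average squared residual; an assumed estimator, not necessarily Moulton’s exact formula). With group dummies in OLS, the residuals in each group sum to zero, so for G groups of equal size T:

```latex
\begin{aligned}
\sum_{m < m'} \hat e_{gm}\,\hat e_{gm'}
  &= \tfrac{1}{2}\Big[\Big(\sum_{m} \hat e_{gm}\Big)^2 - \sum_{m} \hat e_{gm}^2\Big]
   = -\tfrac{1}{2}\sum_{m} \hat e_{gm}^2,\\
\hat\rho
  &= \frac{\sum_g \sum_{m<m'} \hat e_{gm}\hat e_{gm'} \,\big/\, \big(G\,T(T-1)/2\big)}
          {\sum_g \sum_m \hat e_{gm}^2 \,\big/\, (G\,T)}
   = -\frac{1}{T-1}.
\end{aligned}
```

With T = 4 this gives -1/3, the same number as the -1, -1, 1, 1 example. The result concerns the correlation of the residuals; it does not by itself say how much the clustered standard errors change.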

 

CONCLUSION

Wooldridge presents clustering as:

v_gm = c_g + u_gm, where g indexes the group and m indexes everything else (individuals, time, etc.).

So if you include fixed effects at the group level, you’re done; don’t also cluster at that level. If you do, you might get negative intra-cluster correlation (which Stata should warn you about but doesn’t), which would shrink your standard errors.
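A toy Python sketch of the arithmetic behind that last claim: the clustered "meat" of the sandwich, the sum over groups of (Σ x·e)², equals the heteroskedasticity-robust meat Σ (x·e)² plus all within-group cross terms, so negative cross terms make it smaller. The x and e values are hand-picked (residuals sum to zero within each group), not the output of an actual fixed-effects regression, and whether the cross terms come out negative in real data depends on x as well as on the residuals.

```python
# Toy illustration: clustered meat = robust meat + within-group cross terms.
# Hypothetical, hand-picked numbers; not estimated from data.

def robust_meat(x, e):
    """Heteroskedasticity-robust meat: sum of squared individual scores."""
    return sum((xi * ei) ** 2 for xi, ei in zip(x, e))

def cluster_meat(x, e, groups):
    """Clustered meat: square each group's summed score, then add them up."""
    sums = {}
    for xi, ei, g in zip(x, e, groups):
        sums[g] = sums.get(g, 0.0) + xi * ei
    return sum(s * s for s in sums.values())

x = [1.0, 2.0, 4.0, 1.0, 2.0, 4.0]
e = [1.0, 0.0, -1.0, -1.0, 0.0, 1.0]        # sums to zero within each group
groups = ["CA", "CA", "CA", "NY", "NY", "NY"]

# Each group: robust part 1 + 0 + 16 = 17, cross terms 2 * (1 * -4) = -8, total 9.
print(robust_meat(x, e), cluster_meat(x, e, groups))   # 34.0 18.0
```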

 

===

 

For some reason, I think Stata 9 didn’t allow me to cluster in xtreg, fe, which was a good thing.

If someone smarter than me wants to point out what’s wrong with this, I’d be grateful. I would also be grateful for corroborating analysis or evidence from running regressions. As a simple project, one could run a Monte Carlo simulation of fixed effects plus clustering and see how often you get negative clustering.
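A minimal Python sketch of that Monte Carlo, under assumed settings: 50 groups, 5 periods, y = c_g + x + u with normal draws, fixed effects via the within transformation, and the simple pairwise residual correlation as the "clustering" statistic. Because fixed-effects residuals sum to zero within every group, every replication comes out negative, at -1/(T-1) = -0.25 up to rounding. That confirms the negative-clustering half of the argument; checking whether the clustered standard errors are actually too small would require comparing them with the Monte Carlo spread of the slope estimates, which this sketch does not do.

```python
# Monte Carlo sketch under assumed settings (all parameters hypothetical):
# y_gm = c_g + 1.0 * x_gm + u_gm, c_g ~ N(0, 2^2), x and u ~ N(0, 1).
import random
from itertools import combinations

def demean(values, groups):
    """Within transformation: subtract each group's own mean."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def fe_residuals(y, x, groups):
    """Residuals from the fixed-effects (within) regression of y on x."""
    y_w, x_w = demean(y, groups), demean(x, groups)
    beta = sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)
    return [yi - beta * xi for yi, xi in zip(y_w, x_w)]

def pairwise_icc(resid, groups):
    """Average within-group cross-product over average squared residual."""
    by_group = {}
    for e, g in zip(resid, groups):
        by_group.setdefault(g, []).append(e)
    cross = [a * b for members in by_group.values() for a, b in combinations(members, 2)]
    return (sum(cross) / len(cross)) / (sum(e * e for e in resid) / len(resid))

def simulate(reps=200, G=50, T=5, seed=1):
    rng = random.Random(seed)
    groups = [g for g in range(G) for _ in range(T)]
    iccs = []
    for _ in range(reps):
        c = [rng.gauss(0, 2) for _ in range(G)]
        x = [rng.gauss(0, 1) for _ in groups]
        y = [c[g] + 1.0 * xi + rng.gauss(0, 1) for g, xi in zip(groups, x)]
        iccs.append(pairwise_icc(fe_residuals(y, x, groups), groups))
    share_negative = sum(r < 0 for r in iccs) / reps
    return share_negative, sum(iccs) / reps

share, mean_icc = simulate()
print(share, mean_icc)   # every draw negative; mean essentially -1/(T-1) = -0.25
```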

 

 

==== MORE CHATTER ====

Floating around the Statalist mailing list is a claim that fixed effects and clustering at the same level are OK. See:

http://www.stata.com/statalist/archive/2006-09/msg00746.html

http://www.stata.com/statalist/archive/2006-09/msg00782.html

 

It would be interesting to see a citation where someone does clustering AND fixed effects at the same level.

 

One of the messages cites a great article that argues for the need to cluster or do serial correlation corrections in difference-in-differences (DD) estimators. But the key thing here is that they have state fixed effects as well as year fixed effects and then suggest adding state-year level clustering. Importantly, the clustering is at the sub-fixed-effect level, where the error terms aren’t constrained to average out to zero. Consider Row 2 of Table 2 and Table 8.

http://www.mitpressjournals.org/doi/abs/10.1162/003355304772839588

Bertrand, M., E. Duflo, and S. Mullainathan, (2004), “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics, Vol. 119, No. 1 (February), 249-275.

1 Comment

  1. This is my most-viewed post.

    Too bad it is barely comprehensible. Don’t take my word on the matter; read through the links.

    Also, this article is important and gives citations for the robust command in stata:

    http://www.stata.com/support/faqs/stat/robust_ref.html

    Comment by howardchong — July 22, 2009 @ 2:22 am

