I have it on good authority that “The cluster option is robust for within-cluster serial correlation of arbitrary form.” Also, that it is ok to run fixed effects on the same level as your clustering.
Question 1: If I have panel data (individual-year) and have individual-level fixed effects, does it make sense to cluster on the individual level?
Stata implements cluster and robust together. I think that the original Moulton clustering specification is not robust to serial correlation.
I wrote (and was wrong, apparently):
Thank you very much, Michael.
“If, however, observations that are close together (along the time dimension) have a higher correlation than observations that are far apart (along the time dimension), then the fixed effect will not remove this form of serial correlation.”,
And this is how Stata models the clustering:
I just checked, and Stata implements the robust clustered standard error estimation as detailed here:
===
This is what I *thought* was true. I guess if I care to argue, I’d have to run a Monte Carlo simulation and see.
The basic message is, don’t cluster on the fixed effect variable. The two are redundant, I think. But they are definitely not the same.
The secondary message is that I am not 100% sure. This isn’t a proof, just a write-up of my sketch of an understanding.
So if you have annual household-level data and use household fixed effects, don’t cluster on the household level. If you have (state, year)-level data and have state fixed effects, don’t cluster on the state variable.
Stata (at least Stata 10) will let you cluster on this variable. You might get smaller standard errors. That is because you did something wrong, not because of the clustering “magic”.
Evidence #1
In my notes for ARE213 (Applied Econometrics, with Imbens), the example he uses doesn’t use fixed effects. I don’t think his references used fixed effects and clustering either; I think they were trying to get something out of the cross-sectional variation, so they didn’t use fixed effects.
References
Kloek, T., (1981), “OLS Estimation in a Model where a Microvariable is Explained by Aggregates and Contemporaneous Disturbances are Equicorrelated,” Econometrica, Vol. 49, No. 1, 205-207.
Moulton, B., (1990), “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units,” Review of Economics and Statistics, 334-338.
Moulton, B., and W. Randolph, (1989) “Alternative Tests of the Error Component Model,” Econometrica, Vol. 57, No. 3, 685-693.
Evidence #2
Wooldridge (AER 2003) presents a framework where the c_g is the “cluster effect”. He then says “Under a full set of “fixed effects” assumptions…, inference is straightforward using standard software.” This is with the case of large N asymptotics. If you have small N and large T, then I still think fixed effects deal with the clustering effect.
Cluster-Sample Methods in Applied Econometrics
Jeffrey M. Wooldridge
Evidence #3
This is where I explain my intuition. Clustering has to do with the error terms within a group being correlated. As Wooldridge (2003) presents it, there is a common component within a cluster and an idiosyncratic component. The clustering variable that is estimated (the intragroup correlation) is the relative size of the common versus idiosyncratic component. Stata *should* report this variable, but I don’t think it does. You can look at Moulton to see the formula he uses to estimate it; I think he runs OLS first (which is consistent) and then looks at the within-group correlation of the error terms to get the clustering variable. This variable is between -1 and 1 and represents the share of the variation that is common within a group.
So, if you have 2 states and all the errors in CA are high and all the errors in NY are low, then you probably have positive correlation. And this typically will raise your standard errors. Intuitively, this is because your CA data is not as informative as you thought: after you draw the first observation, part of the second observation is already known via the correlation. (And since larger standard errors generally aren’t what we want, if a referee wants to kill a paper, they can say “cluster” and chances are that standard errors will go up. Not every referee has negative intentions; some just want to be careful and see that your results aren’t driven by clustering, which is a very fair point.)
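To give a rough sense of the magnitudes, here is a minimal sketch of the textbook variance inflation factor 1 + (n - 1)*rho from the Kloek/Moulton setting. It assumes equal-sized groups of n observations, a regressor that is constant within each group, and a common intragroup error correlation rho. The function names are made up for illustration, and this is not a description of what Stata computes.

```python
import math

def moulton_variance_factor(group_size, rho):
    """Ratio of the true OLS variance to the naive OLS variance.

    Assumes equal-sized groups, a regressor that does not vary within a
    group, and a common intragroup error correlation rho (illustrative
    assumptions, not taken from the post).
    """
    factor = 1.0 + (group_size - 1) * rho
    if factor < 0:
        raise ValueError("rho is too negative for this group size")
    return factor

def moulton_se_factor(group_size, rho):
    """Same inflation expressed for standard errors (square root)."""
    return math.sqrt(moulton_variance_factor(group_size, rho))

if __name__ == "__main__":
    # 100 observations per state and a modest intragroup correlation of 0.05.
    print(round(moulton_variance_factor(100, 0.05), 2))
    print(round(moulton_se_factor(100, 0.05), 2))
```

With 100 observations per group, even a correlation of 0.05 multiplies the naive variance by about 5.95, so the standard error more than doubles. A negative rho would push the factor below 1.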
However, Stata here (http://www.stata.com/support/faqs/stat/cluster.html) writes that your standard errors could go DOWN if you have negative clustering (negative intragroup correlation). Negative clustering means that if your draws of disturbance terms so far are mostly positive, your next observations should have a negative disturbance term. But the astute reader will notice a problem with the Wooldridge formulation of the error term:
v_gm=c_g + u_gm
g=group
m=other (stuff, time, etc.)
There is no value of c_g that would induce negative correlation of the error structure.
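One way to see this, under the usual assumptions (mine, not spelled out above) that c_g and u_gm are uncorrelated with each other, that u_gm is uncorrelated across m, and that their variances are \sigma_c^2 and \sigma_u^2:

```latex
\operatorname{Cov}(v_{gm}, v_{gm'}) = \operatorname{Var}(c_g) = \sigma_c^2 \ge 0,
\qquad
\rho = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2} \in [0, 1]
\quad (m \ne m').
```

So under this formulation a negative intragroup correlation has to come from somewhere other than the common component c_g.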
However, the way the clustering variable is calculated, one uses (all?) pairwise combinations within the group to get the correlation. Clearly, if I have two observations in a group and one is positive and one is negative, there is a negative correlation. I believe that if I have -1, -1, 1, and 1 as my error terms, the pairwise correlation would be negative.
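The -1, -1, 1, 1 example can be checked by hand or with a few lines of Python. This is only a sketch: the function name is hypothetical, and the estimator (average cross product over all distinct pairs, scaled by the mean square) is a simple stand-in, not necessarily the formula Moulton or Stata uses.

```python
from itertools import combinations

def mean_pairwise_correlation(errors):
    """Average product over all distinct pairs, scaled by the mean square.

    Treats the errors as already having mean zero, as in the example.
    """
    pairs = list(combinations(errors, 2))
    mean_cross = sum(a * b for a, b in pairs) / len(pairs)
    mean_square = sum(e * e for e in errors) / len(errors)
    return mean_cross / mean_square

if __name__ == "__main__":
    # Six pairs: two have product +1, four have product -1.
    print(mean_pairwise_correlation([-1, -1, 1, 1]))
```

Of the six pairs, two multiply to +1 and four multiply to -1, so the average cross product is -2/6 and the correlation comes out at -1/3.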
In fact, this example mirrors what happens when you use fixed effects and clustering. The fixed effect will be set to minimize the sum of squared error terms (in OLS, the first step), and this will be similar to demeaning within the fixed effect observations. In the previous 2 state example, the CA fixed effect will be positive and the NY fixed effect will be negative, and they will be set so that the mean of the error terms within each state is zero. So, when computing the cluster variable, one will see roughly equal positive and negative estimated disturbance terms, which are (waving hands) the conditions for negative correlation within a clustering group.
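The hand-waving can be made a bit more concrete. If residuals sum to zero within a group, the sum of cross products over i != j equals minus the sum of squares, so a pooled pairwise measure comes out at -1/(n - 1) for groups of size n. The sketch below illustrates this with made-up data for two "states"; the function name, the data-generating numbers, and the particular correlation measure are all my assumptions, not Stata's computation.

```python
import random

def pooled_pairwise_icc(groups):
    """Within-group cross products (i != j), scaled by (n_g - 1) * sum of squares.

    Treats the values as mean-zero error terms; a simple illustrative
    measure, not the estimator used by any particular package.
    """
    cross = 0.0
    scale = 0.0
    for values in groups:
        total = sum(values)
        squares = sum(v * v for v in values)
        cross += total * total - squares   # sum over i != j of v_i * v_j
        scale += (len(values) - 1) * squares
    return cross / scale

rng = random.Random(0)
n_per_group = 5
# Hypothetical errors: "CA" shifted up by 2, "NY" shifted down by 2.
raw = [[shift + rng.gauss(0, 1) for _ in range(n_per_group)]
       for shift in (2.0, -2.0)]
# Group fixed effects absorb each group's mean, leaving demeaned residuals.
demeaned = [[v - sum(g) / len(g) for v in g] for g in raw]

icc_raw = pooled_pairwise_icc(raw)            # common shift -> positive
icc_demeaned = pooled_pairwise_icc(demeaned)  # -1 / (n_per_group - 1)

if __name__ == "__main__":
    print(icc_raw, icc_demeaned)
```

With five observations per group the demeaned value is -0.25 up to rounding, whatever the raw errors looked like. That says something about this simple correlation measure after demeaning; it does not by itself say what happens to cluster-robust standard errors.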
CONCLUSION
Wooldridge presents clustering as:
v_gm=c_g + u_gm
g=group
m=other (stuff, time, etc.)
So if you do fixed effects at the group level, you’re done. Don’t also run clustering. If you do, then you might get negative intra-cluster correlation (which Stata should warn you about but doesn’t), which would shrink your standard errors.
===
For some reason, I think Stata 9 didn’t allow me to cluster in xtreg, fe, which was a good thing.
If someone who is smarter than me wants to point out what’s wrong with this, I’d be grateful. I would also be grateful for corroborating analysis or evidence from running regressions. As a simple project, one could run a Monte Carlo on fixed effects + clustering and see how often you get negative clustering.
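A Monte Carlo along these lines might start from the sketch below. Everything about the design is my assumption: 50 groups observed for 8 periods, one regressor, and AR(1) processes (coefficient 0.8) for both the regressor and the error. It computes the within (fixed effects) estimator by demeaning, then compares the conventional standard error and a cluster-by-group sandwich standard error (no finite-sample correction) with the actual spread of the estimates across replications. It is not a reimplementation of Stata's xtreg.

```python
import math
import random
import statistics

def simulate_once(rng, n_groups=50, n_periods=8, beta=1.0, rho=0.8):
    """One simulated panel; returns (beta_hat, conventional_se, cluster_se)."""
    sxx = 0.0
    sxy = 0.0
    panels = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)      # group fixed effect
        x_prev = rng.gauss(0.0, 1.0)     # AR(1) starting values
        e_prev = rng.gauss(0.0, 1.0)
        xs, ys = [], []
        for _ in range(n_periods):
            x_prev = rho * x_prev + rng.gauss(0.0, 1.0)
            e_prev = rho * e_prev + rng.gauss(0.0, 1.0)
            xs.append(x_prev)
            ys.append(alpha + beta * x_prev + e_prev)
        # Within transformation: subtract the group means.
        x_bar = sum(xs) / n_periods
        y_bar = sum(ys) / n_periods
        xd = [x - x_bar for x in xs]
        yd = [y - y_bar for y in ys]
        sxx += sum(x * x for x in xd)
        sxy += sum(x * y for x, y in zip(xd, yd))
        panels.append((xd, yd))
    b = sxy / sxx
    ssr = 0.0
    meat = 0.0
    for xd, yd in panels:
        resid = [y - b * x for x, y in zip(xd, yd)]
        ssr += sum(r * r for r in resid)
        score = sum(x * r for x, r in zip(xd, resid))
        meat += score * score            # squared within-group score
    dof = n_groups * n_periods - n_groups - 1
    conventional_se = math.sqrt(ssr / dof / sxx)
    cluster_se = math.sqrt(meat) / sxx   # sandwich, clustered by group
    return b, conventional_se, cluster_se

def monte_carlo(reps=200, seed=0):
    """Average the two standard errors and report the spread of beta_hat."""
    rng = random.Random(seed)
    draws = [simulate_once(rng) for _ in range(reps)]
    return {
        "sd_of_beta_hat": statistics.stdev([d[0] for d in draws]),
        "conventional_se": statistics.fmean([d[1] for d in draws]),
        "cluster_se": statistics.fmean([d[2] for d in draws]),
    }

if __name__ == "__main__":
    for name, value in monte_carlo().items():
        print(name, round(value, 4))
```

The number to look at is which of the two average standard errors sits closer to sd_of_beta_hat. Setting rho=0 in simulate_once gives the no-serial-correlation benchmark for comparison.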
==== MORE CHATTER ====
Floating around the Stata mailing list is a claim that fixed effects and clustering at the same level is ok. See:
http://www.stata.com/statalist/archive/2006-09/msg00746.html
http://www.stata.com/statalist/archive/2006-09/msg00782.html
It’d be interesting to see a citation where someone does clustering AND fixed effects at the same level.
One of the messages does cite a great article that argues for the need to cluster or do serial correlation corrections in Diff in Diff (DD) estimators. But the key thing here is that they have state-level fixed effects as well as year fixed effects and then suggest adding state-year level clustering. Importantly, clustering is at the sub-fixed-effect level, where the error terms aren’t constrained to average out to zero. Consider Row 2 of Table 2 and Table 8.
http://www.mitpressjournals.org/doi/abs/10.1162/003355304772839588
QJE: February 2004, Vol. 119, No. 1, Pages 249-275
Marianne Bertrand
Esther Duflo
Sendhil Mullainathan
This is my most looked at post.
Too bad it is barely comprehensible. Don’t take my word on the matter. Read through the links.
Also, this article is important and gives citations for the robust command in Stata:
http://www.stata.com/support/faqs/stat/robust_ref.html
Comment by howardchong — July 22, 2009 @ 2:22 am