Economics should be open

March 30, 2009

Clustering with panel data and fixed effects

Filed under: Data Insights, Stata — howardchong @ 9:27 pm

I have it on good authority that “The cluster option is robust for within-cluster serial correlation of arbitrary form.” I am also told that it is OK to include fixed effects at the same level as your clustering.

Question 1: If I have panel data (individual-year) with individual-level fixed effects, does it make sense to cluster at the individual level?

Question 2: If I have panel data (individual-year) with state-level fixed effects, does it make sense to cluster at the state level?

The answer to both, according to Michael Anderson, is yes, you can do it.
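
Here is a minimal sketch of both setups in Stata, using the nlswork example dataset available through webuse. That dataset has no state variable, so ind_code (industry) stands in for the state-level grouping in Question 2. The stand-in and the choice of regressors are my assumptions for illustration, not anything from the exchange with Michael.

```stata
* Example panel: person (idcode) by year
webuse nlswork, clear
xtset idcode year

* Question 1: individual fixed effects, clustered on the individual
xtreg ln_wage age tenure, fe vce(cluster idcode)

* Question 2: group fixed effects, clustered on the same group
* (ind_code is a stand-in for state; nlswork has no state variable)
areg ln_wage age tenure, absorb(ind_code) vce(cluster ind_code)
```

Both commands run without complaint, which only shows that Stata allows the combination. Whether the resulting standard errors are the right ones is the question discussed in this post.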

 

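In Stata, specifying cluster implies the robust (sandwich) variance estimator; the two come together. I think the original Moulton clustering specification, which assumes one common correlation within each cluster, is not robust to serial correlation.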

 

I wrote (and was wrong, apparently):

Thank you very much, Michael.

When you write about

“If, however, observations that are close together (along the time dimension) have a higher correlation than observations that are far apart (along the time dimension), then the fixed effect will not remove this form of serial correlation.”,

my understanding is that clustering will not correct for this, but a serial correlation correction will.
The reason is that, within a cluster, clustering ignores time. Page 3 of Imbens’s notes (http://are.berkeley.edu/courses/ARE213/spring2006/lect4_06jan26.pdf) shows the ZZ’ matrix, under which clustering treats the correlation between periods 1 and T the same as the correlation between periods 1 and 2.
Maybe you’re talking about a clustering estimator that uses a different ZZ’ matrix, one that decays further out from the cluster’s diagonal?

 

And this is how Stata models the clustering:

I just checked, and Stata implements the robust clustered standard error estimation as detailed here:

which is different from the Moulton technique Imbens taught. The robust cluster version probably controls for serial correlation as you say, if the Huber-White robust standard errors also control for it.

-------------

This is what I *thought* was true. I guess if I cared to argue, I’d have to run a Monte Carlo simulation and see.
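
A rough sketch of what such a Monte Carlo check could look like in Stata. Everything here is an assumption of mine: 100 individuals, 10 years, an AR(1) coefficient of 0.8 for both the regressor and the error, a true coefficient of zero, and the program name onedraw. The idea is to compare the spread of the estimated coefficient across replications with the average reported standard error, with and without clustering.

```stata
capture program drop onedraw
program define onedraw, rclass
    * Build one simulated panel (all sizes and parameters are assumptions)
    clear
    set obs 100
    generate id = _n
    generate alpha = rnormal()            // individual fixed effect
    expand 10
    bysort id: generate year = _n
    generate x = rnormal()
    generate e = rnormal()
    * AR(1) within individual; replace runs in order, so x[_n-1] is already updated
    bysort id (year): replace x = 0.8*x[_n-1] + rnormal() if _n > 1
    bysort id (year): replace e = 0.8*e[_n-1] + rnormal() if _n > 1
    generate y = alpha + e                // true coefficient on x is zero
    xtset id year

    * Fixed effects with conventional standard errors
    xtreg y x, fe
    return scalar b = _b[x]
    return scalar se_plain = _se[x]

    * Fixed effects with standard errors clustered on the individual
    xtreg y x, fe vce(cluster id)
    return scalar se_clust = _se[x]
end

set seed 12345
simulate b=r(b) se_plain=r(se_plain) se_clust=r(se_clust), reps(200) nodots: onedraw
summarize b se_plain se_clust
```

If clustering really does handle this kind of serial correlation, the mean of se_clust should land close to the standard deviation of b, while the mean of se_plain would differ from it. I am not asserting which way the result comes out; the sketch is only the setup for checking.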

The basic message is, don’t cluster on the fixed effect variable. The two are redundant, I think. But they are definitely not the same.

The secondary message is that I am not 100% sure. This isn’t a proof, just a write-up of my rough understanding.

(more…)

March 27, 2009

Conditional Formatting Excel

Filed under: Excel — howardchong @ 12:01 am

I use conditional formatting only a little in Excel. It is great for visually highlighting certain data, but it is not quite as useful as AutoFilter, which I seem to use daily.

In any case, here are some good links for conditional formatting in the order of their usefulness to me:

 

March 26, 2009

Sample regular expressions for Stata

Filed under: Stata, coding — howardchong @ 10:22 pm

Just a note to show how to use regular expressions in Stata for text processing.

PROBLEM: I had a lot of codes in the variable dsmnem that had “:” and “.” characters. I wanted to reshape my data and use these strings as the j variable, i.e. “reshape … j(dsmnem) string”, but those characters are not allowed in variable names.
SOLUTION: regular expressions

replace dsmnem = lower(regexr(dsmnem, ":", "_"))
replace dsmnem = regexr(dsmnem, "\.", "")

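These two lines replace a colon with an underscore and a period with empty text, respectively; the first line also lowercases the string. Note that I have to use the escape character to specify the period character; otherwise the period has a special meaning in the regular expression (it matches any character). One caveat: regexr() replaces only the first match in each string, so a code with several colons or periods needs the command run more than once.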

Weird how they call it “regexr” and not “regexp” or “regexpr”, but whatever.

By the way, dsmnem is the Datastream mnemonic.
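
Since regexr() only touches the first match, here is a sketch of an alternative that replaces every occurrence in one pass, using subinstr() instead of a regular expression. The two sample values and the new variable name clean are made up for illustration.

```stata
clear
input str20 dsmnem
"AB:CD.EF"
"X.Y:Z.W"
end

* subinstr() with "." as the last argument replaces every occurrence
generate clean = lower(subinstr(subinstr(dsmnem, ":", "_", .), ".", "", .))
list
```

No escaping is needed here because subinstr() matches literal text, so the period is just a period.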

Stata “unique” command helpful

Filed under: Stata — howardchong @ 9:58 pm

I just found the unique Stata command.

PROBLEM: I have a correspondence table of companies to domains. One company can have multiple domains. I wanted a count of the number of unique companies.
SOLUTION: download the “unique” Stata command. Install it by running “ssc install unique”. You can also read more about it from the site: http://ideas.repec.org/c/boc/bocode/s354201.html

ALTERNATIVE: I used to just do “keep company” and then “duplicates drop”. It was a hack, but it worked. If you have a small number of companies, an easy way to do it is “tab company” and just count the lines.
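
Another way to get the count without installing anything or dropping data is to tag one row per company with egen and count the tags. This is a sketch on a made-up correspondence table; the company and domain values and the variable name first are hypothetical.

```stata
clear
input str10 company str20 domain
"acme"   "acme.com"
"acme"   "acme.org"
"globex" "globex.com"
end

* tag() marks exactly one observation per distinct company with 1
egen byte first = tag(company)

* The number of tagged rows is the number of unique companies
count if first
display r(N)
```

Unlike “keep company” plus “duplicates drop”, this leaves the original table in memory.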

March 25, 2009

How to scam the Geithner Plan

Filed under: bank bailout — howardchong @ 9:50 pm

Marginal Revolution has a good description of the weakness of the Geithner Plan, i.e. how to game it:

http://www.marginalrevolution.com/marginalrevolution/2009/03/gaming-the-geithner-plan.html
The post refers to the Public-Private Investment Program.

Here I’m just trying to make it REALLY clear how the scam works and that the beneficiaries of the scam are the banks. I initially looked into this trying to figure out if I could game the system and get rich. The answer to that is no. The only ones who can win big are the banks.

I tend to bury the lede, so up front, the message is this: the banks will book at most a $7 cost for every $100 in order to sell the asset at the AAA-rated price. Some estimates of the “market value” are $60. So instead of taking a $40 loss, they just pay $7 to get this junk off their books.
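
To make the arithmetic explicit, here is the same comparison as a few Stata display lines. It uses only the numbers quoted above (100, 60 and 7); the local macro names are mine.

```stata
* Per 100 dollars of the asset, using the figures quoted in the post
local face   = 100
local market = 60
local cost   = 7

display "Loss if sold at the estimated market value: " `face' - `market'
display "Cost booked under the plan (at most): " `cost'
display "Difference in the bank's favor: " (`face' - `market') - `cost'
```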

The assets will be sold at pretty close to whatever the bank values them at, which I’d guess is about 80 cents on the dollar. They could get 100 cents on the dollar, but I would bet that’s too fishy.

Details below.
(more…)

March 11, 2009

Data sources for Residential Energy/Electricity Data in California

Filed under: California, Energy, Residential — howardchong @ 10:32 pm

Perhaps no task is more vexing than doing a data search.

I’ve done a pretty thorough data search over a period of several years, with probably about 80 hours of work in it. The topic is residential energy/electricity data.

I’ve published this as a Google Document. You can access this document here:
(more…)
