Economics should be open

November 2, 2009

Pitfalls of STATA robust estimation, a quick simulation study.

Filed under: Uncategorized — howardchong @ 11:50 pm

Using “robust” after regress in Stata seems to have become automatic. The textbook argument is that robust standard errors stay asymptotically valid under heteroskedasticity, and that there is only a small cost if the errors turn out to be homoskedastic.

There is one problem I am encountering, and that is in small samples. If your regressor of interest has very little variation, be careful, especially when measuring treatment effects where you have very few or very many treated observations.
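Here is a minimal sketch of the kind of simulation that shows the problem, written in Python rather than Stata so it runs standalone. It is an illustration under stated assumptions, not the simulation from the full post: the sample size (50), the number of treated observations (2), the homoskedastic N(0, 1) errors, the zero true effect, and the function name simulate are all made up for the example. For a regression on a single treatment dummy, the coefficient is the difference in group means, and the HC1 variance (the small-sample-adjusted formula behind Stata's robust option) has the closed form used below.

```python
import random
import statistics


def simulate(n=50, n_treated=2, reps=2000, seed=0):
    """Compare robust (HC1) and classical SEs for a treatment dummy.

    All settings are hypothetical: the true effect is zero and the
    errors are homoskedastic N(0, 1).
    """
    rng = random.Random(seed)
    n_control = n - n_treated
    estimates, robust_ses, classical_ses = [], [], []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_treated)]
        control = [rng.gauss(0, 1) for _ in range(n_control)]
        m1 = statistics.fmean(treated)
        m0 = statistics.fmean(control)
        # Sum of squared OLS residuals within each group
        ssr1 = sum((y - m1) ** 2 for y in treated)
        ssr0 = sum((y - m0) ** 2 for y in control)
        # OLS coefficient on the dummy = difference in group means
        estimates.append(m1 - m0)
        # HC1 variance of that coefficient (closed form for a dummy regressor)
        hc1 = (ssr1 / n_treated ** 2 + ssr0 / n_control ** 2) * n / (n - 2)
        # Classical OLS variance with a pooled error variance
        s2 = (ssr1 + ssr0) / (n - 2)
        classical = s2 * (1 / n_treated + 1 / n_control)
        robust_ses.append(hc1 ** 0.5)
        classical_ses.append(classical ** 0.5)
    # Actual spread of the estimates vs. the average reported SEs
    return (statistics.pstdev(estimates),
            statistics.fmean(robust_ses),
            statistics.fmean(classical_ses))


true_sd, robust_se, classical_se = simulate()
print(true_sd, robust_se, classical_se)
```

In this setup the average robust standard error should come out well below the actual spread of the estimates, while the classical one should land close to it. With only two treated observations, their residuals are mechanically small, and the robust formula takes them at face value.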

(more…)

October 13, 2009

Notepad++ and Stata, a better do file editor

Filed under: Stata — howardchong @ 1:16 am

I do not like the built-in Stata editor. It makes reading Stata do files a chore. I come a bit from the programming world, where editors show commented lines and blocks in a different color and highlight reserved words. I looked for an alternative Stata text editor / do file editor and like Notepad++.

Notepad++ is a good alternative. You can still run blocks of code (like control-D) and the whole do file (like control-R) if you set it up. Plus it’s free.

(more…)

October 1, 2009

Difference between WLS and weighted average

Filed under: Data Insights, Stata — howardchong @ 11:00 pm

I hear the phrase “what does it look like when we weight the data” a lot. It confused me for a while, but I figured it out: it could mean two things, so the response should be: which of the two do you want?

Weighted Least Squares and weighted average are opposite concepts, in a sense.

(more…)

August 26, 2009

List of European power plants, data sources for electricity generation

Filed under: Carbon Trading, Data Insights, Energy, Open Source — howardchong @ 6:19 pm

I was looking for a list of power plants in Europe in 2008. I didn’t find one. You know why? It just got created in late 2008, and I just found it in 2009.

http://carma.org/

More beta below the bump.

(more…)

August 14, 2009

Octave cell-arrays are pretty slow

Filed under: coding — howardchong @ 5:50 pm

I’m trying to figure out which open source statistical/computation package to use.  I used to use Matlab. It’s good, but expensive, and it has WAY more features than I need.

I know I should be running things on Unix, but right now I’m on Windows XP. I sometimes putty into a Unix server and run things.

R looks very good. That’s my next language to learn.

Octave is pretty good. It provides syntax almost identical to Matlab.  In 3.0, it now has support for Multidimensional Cell Arrays. These are arrays that can hold any data type. Most common for me is an array of strings. If you load data that is mixed text and numeric, then your data will probably be read as a cell-array.

One thing I have noticed is that the cell-arrays are really quite slow.

I had a ~10000 x 10 csv file.

Column 1 had a mix of numbers and strings. They were 6 character codes, and about 2/3 of them did not have alphabetical characters. I needed to convert these to strings, and then do a sort and some other processing. I basically had to traverse each element of the first column and do the datatype change individually.

The process was VERY slow. In fact, I think Excel would be better at such tasks.

Here are a few tips:

  • If you can, remove all strings from your CSV file.
  • If you read a large dataset as one large cell array, separate each column into its own variable. Then pack together the numeric data into a matrix (if needed).
  • Stata has an “encode” command that converts a string variable into numeric codes. For example, if your variable holds car makes, it will give each make a number and then also generate a lookup table (a value label) where you can decipher what the numbers mean.
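The encode idea from the last tip can be sketched in a few lines. This is a rough Python illustration, not Stata's implementation: the function name encode and the car makes are made up for the example, and the only behavior borrowed from Stata is that codes are handed out as 1, 2, ... in alphabetical order of the strings.

```python
def encode(values):
    """Map each distinct string to an integer code plus a lookup table.

    Hypothetical helper for illustration; codes start at 1 and follow
    the alphabetical order of the distinct strings.
    """
    labels = sorted(set(values))
    code_of = {label: i + 1 for i, label in enumerate(labels)}
    # The numeric column that replaces the strings
    codes = [code_of[v] for v in values]
    # The lookup table for deciphering the numbers later
    lookup = {code: label for label, code in code_of.items()}
    return codes, lookup


makes = ["Toyota", "Ford", "Honda", "Ford", "Toyota"]
codes, lookup = encode(makes)
print(codes)   # [3, 1, 2, 1, 3]
print(lookup)  # {1: 'Ford', 2: 'Honda', 3: 'Toyota'}
```

Once the strings are replaced by codes, the column is purely numeric and can sit in an ordinary matrix with the rest of the data, which avoids the slow cell-array path.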

Also check out this page that benchmarks the math/science packages with a set of standard routines:

http://www.sciviews.org/benchmark/index.html

July 29, 2009

Stata, control flow based on variable type

Filed under: Uncategorized — howardchong @ 12:45 am

Suppose you want to write a function (or a loop), where you do something to every variable that depends on its type. In MATLAB, I would use “isnumeric”, etc., or just use the function that returns the type of the variable.

I couldn’t find such a function in stata. There is no “isnumeric” or “isfloat” function.

There is an extended macro function called “type”. This is my preferred way to do it:

local mytype : type myvarname
disp "`mytype'"

They also have “confirm”, which works:

http://www.stata.com/help.cgi?confirm

(more…)

July 28, 2009

octave error “error: invalid call to script”

Filed under: Uncategorized — howardchong @ 11:18 pm

I am new to Octave. I am using Octave 3.0 on Windows (with octave-forge), and when I try to call a script I get the following error:

“error: invalid call to script”

The fix?

I was trying to run hello1.m by typing

hello1.m

but I need to take the .m off and type

hello1

This error may also come up for different reasons, but this was my reason.

July 27, 2009

Input Output Table, generating from BEA data

Filed under: Uncategorized — howardchong @ 6:52 pm

I struggled a bit to put together an input output table (IO table) for the US economy while researching the impact of carbon prices on industrial activity.

Here I include Octave (compatible with MATLAB) code to generate an input output table given 3 inputs from the Bureau of Economic Analysis:

  1. Total consumption
  2. Industrial activity linked to commodity (USE)
  3. Industries and how much of each commodity they produce (MAKE)
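To give a feel for how the three inputs fit together, here is a small sketch using the standard make-use accounting, in Python rather than Octave so it runs without any toolbox. It is not the code from the full post. The two-industry, two-commodity numbers are invented, the helper names are hypothetical, and I am assuming that "total consumption" plays the role of final demand by commodity.

```python
def col_scale(matrix, totals):
    """Divide every entry in column j of matrix by totals[j]."""
    return [[x / totals[j] for j, x in enumerate(row)] for row in matrix]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Closed-form inverse, enough for this 2 x 2 toy example."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical toy data (not BEA numbers)
make = [[90.0, 10.0], [20.0, 80.0]]  # rows: industries, columns: commodities
use = [[30.0, 20.0], [10.0, 40.0]]   # rows: commodities, columns: industries
final_demand = [60.0, 40.0]          # by commodity

industry_output = [sum(row) for row in make]         # [100.0, 100.0]
commodity_output = [sum(col) for col in zip(*make)]  # [110.0, 90.0]

# Commodity inputs per unit of industry output
B = col_scale(use, industry_output)
# Market shares: each industry's share in the supply of each commodity
D = col_scale(make, commodity_output)
# Commodity-by-commodity direct requirements
A = matmul(B, D)

# Total requirements (Leontief inverse): (I - A)^-1
i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)]
             for i in range(2)]
total_requirements = inverse_2x2(i_minus_a)

# Commodity output implied by final demand; it should reproduce
# commodity_output because the toy tables are internally consistent
implied_output = [sum(total_requirements[i][j] * final_demand[j]
                      for j in range(2)) for i in range(2)]
print(implied_output)
```

The check at the end is the useful part: if the implied output does not match the commodity totals from the MAKE table, the three inputs are not consistent with each other.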

More…

(more…)

July 20, 2009

identity matrix in excel

Filed under: Uncategorized — howardchong @ 10:13 pm

So, Excel can do matrix calculations. That’s useful. But why the hell didn’t it give us a function to create an identity matrix?

Here’s a quick hack (no programming) to generate an identity matrix.

(more…)

June 4, 2009

Billing Data and Randomized Experiments in Energy Efficiency Evaluation, a research survey

Filed under: California, Data Insights, Energy, Residential — howardchong @ 9:44 pm

I’m doing a research survey of empirical evaluations of energy efficiency using billing data. Much evaluation is done in the laboratory and these estimates are extrapolated to the field. I’m looking at whether field data has been used to test the laboratory assumptions. I found one such study by Dubin et al. from 1986. I review why this is important and other related articles. This is part of my ongoing research, so feedback, especially detailed and esoteric knowledge, is greatly appreciated.

(more…)
