Economics should be open

August 26, 2009

List of european power plants, data sources for electricity generation

Filed under: Carbon Trading, Data Insights, Energy, Open Source — howardchong @ 6:19 pm

I was looking for a list of power plants in Europe in 2008. I didn’t find one. You know why? It just got created in late 2008, and I just found it in 2009.

http://carma.org/

More beta below the bump.

(more…)

February 6, 2009

Traversing a directory in stata

Filed under: coding, Open Source, Stata — howardchong @ 12:17 am

I found a nice way to traverse a directory and load all the files in the directory. The key stata commands are to run a directory listing and output the list to a file. Then, you just have to use “levelsof” (or levels) to get your file names.

PROBLEM: Load a bunch of fixed-width TXT files in a directory without having to list all the file names.
SOLUTION:
STATA code

Unfortunately stata has some lame limits on the number of characters in a string (type “help limits”, 244 is the smallest limit), so this will break if you have too many files. You can probably fix this by using a matrix of strings, but I didn’t need to do that.

Another kludgy way of loading ALOT of files would be to not use levels of and each time open up the filelist.txt file and do something like “local filename_to_get=v1[`i’]” and loop over i=1 to numfiles.

January 8, 2009

file “outreg2_prf.ado” not found

Filed under: coding, Open Source, Stata — howardchong @ 1:03 am

So, I get the above stata error when using outreg2 which I install with “ssc install outreg2”.

This articles tells you (1) what I did to trigger the error message and (2) what steps I took to fix it.

UPDATE JUL2009: A good comment below suggests (from stata staff) that  it has to do with disk writes. So, that’s the best answer to date.

(more…)

December 16, 2008

Data cleaning, excel

Filed under: coding, Excel, Open Source, Uncategorized — Tags: , , , — howardchong @ 10:17 pm

Ever have data with commas between the thousand and million marks? spaces at the end of numbers? text footnotes appended to numbers, spaces in numbers?

Cleaning this by hand is a complete pain, so I wrote an Excel Macro.

To install, record a new macro and then stick this in the module.

There’s even a “careful” mode. (Change the line of careful=false to careful=true) which prompts you for each change that I felt might be wrong.

Please write a comment if you find it helpful.

 

(more…)

December 2, 2008

MS Access Tables to Stata, for Residential Energy Consumption Survey

Problem: To convert MS Access tables (of the EIA Residential Energy Consumption Survey 1997 data) to CSV files for STATA import

This is also a general script for converting MS Tables to CSVs

Solution: Wrote a VBA script

 

  1. Open MS Access
  2. Tools | Macros | Visual Basic Editor
  3. Create a new module on your database by doing: Right Clickon database | Insert | Module
  4. Copy the following text (without the line numbers)
    1. Sub ExportAllTablesCsv()
    2. Dim dbMyDB As Database
    3. Set dbMyDB = OpenDatabase(“recs97_converted.mdb”)
    4.     For Each tdfCurrent In dbMyDB.TableDefs
    5.         fileoutname = “C:\” & tdfCurrent.Name & “.csv”
    6.         If Left(tdfCurrent.Name, 2) <> “MS” Then
    7.             DoCmd.TransferText acExportDelim, , tdfCurrent.Name, fileoutname, TRUE
    8.         End If
    9.             
    10.     Next
    11. End Sub
  5. For your customization:
    1. change your mdb file to your mdb file
    2. The If statement (line 10) is there to deal with the fact that there are certain system tables that I needed to skip. None of my tables started with “MS”, so this was a simple non-general fix.
    3. If you look up help for TransferText it has a SpecificationName. I left that argument blank. You can change to tabs or other things using that.
    4. I had trouble with getting column headings, but jwhite at UtterAccess.com showed me the light. This code now gives column headings with the TRUE argument.  For more details, check out http://www.utteraccess.com/forums/showflat.php?Cat=&Number=1752140

If you can think of a better solution, I’d love your thoughts!

Other possible/failed solutions:

  • I tried running a query that joined all the tables with the EIA ID Num as the key for each table. Though this was easy, I got the “too many fields” error.
  • I think there is a way to use SQL (or something???) to select all the tables and write them and to import those into another program. STATA, to my limited knowledge, doesn’t do SQL. 
  • STATA supposedly does XML. One can use TransferText to write XML too. I don’t know XML well enough to try it.

Any comment appreciated.

 

 

—-

(more…)

November 6, 2008

EPER, consistency with indirect emissions, email

So using EPER (European Pollution Emissions Register), I’ve found some anomalies with Aluminum data. The main question is whether some countries might be counting indirect emissions (akin to life cycle analysis). Aluminum,  as I understand it (I can cite a McKenzie report), has very small direct process emissions and mainly uses electricity. Since the CO2 generated is usually attributed to the power plant, that CO2 is “indirectly” emitted by Aluminum producers.

On the same token, my personal direct emissions are the gasoline/petrol and natural gas I burn. The indirect is all the CO2 associated with the electricity I use.

Here is a letter I sent to EPER folks at the European Commission. Hope I get a response!

=====

 

To Whom It May Concern:
I am a researcher at the Univ of California, Berkeley working with the EPER data. First of all, I want to thank you for providing this information; I’ve found the information very useful and appreciate full access to the database tables (MS ACCESS).
I have a question about the underlying the questionnaire, a copy of which I could not find online (ASIDE: does every country implement their own version of the questionnaire?). My question regards direct vs indirect emissions of CO2 and double-counting. If a manufacturing firm only uses electricity, is the manufacturer’s CO2 emissions zero and the indirect CO2 is reported at the utility level?
Aluminum, anecdotally, should have very small direct CO2 emissions because the main input is electricity. However, the following is just one example of an aluminum producer record with very large CO2 emissions. This leads me to believe that indirect emissions are counted sometimes.
Thank you in advance for your help in understanding the data.
=====
EXAMPLE OF HIGH CO2 RECORD FOR ALUMINUM PRODUCER

CountryID:  DE

ReportYear:  2004

Emission.FacilityID:  216219

ParentCompanyName:  TRIMET ALUMINIUM AG

Address:  Aluminiumallee 1

City:  Essen

FacilityName:  TRIMET ALUMINIUM AG

Emissions (metric tons):  301000

Code:  2.1/2.2/2.3/2.4/2.5/2.6

Description:  Metal industry and metal ore roasting or sintering installations, Installations for the production of ferrous and non-ferrous metals

Text:   Aluminium production                                                                                                                                                                                   

MainActivity:  1

ActivityID:  12

— 
Howard Chong
Dept. of Agricultural and Resource Economics and UC Energy Institute
UC Berkeley
chong@are.berkeley.edu
Office: 510-643-4831
Cell: 510-333-0539

October 31, 2008

CITL, coverage for the ETS, matching to EPER

Filed under: Carbon Trading, Data Insights, Open Source — Tags: , , , , , — howardchong @ 9:53 pm

This post is a big deal for me because it really pushes me to stay true to open source principles.

So, here’s the deal.
The ETS is the Emissions Trading Scheme, a cap and trade carbon program in Europe.
The CITL is the Community Independent Transaction Log for the ETS.
The EPER is European Pollution Emissions Register (http://eper.eea.europa.eu/eper/) which is a European version of the Toxics Release Inventory in the US, only much better in that it covers more emissions (including CO2).

And, my current project is this 50 hour effort to match records in the CITL to records in the EPER.

What’s the big deal? Well, I’m getting insight into what companies were excluded from the ETS, something that may or may not be well highlighted in the national allocation plans. For all the mandarins running the ETS, could it be that they failed to ensure that countries included all units that should be under the ETS in the ETS? It gets to the question of whether allowances were too high (somewhat, my own sense is that economic activity and weather had something to do with the “over-allocation”).

So, here’s the deal. There’s plenty I want to do with this data and I think there is a small time window to do it. So, if you want to work on this project with my matched database, please write me.

As academia is all about getting credit for what you do, we’d have to talk carefully about credit, etc. But my prior is that any work done would be collaborative and everyone gets to share credit.

If you are a private firm doing proprietary market research (i.e., you wouldn’t want what you do with the data to be public), ask me what info you need, and I’ll probably give it to you, perhaps for a fee or some other trade. This information has a full list of contact information for EUA permit holders.

I’m already telling you too much by telling you that there’s something interesting in the EPER-CITL data matching, but that’s the risk I’m taking. Partly because I think it is more important that good research be done and get out there than that I get total credit.

You comments are deeply appreciated.

September 29, 2008

Eurostat “database” and downloading bug

Filed under: Data Insights, Open Source — Tags: , — howardchong @ 10:02 pm

I need to download various data from Eurostat (the European statistical unit). I found this bug (and work around) for downloading files in a standard format readable by a variety of software packages.

Accessing the Web-access to the “Database” using Firefox (http://epp.eurostat.ec.europa.eu/extraction/evalight/EVAlight.jsp?A=1&language=en&root=/theme8/nrg/nrg_135a, called Eurostat Visual Application), I get a consistent error in downloading files. After the first step of specifying the query and the second step of table definitions, I select “For Databases” and then click on download. The file downloaded is called a ZIP file, but is not of zip format and is always about 410 bytes long. When I click download a second time, it opens up another window in the browser with a link. When I click this link, then I get the data I requested. I’ve come across this bug several times with different data sets.

With both Google Chrome and Internet Explorer (7), I cannot get downloading of the “for database” option to work correctly at all.

So, I reported this to Eurostat today. I’ll see if they give a response.

I’ll save you a rant on Eurostat generally, but summarize it with: “Great Data. Bad Access Tools.”

September 26, 2008

COMTRADE – data availability of 2006 data, a graph

Filed under: Data Insights, Open Source — howardchong @ 9:47 pm

So, here is when countries reported their 2006 COMTRADE data.

 

Comtrade 2006 data publication dates

Comtrade 2006 data publication dates

Looks like steady reporting starting in Feb of 2007 through Oct of 2007. Another 20 reporters through May 2008. And then a spate in July-Sept 2008. I would GUESS that this is just additional nomenclatures or revisions. (would anyone like to check for me?) This data uses all reporting as of 9/26/2008.

Note: this is a quick and dirty chart. Reporters may have uploaded data twice which gets counted twice. This also collapses all classifications together. But it does answer with moderate precision: most data is available about a year after.

 

The source data is http://comtrade.un.org/db/mr/daYearsNewResults.aspx?nom=all&px=ori

COMTRADE – some usage notes from UC Berkeley

Filed under: Data Insights, Open Source — howardchong @ 9:29 pm

I’m working with COMTRADE data. COMTRADE is the main source for trade data in the world. I talked to an IMF data guy and he told me that they use COMTRADE for trade statistics (they run some of their own surveys too, but for general purpose, this is it).

COMTRADE is not the easiest thing to use, though it is pretty easy.

I’m at UC Berkeley, so we have site coverage. If you connect from a campus terminal or AIRBEARS, you should be able to get full access.

You should download files via the “Direct Download” link. I tried SDMX and could not get it to work very well. I am not that good at XML. If someone is, I would love to talk to you to develop a tool to easily extract data (I spent some time using Python XML tools trying to get it to work. I could, but I decided the hurdle was too high). Data downloaded via “direct download” is in CSV format.

Now the kicker. What codes? What classifciation system.

Classification System

Check out this file: SEPT26snapshot

It shows what data is available in each classification for recent years. Note: NOT ALL DATA IS REPORTED IN ALL CLASSIFICATIONS. SITC Rev 3 seems to be the fullest. HS2002 is good, but has less. They don’t downconvert to HS1996. So, if you want a series from 1990-2007, you might be trying to figure out whether you want HS1996 but fewer countries, or to get HS1996 for pre-2001 data and then HS2002 for later data and merging across classifications.

Google search for “RAMON metadata server” to get some correspondence tables between nomenclatures.

Access

I don’t like using their interface, but it is helpful at first.

First, try a simple search with BASIC SELECTION. Look at the query that is produced in the URL. We’ll talk about this below. Write down the codes for countries and codes

Next, try to use the express selection. You have to know already what codes you want. If you look ar the URL above, you can extract the codes you need. I’ve got lists for HS classification so I can browse it in excel or run greps to find the fields I want. For countries, here is a list of codes:

CtyCode Cty Name English
0 World
4 Afghanistan
8 Albania
10 Antarctica
12 Algeria
16 American Samoa
20 Andorra
24 Angola
28 Antigua and Barbuda
31 Azerbaijan
32 Argentina
36 Australia
40 Austria
44 Bahamas
48 Bahrain
50 Bangladesh
51 Armenia
52 Barbados
56 Belgium
58 Belgium-Luxembourg
60 Bermuda
64 Bhutan
68 Bolivia
70 Bosnia Herzegovina
72 Botswana
74 Bouvet Island
76 Brazil
80 Br. Antarctic Terr.
84 Belize
86 Br. Indian Ocean Terr.
90 Solomon Isds
92 Br. Virgin Isds
96 Brunei Darussalam
97 EU25
100 Bulgaria
104 Myanmar
108 Burundi
112 Belarus
116 Cambodia
120 Cameroon
124 Canada
132 Cape Verde
136 Cayman Isds
140 Central African Rep.
144 Sri Lanka
148 Chad
152 Chile
156 China
158 Taiwan, Province of China
162 Christmas Isds
166 Cocos Isds
170 Colombia
174 Comoros
175 Mayotte
178 Congo
180 Dem. Rep. of the Congo
184 Cook Isds
188 Costa Rica
191 Croatia
192 Cuba
196 Cyprus
200 Czechoslovakia
203 Czech Rep.
204 Benin
208 Denmark
212 Dominica
214 Dominican Rep.
218 Ecuador
222 El Salvador
226 Equatorial Guinea
230 Fmr Ethiopia
231 Ethiopia
232 Eritrea
233 Estonia
234 Faeroe Isds
238 Falkland Isds (Malvinas)
239 South Georgia and the South Sandwich Islands
242 Fiji
246 Finland
250 France
251 France
254 French Guiana
258 French Polynesia
260 Fr. South Antarctic Terr.
262 Djibouti
266 Gabon
268 Georgia
270 Gambia
275 Occ. Palestinian Terr.
276 Germany
278 Fmr Dem. Rep. of Germany
280 Fmr Fed. Rep. of Germany
288 Ghana
292 Gibraltar
296 Kiribati
300 Greece
304 Greenland
308 Grenada
312 Guadeloupe
316 Guam
320 Guatemala
324 Guinea
328 Guyana
332 Haiti
334 Heard Island and McDonald Islands
336 Holy See (Vatican City State)
340 Honduras
344 China, Hong Kong SAR
348 Hungary
352 Iceland
356 India, excl. Sikkim
360 Indonesia
364 Iran
368 Iraq
372 Ireland
376 Israel
380 Italy, excluding San Marino and the Holy See
381 Italy
384 Côte d’Ivoire
388 Jamaica
392 Japan
398 Kazakhstan
400 Jordan
404 Kenya
408 Dem. People’s Rep. of Korea
410 Rep. of Korea
412 Kosovo
414 Kuwait
417 Kyrgyzstan
418 Lao People’s Dem. Rep.
422 Lebanon
426 Lesotho
428 Latvia
430 Liberia
434 Libya
438 Liechtenstein
440 Lithuania
442 Luxembourg
446 China, Macao SAR
450 Madagascar
454 Malawi
457 Sarawak
458 Malaysia
459 Peninsula Malaysia
461 Sabah
462 Maldives
466 Mali
470 Malta
473 South America, nes
474 Martinique
478 Mauritania
480 Mauritius
484 Mexico
488 Midway
490 Asia, nes
492 Monaco
496 Mongolia
498 Rep. of Moldova
499 Montenegro
500 Montserrat
504 Morocco
508 Mozambique
512 Oman
516 Namibia
520 Nauru
524 Nepal
527 Oceania, nes
528 Netherlands
530 Neth. Antilles
532 Neth. Antilles and Aruba
533 Aruba
536 Neutral zone
540 New Caledonia
548 Vanuatu
554 New Zealand
558 Nicaragua
562 Niger
566 Nigeria
568 Europe, nes
570 Niue
574 Norfolk Isds
577 Africa, nes
578 Norway, excluding Bouvet Island, Svalbard and Jan Mayen Islands
579 Norway
580 N. Mariana Isds
581 United States Minor Outlying Islands
582 Fmr Pacific Isds
583 FS Micronesia
584 Marshall Isds
585 Palau
586 Pakistan
588 East and West Pakistan
590 Fmr Panama, excl.Canal Zone
591 Panama
592 Fmr Panama-Canal-Zone
598 Papua New Guinea
600 Paraguay
604 Peru
608 Philippines
612 Pitcairn
616 Poland
620 Portugal
624 Guinea-Bissau
626 Timor-Leste
630 Puerto Rico
634 Qatar
637 North America, the Caribbean and Central America not elsewhere specified
638 Réunion
642 Romania
643 Russian Federation
646 Rwanda
647 Ryukyu Isd
654 Saint Helena
658 Saint Kitts, Nevis and Anguilla
659 Saint Kitts and Nevis
660 Anguilla
662 Saint Lucia
666 Saint Pierre and Miquelon
670 Saint Vincent and the Grenadines
674 San Marino
678 Sao Tome and Principe
682 Saudi Arabia
686 Senegal
688 Serbia
690 Seychelles
694 Sierra Leone
698 Sikkim
699 India
702 Singapore
703 Slovakia
704 Viet Nam
705 Slovenia
706 Somalia
710 South Africa
711 So. African Customs Union
716 Zimbabwe
717 Fmr Rhodesia Nyas
720 Fmr Dem. Yemen
724 Spain
732 Western Sahara
736 Sudan
740 Suriname
744 Svalbard and Jan Mayen Islands
748 Swaziland
752 Sweden
756 Switzerland
757 Switzerland
760 Syria
762 Tajikistan
764 Thailand
768 Togo
772 Tokelau
776 Tonga
780 Trinidad and Tobago
784 United Arab Emirates
788 Tunisia
792 Turkey
795 Turkmenistan
796 Turks and Caicos Isds
798 Tuvalu
800 Uganda
804 Ukraine
807 TFYR of Macedonia
810 Fmr USSR
818 Egypt
826 United Kingdom
834 United Rep. of Tanzania
835 Fmr Tanganyika
836 Fmr Zanzibar and Pemba Isd
837 Bunkers
838 Free Zones
839 Secret & Differences
840 United States, excluding Puerto Rico and United States Virgin Islands
841 USA (before 1981)
842 USA
849 US Misc. Pacific Isds
850 US Virgin Isds
854 Burkina Faso
858 Uruguay
860 Uzbekistan
862 Venezuela
866 Fmr Dem. Rep. of Vietnam
868 Fmr Rep. of Vietnam
872 Wake Island
876 Wallis and Futuna Isds
882 Samoa
886 Fmr Arab Rep. of Yemen
887 Yemen
890 Yugoslavia
891 Serbia and Montenegro
894 Zambia
899 Areas, nes
1251 France, including Monaco, excluding Guadeloupe, Martinique, Reunion and French Guiana
1381 Italy, including San Marino

Lastly, take a look at the query. Here is a query I ran:

All reporters, EU countries + world partners

http://comtrade.un.org/db/dqBasicQueryResults.aspx?cc=72, 7201, 7202, 7203, 7206, 7207&px=H1&r=4, 8, 10, 12, 16, 20, 24, 28, 31, 32, 36, 40, 44, 48, 50, 51, 52, 56, 58, 60, 64, 68, 70, 72, 74, 76, 80, 84, 86, 90, 92, 96, 97, 100, 104, 108, 112, 116, 120, 124, 132, 136, 140, 144, 148, 152, 156, 158, 162, 166, 170, 174, 175, 178, 180, 184, 188, 191, 192, 196, 200, 203, 204, 208, 212, 214, 218, 222, 226, 230, 231, 232, 233, 234, 238, 239, 242, 246, 250, 251, 254, 258, 260, 262, 266, 268, 270, 275, 276, 278, 280, 288, 292, 296, 300, 304, 308, 312, 316, 320, 324, 328, 332, 334, 336, 340, 344, 348, 352, 356, 360, 364, 368, 372, 376, 380, 381, 384, 388, 392, 398, 400, 404, 408, 410, 412, 414, 417, 418, 422, 426, 428, 430, 434, 438, 440, 442, 446, 450, 454, 457, 458, 459, 461, 462, 466, 470, 473, 474, 478, 480, 484, 488, 490, 492, 496, 498, 499, 500, 504, 508, 512, 516, 520, 524, 527, 528, 530, 532, 533, 536, 540, 548, 554, 558, 562, 566, 568, 570, 574, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 588, 590, 591, 592, 598, 600, 604, 608, 612, 616, 620, 624, 626, 630, 634, 637, 638, 642, 643, 646, 647, 654, 658, 659, 660, 662, 666, 670, 674, 678, 682, 686, 688, 690, 694, 698, 699, 702, 703, 704, 705, 706, 710, 711, 716, 717, 720, 724, 732, 736, 740, 744, 748, 752, 756, 757, 760, 762, 764, 768, 772, 776, 780, 784, 788, 792, 795, 796, 798, 800, 804, 807, 810, 818, 826, 834, 835, 836, 837, 838, 839, 840, 841, 842, 849, 850, 854, 858, 860, 862, 866, 868, 872, 876, 882, 886, 887, 890, 891, 894, 899, 1251, 1381&y=1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006&p=0, 842, 156, 97, 8, 20, 40, 56, 58, 70, 100, 191, 196, 200, 203, 208, 233, 234, 246, 251, 276, 292, 300, 348, 352, 372, 381, 428, 440, 442, 470, 528, 574, 579, 616, 620, 642, 703, 705, 724, 752, 757, 792, 807, 826, 890&rg=1,2&so=9999&qt=n

Note all the & characters. These are field separators. the equal signs set the values.

The above query gets steel (the 72** codes) for a bunch of reporters, for years 97-2006 with various partners (namely World (0) and EU, China, others), for imports and exports.

All express query does is fill out these fields, so if you want to generate the query yourself (with an excel database to store/modify your queries, all you need is the output url string to be correct, as above.

A request

Anyone know python, perl, or another scripting language and want to help me automate gathering data. To the maximum extent possible, I want to automate the data gathering process. I specifically do NOT want to facilitate any broad data ripping. It’s just that I may need to run repeated queries (maybe about 1000) and manually clicking is my least favorite part of the task.

The pseudocode of an API I’d want is as follows:

inputs: fields to get

1) start a browser session that keeps track of cookies, etc.

2) submit a properly formatted request URL to comtrade.

3) follow the link to “direct download”

4) do some initial preprocessing of the file.

My main difficulty is I haven’t figured out how to simulate a browser session (step 1).

Please post comments.

Older Posts »

Blog at WordPress.com.