Economics should be open

September 26, 2008

COMTRADE – some usage notes from UC Berkeley

Filed under: Data Insights, Open Source — howardchong @ 9:29 pm

I’m working with COMTRADE data. COMTRADE is the main source for trade data in the world. I talked to an IMF data guy and he told me that they use COMTRADE for trade statistics (they run some of their own surveys too, but for general purpose, this is it).

COMTRADE is not the easiest thing to use, though it is pretty easy.

I’m at UC Berkeley, so we have site coverage. If you connect from a campus terminal or AIRBEARS, you should be able to get full access.

You should download files via the “Direct Download” link. I tried SDMX and could not get it to work very well. I am not that good at XML. If someone is, I would love to talk to you to develop a tool to easily extract data (I spent some time using Python XML tools trying to get it to work. I could, but I decided the hurdle was too high). Data downloaded via “direct download” is in CSV format.

Now the kicker. What codes? What classifciation system.

Classification System

Check out this file: SEPT26snapshot

It shows what data is available in each classification for recent years. Note: NOT ALL DATA IS REPORTED IN ALL CLASSIFICATIONS. SITC Rev 3 seems to be the fullest. HS2002 is good, but has less. They don’t downconvert to HS1996. So, if you want a series from 1990-2007, you might be trying to figure out whether you want HS1996 but fewer countries, or to get HS1996 for pre-2001 data and then HS2002 for later data and merging across classifications.

Google search for “RAMON metadata server” to get some correspondence tables between nomenclatures.

Access

I don’t like using their interface, but it is helpful at first.

First, try a simple search with BASIC SELECTION. Look at the query that is produced in the URL. We’ll talk about this below. Write down the codes for countries and codes

Next, try to use the express selection. You have to know already what codes you want. If you look ar the URL above, you can extract the codes you need. I’ve got lists for HS classification so I can browse it in excel or run greps to find the fields I want. For countries, here is a list of codes:

CtyCode Cty Name English
0 World
4 Afghanistan
8 Albania
10 Antarctica
12 Algeria
16 American Samoa
20 Andorra
24 Angola
28 Antigua and Barbuda
31 Azerbaijan
32 Argentina
36 Australia
40 Austria
44 Bahamas
48 Bahrain
50 Bangladesh
51 Armenia
52 Barbados
56 Belgium
58 Belgium-Luxembourg
60 Bermuda
64 Bhutan
68 Bolivia
70 Bosnia Herzegovina
72 Botswana
74 Bouvet Island
76 Brazil
80 Br. Antarctic Terr.
84 Belize
86 Br. Indian Ocean Terr.
90 Solomon Isds
92 Br. Virgin Isds
96 Brunei Darussalam
97 EU25
100 Bulgaria
104 Myanmar
108 Burundi
112 Belarus
116 Cambodia
120 Cameroon
124 Canada
132 Cape Verde
136 Cayman Isds
140 Central African Rep.
144 Sri Lanka
148 Chad
152 Chile
156 China
158 Taiwan, Province of China
162 Christmas Isds
166 Cocos Isds
170 Colombia
174 Comoros
175 Mayotte
178 Congo
180 Dem. Rep. of the Congo
184 Cook Isds
188 Costa Rica
191 Croatia
192 Cuba
196 Cyprus
200 Czechoslovakia
203 Czech Rep.
204 Benin
208 Denmark
212 Dominica
214 Dominican Rep.
218 Ecuador
222 El Salvador
226 Equatorial Guinea
230 Fmr Ethiopia
231 Ethiopia
232 Eritrea
233 Estonia
234 Faeroe Isds
238 Falkland Isds (Malvinas)
239 South Georgia and the South Sandwich Islands
242 Fiji
246 Finland
250 France
251 France
254 French Guiana
258 French Polynesia
260 Fr. South Antarctic Terr.
262 Djibouti
266 Gabon
268 Georgia
270 Gambia
275 Occ. Palestinian Terr.
276 Germany
278 Fmr Dem. Rep. of Germany
280 Fmr Fed. Rep. of Germany
288 Ghana
292 Gibraltar
296 Kiribati
300 Greece
304 Greenland
308 Grenada
312 Guadeloupe
316 Guam
320 Guatemala
324 Guinea
328 Guyana
332 Haiti
334 Heard Island and McDonald Islands
336 Holy See (Vatican City State)
340 Honduras
344 China, Hong Kong SAR
348 Hungary
352 Iceland
356 India, excl. Sikkim
360 Indonesia
364 Iran
368 Iraq
372 Ireland
376 Israel
380 Italy, excluding San Marino and the Holy See
381 Italy
384 Côte d’Ivoire
388 Jamaica
392 Japan
398 Kazakhstan
400 Jordan
404 Kenya
408 Dem. People’s Rep. of Korea
410 Rep. of Korea
412 Kosovo
414 Kuwait
417 Kyrgyzstan
418 Lao People’s Dem. Rep.
422 Lebanon
426 Lesotho
428 Latvia
430 Liberia
434 Libya
438 Liechtenstein
440 Lithuania
442 Luxembourg
446 China, Macao SAR
450 Madagascar
454 Malawi
457 Sarawak
458 Malaysia
459 Peninsula Malaysia
461 Sabah
462 Maldives
466 Mali
470 Malta
473 South America, nes
474 Martinique
478 Mauritania
480 Mauritius
484 Mexico
488 Midway
490 Asia, nes
492 Monaco
496 Mongolia
498 Rep. of Moldova
499 Montenegro
500 Montserrat
504 Morocco
508 Mozambique
512 Oman
516 Namibia
520 Nauru
524 Nepal
527 Oceania, nes
528 Netherlands
530 Neth. Antilles
532 Neth. Antilles and Aruba
533 Aruba
536 Neutral zone
540 New Caledonia
548 Vanuatu
554 New Zealand
558 Nicaragua
562 Niger
566 Nigeria
568 Europe, nes
570 Niue
574 Norfolk Isds
577 Africa, nes
578 Norway, excluding Bouvet Island, Svalbard and Jan Mayen Islands
579 Norway
580 N. Mariana Isds
581 United States Minor Outlying Islands
582 Fmr Pacific Isds
583 FS Micronesia
584 Marshall Isds
585 Palau
586 Pakistan
588 East and West Pakistan
590 Fmr Panama, excl.Canal Zone
591 Panama
592 Fmr Panama-Canal-Zone
598 Papua New Guinea
600 Paraguay
604 Peru
608 Philippines
612 Pitcairn
616 Poland
620 Portugal
624 Guinea-Bissau
626 Timor-Leste
630 Puerto Rico
634 Qatar
637 North America, the Caribbean and Central America not elsewhere specified
638 Réunion
642 Romania
643 Russian Federation
646 Rwanda
647 Ryukyu Isd
654 Saint Helena
658 Saint Kitts, Nevis and Anguilla
659 Saint Kitts and Nevis
660 Anguilla
662 Saint Lucia
666 Saint Pierre and Miquelon
670 Saint Vincent and the Grenadines
674 San Marino
678 Sao Tome and Principe
682 Saudi Arabia
686 Senegal
688 Serbia
690 Seychelles
694 Sierra Leone
698 Sikkim
699 India
702 Singapore
703 Slovakia
704 Viet Nam
705 Slovenia
706 Somalia
710 South Africa
711 So. African Customs Union
716 Zimbabwe
717 Fmr Rhodesia Nyas
720 Fmr Dem. Yemen
724 Spain
732 Western Sahara
736 Sudan
740 Suriname
744 Svalbard and Jan Mayen Islands
748 Swaziland
752 Sweden
756 Switzerland
757 Switzerland
760 Syria
762 Tajikistan
764 Thailand
768 Togo
772 Tokelau
776 Tonga
780 Trinidad and Tobago
784 United Arab Emirates
788 Tunisia
792 Turkey
795 Turkmenistan
796 Turks and Caicos Isds
798 Tuvalu
800 Uganda
804 Ukraine
807 TFYR of Macedonia
810 Fmr USSR
818 Egypt
826 United Kingdom
834 United Rep. of Tanzania
835 Fmr Tanganyika
836 Fmr Zanzibar and Pemba Isd
837 Bunkers
838 Free Zones
839 Secret & Differences
840 United States, excluding Puerto Rico and United States Virgin Islands
841 USA (before 1981)
842 USA
849 US Misc. Pacific Isds
850 US Virgin Isds
854 Burkina Faso
858 Uruguay
860 Uzbekistan
862 Venezuela
866 Fmr Dem. Rep. of Vietnam
868 Fmr Rep. of Vietnam
872 Wake Island
876 Wallis and Futuna Isds
882 Samoa
886 Fmr Arab Rep. of Yemen
887 Yemen
890 Yugoslavia
891 Serbia and Montenegro
894 Zambia
899 Areas, nes
1251 France, including Monaco, excluding Guadeloupe, Martinique, Reunion and French Guiana
1381 Italy, including San Marino

Lastly, take a look at the query. Here is a query I ran:

All reporters, EU countries + world partners

http://comtrade.un.org/db/dqBasicQueryResults.aspx?cc=72, 7201, 7202, 7203, 7206, 7207&px=H1&r=4, 8, 10, 12, 16, 20, 24, 28, 31, 32, 36, 40, 44, 48, 50, 51, 52, 56, 58, 60, 64, 68, 70, 72, 74, 76, 80, 84, 86, 90, 92, 96, 97, 100, 104, 108, 112, 116, 120, 124, 132, 136, 140, 144, 148, 152, 156, 158, 162, 166, 170, 174, 175, 178, 180, 184, 188, 191, 192, 196, 200, 203, 204, 208, 212, 214, 218, 222, 226, 230, 231, 232, 233, 234, 238, 239, 242, 246, 250, 251, 254, 258, 260, 262, 266, 268, 270, 275, 276, 278, 280, 288, 292, 296, 300, 304, 308, 312, 316, 320, 324, 328, 332, 334, 336, 340, 344, 348, 352, 356, 360, 364, 368, 372, 376, 380, 381, 384, 388, 392, 398, 400, 404, 408, 410, 412, 414, 417, 418, 422, 426, 428, 430, 434, 438, 440, 442, 446, 450, 454, 457, 458, 459, 461, 462, 466, 470, 473, 474, 478, 480, 484, 488, 490, 492, 496, 498, 499, 500, 504, 508, 512, 516, 520, 524, 527, 528, 530, 532, 533, 536, 540, 548, 554, 558, 562, 566, 568, 570, 574, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 588, 590, 591, 592, 598, 600, 604, 608, 612, 616, 620, 624, 626, 630, 634, 637, 638, 642, 643, 646, 647, 654, 658, 659, 660, 662, 666, 670, 674, 678, 682, 686, 688, 690, 694, 698, 699, 702, 703, 704, 705, 706, 710, 711, 716, 717, 720, 724, 732, 736, 740, 744, 748, 752, 756, 757, 760, 762, 764, 768, 772, 776, 780, 784, 788, 792, 795, 796, 798, 800, 804, 807, 810, 818, 826, 834, 835, 836, 837, 838, 839, 840, 841, 842, 849, 850, 854, 858, 860, 862, 866, 868, 872, 876, 882, 886, 887, 890, 891, 894, 899, 1251, 1381&y=1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006&p=0, 842, 156, 97, 8, 20, 40, 56, 58, 70, 100, 191, 196, 200, 203, 208, 233, 234, 246, 251, 276, 292, 300, 348, 352, 372, 381, 428, 440, 442, 470, 528, 574, 579, 616, 620, 642, 703, 705, 724, 752, 757, 792, 807, 826, 890&rg=1,2&so=9999&qt=n

Note all the & characters. These are field separators. the equal signs set the values.

The above query gets steel (the 72** codes) for a bunch of reporters, for years 97-2006 with various partners (namely World (0) and EU, China, others), for imports and exports.

All express query does is fill out these fields, so if you want to generate the query yourself (with an excel database to store/modify your queries, all you need is the output url string to be correct, as above.

A request

Anyone know python, perl, or another scripting language and want to help me automate gathering data. To the maximum extent possible, I want to automate the data gathering process. I specifically do NOT want to facilitate any broad data ripping. It’s just that I may need to run repeated queries (maybe about 1000) and manually clicking is my least favorite part of the task.

The pseudocode of an API I’d want is as follows:

inputs: fields to get

1) start a browser session that keeps track of cookies, etc.

2) submit a properly formatted request URL to comtrade.

3) follow the link to “direct download”

4) do some initial preprocessing of the file.

My main difficulty is I haven’t figured out how to simulate a browser session (step 1).

Please post comments.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: