I’m working with COMTRADE data. COMTRADE, run by the UN Statistics Division, is the main source of international trade data. I talked to a data person at the IMF, and he told me they use COMTRADE for their trade statistics (they run some of their own surveys too, but for general-purpose work, this is it).
COMTRADE is not the most intuitive thing to use, though once you get the hang of it, it is pretty easy.
I’m at UC Berkeley, so we have a site license. If you connect from a campus terminal or from AIRBEARS (the campus wireless network), you should get full access.
You should download files via the “Direct Download” link, which gives you the data in CSV format. I tried SDMX and could not get it to work very well; I am not that good at XML. (I spent some time with Python’s XML tools and did get it working, but decided the hurdle was too high.) If you are good with XML, I would love to talk to you about developing a tool to extract the data easily.
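If you go the Direct Download route, Python’s standard csv module is enough to load a file. This is a minimal sketch, assuming the file has a single header row. I don’t reproduce COMTRADE’s actual column names here, so the function simply returns whatever headers it finds, which is a good first step for seeing what a file contains. If your file is not UTF-8, change the encoding argument.

```python
import csv
from pathlib import Path


def load_comtrade_csv(path):
    """Read a CSV file into (header, rows), where each row is a dict keyed by header."""
    path = Path(path)
    # utf-8-sig also strips a byte-order mark if the file starts with one
    with path.open(newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames or [], rows
```

For example, `header, rows = load_comtrade_csv("comtrade_steel.csv")` (a hypothetical filename) gives you the column names and a list of rows to inspect.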
Now the kicker: which codes, and which classification system?
It shows what data is available in each classification for recent years. Note: NOT ALL DATA IS REPORTED IN ALL CLASSIFICATIONS. SITC Rev. 3 seems to be the fullest. HS2002 is good, but has less. They don’t down-convert to HS1996. So if you want a series from 1990 to 2007, you have to choose between using HS1996 throughout and accepting fewer countries, or using HS1996 for the years before HS2002 took effect and HS2002 afterwards, then merging across the two classifications.
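One way to handle the split is to decide the classification year by year and generate one query per classification. This is a minimal sketch, assuming COMTRADE’s px codes are H1 for HS1996 (the query later in this post uses px=H1) and H2 for HS2002, and that 2002, the year HS2002 took effect, is the first year you take from HS2002. The cutoff and both codes are parameters, so you can change them if your data says otherwise.

```python
from collections import defaultdict


def split_years_by_classification(years, cutoff=2002, before="H1", after="H2"):
    """Group years by the classification code to request them in.

    Years before `cutoff` go to `before` (HS1996 here); the rest go to `after` (HS2002).
    """
    groups = defaultdict(list)
    for year in sorted(years):
        groups[before if year < cutoff else after].append(year)
    return dict(groups)


print(split_years_by_classification(range(1997, 2007)))
# {'H1': [1997, 1998, 1999, 2000, 2001], 'H2': [2002, 2003, 2004, 2005, 2006]}
```

Each group then becomes its own query, and the results still have to be merged across classifications.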
Search Google for “RAMON metadata server” (Eurostat’s classification server) to find correspondence tables between nomenclatures.
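Once you have a correspondence table, merging usually means re-expressing one classification’s codes in the other’s and summing the values. This is a minimal sketch, assuming a many-to-one table from HS2002 codes to HS1996 codes. The codes and values below are placeholders, not taken from RAMON or COMTRADE. Real correspondences can also be one-to-many, which requires splitting a value across several codes with some weighting rule; this sketch does not handle that case.

```python
from collections import defaultdict


def convert_codes(values, concordance):
    """Re-express trade values keyed by source codes in target codes.

    values:      {source_code: trade_value}
    concordance: {source_code: target_code}, assumed many-to-one
    Returns ({target_code: summed_value}, [source codes missing from the table]).
    """
    converted = defaultdict(float)
    unmatched = []
    for code, value in values.items():
        target = concordance.get(code)
        if target is None:
            unmatched.append(code)  # report instead of silently dropping
            continue
        converted[target] += value
    return dict(converted), unmatched


# Placeholder codes and values, for illustration only
concordance = {"HS02_a": "HS96_x", "HS02_b": "HS96_x", "HS02_c": "HS96_y"}
values = {"HS02_a": 5.0, "HS02_b": 2.5, "HS02_c": 1.0, "HS02_z": 4.0}
print(convert_codes(values, concordance))
# ({'HS96_x': 7.5, 'HS96_y': 1.0}, ['HS02_z'])
```

Returning the unmatched codes makes gaps in the correspondence table visible instead of quietly shrinking your totals.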
I don’t like using their web interface, but it is helpful when you are starting out.
First, try a simple search with Basic Selection. Look at the query that appears in the URL (more on this below). Write down the codes for the countries and commodities you want.
Next, try the Express Selection. Here you need to know in advance which codes you want; you can pull them out of the URL from your Basic Selection search. I keep lists of the HS codes so I can browse them in Excel or grep for the ones I want. For countries, here is a list of codes:
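If you keep the HS code list as a plain two-column file (code, description), a few lines of Python do the same job as grep. This is a minimal sketch, assuming that two-column layout; the in-memory sample below stands in for such a file and holds just three HS chapter titles.

```python
import csv
import io
import re


def search_codes(rows, pattern):
    """Return (code, description) pairs whose description matches a regex, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(code, desc) for code, desc in rows if regex.search(desc)]


# Stand-in for a two-column code list file (code,description)
sample = io.StringIO(
    "code,description\n"
    "72,Iron and steel\n"
    "73,Articles of iron or steel\n"
    "74,Copper and articles thereof\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
print(search_codes(reader, r"\bsteel\b"))
# [('72', 'Iron and steel'), ('73', 'Articles of iron or steel')]
```

With a real file, pass `csv.reader(open("hs_codes.csv", newline=""))` (a hypothetical filename) instead of the sample, after skipping its header.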
Lastly, take a look at the query. Here is a query I ran:
All reporters; partners: World, EU countries, and a few others
http://comtrade.un.org/db/dqBasicQueryResults.aspx?cc=72, 7201, 7202, 7203, 7206, 7207&px=H1&r=4, 8, 10, 12, 16, 20, 24, 28, 31, 32, 36, 40, 44, 48, 50, 51, 52, 56, 58, 60, 64, 68, 70, 72, 74, 76, 80, 84, 86, 90, 92, 96, 97, 100, 104, 108, 112, 116, 120, 124, 132, 136, 140, 144, 148, 152, 156, 158, 162, 166, 170, 174, 175, 178, 180, 184, 188, 191, 192, 196, 200, 203, 204, 208, 212, 214, 218, 222, 226, 230, 231, 232, 233, 234, 238, 239, 242, 246, 250, 251, 254, 258, 260, 262, 266, 268, 270, 275, 276, 278, 280, 288, 292, 296, 300, 304, 308, 312, 316, 320, 324, 328, 332, 334, 336, 340, 344, 348, 352, 356, 360, 364, 368, 372, 376, 380, 381, 384, 388, 392, 398, 400, 404, 408, 410, 412, 414, 417, 418, 422, 426, 428, 430, 434, 438, 440, 442, 446, 450, 454, 457, 458, 459, 461, 462, 466, 470, 473, 474, 478, 480, 484, 488, 490, 492, 496, 498, 499, 500, 504, 508, 512, 516, 520, 524, 527, 528, 530, 532, 533, 536, 540, 548, 554, 558, 562, 566, 568, 570, 574, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 588, 590, 591, 592, 598, 600, 604, 608, 612, 616, 620, 624, 626, 630, 634, 637, 638, 642, 643, 646, 647, 654, 658, 659, 660, 662, 666, 670, 674, 678, 682, 686, 688, 690, 694, 698, 699, 702, 703, 704, 705, 706, 710, 711, 716, 717, 720, 724, 732, 736, 740, 744, 748, 752, 756, 757, 760, 762, 764, 768, 772, 776, 780, 784, 788, 792, 795, 796, 798, 800, 804, 807, 810, 818, 826, 834, 835, 836, 837, 838, 839, 840, 841, 842, 849, 850, 854, 858, 860, 862, 866, 868, 872, 876, 882, 886, 887, 890, 891, 894, 899, 1251, 1381&y=1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006&p=0, 842, 156, 97, 8, 20, 40, 56, 58, 70, 100, 191, 196, 200, 203, 208, 233, 234, 246, 251, 276, 292, 300, 348, 352, 372, 381, 428, 440, 442, 470, 528, 574, 579, 616, 620, 642, 703, 705, 724, 752, 757, 792, 807, 826, 890&rg=1,2&so=9999&qt=n
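Note the & characters: they separate the fields, and each = sets a field’s value (for example, px=H1).
The query above gets steel (HS chapter 72, “iron and steel”, plus the headings 7201, 7202, 7203, 7206, and 7207), classified in HS1996 (px=H1), for a long list of reporters (r=), for the years 1997-2006 (y=), against various partners (p=: World (0), the EU, China (156), and others), for both imports and exports (rg=1,2).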
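To see the structure, Python’s urllib.parse can split a query URL at the & separators and the = signs. This is a minimal sketch using a trimmed copy of the query above, with most reporters, years, and partners removed. The field meanings in the comment are the ones described in this post; so and qt are copied unchanged and are not interpreted here.

```python
from urllib.parse import parse_qs, urlsplit


def query_fields(url):
    """Split a query URL into {field: [values]}: '&' separates fields, '=' assigns values."""
    raw = parse_qs(urlsplit(url).query)
    # Each field appears once, so parse_qs gives one string per field;
    # split the comma-separated lists into individual codes
    return {name: values[0].split(",") for name, values in raw.items()}


# Trimmed copy of the query above.
# cc = commodity codes, px = classification (H1 = HS1996), r = reporters,
# y = years, p = partners, rg = trade flow (1 = imports, 2 = exports)
url = ("http://comtrade.un.org/db/dqBasicQueryResults.aspx"
       "?cc=72,7201,7202&px=H1&r=4,8,10&y=1997,1998&p=0,842,156&rg=1,2&so=9999&qt=n")

for name, values in query_fields(url).items():
    print(name, values)
# cc ['72', '7201', '7202']
# px ['H1']
# ...
```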
All the Express Selection does is fill in these fields. So if you want to generate queries yourself (say, with an Excel sheet to store and modify them), all you need is a correctly formatted URL string like the one above.
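Generating the URL yourself is mostly string joining. This is a minimal sketch: the endpoint is copied from the query in this post and may no longer respond, since it belongs to the interface described here. Lists are joined with commas, and urlencode is told to leave commas unescaped so the output looks like the URLs the Express Selection produces. The so and qt values are copied as-is from the example query.

```python
from urllib.parse import urlencode

# Endpoint copied from the query in this post; it may no longer be live
BASE_URL = "http://comtrade.un.org/db/dqBasicQueryResults.aspx"


def build_query_url(fields):
    """Build a query URL from {field: value or list of values}.

    Lists, tuples, and ranges are joined with commas; commas are left
    unescaped (safe=",") to match the URLs the web interface produces.
    """
    params = {}
    for name, value in fields.items():
        if isinstance(value, (list, tuple, range)):
            params[name] = ",".join(str(v) for v in value)
        else:
            params[name] = str(value)
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_query_url({
    "cc": [72, 7201, 7202],
    "px": "H1",
    "r": [4, 8, 10],
    "y": range(1997, 2007),
    "p": [0, 842, 156],
    "rg": [1, 2],
    "so": 9999,
    "qt": "n",
})
print(url)
```

If your queries live in an Excel sheet, exporting it to CSV and feeding each row through a function like this gives you one URL per query.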
Does anyone know Python, Perl, or another scripting language and want to help me automate the data gathering? I want to automate it as much as possible. I specifically do NOT want to enable broad data ripping; it’s just that I may need to run many repeated queries (perhaps around 1,000), and manual clicking is my least favorite part of the task. Here is what the script needs to do:
1) Start a browser session that keeps track of cookies, etc.
2) Submit a properly formatted request URL to COMTRADE.
3) Download the resulting file.
4) Do some initial preprocessing of the file.
My main difficulty is that I haven’t figured out how to simulate a browser session (step 1).
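For the session step, you may not need to simulate a full browser: Python’s standard library can keep cookies across requests. This is a minimal sketch, assuming access is granted by your network location (the campus network mentioned above) plus ordinary session cookies, and that the pages don’t depend on JavaScript. If the Direct Download link turns out to be an ASP.NET form postback (hidden fields such as __VIEWSTATE), plain GET requests won’t be enough, and you would have to parse and resubmit the form; that is not covered here. The User-Agent string, output filenames, and 5-second pause are arbitrary choices.

```python
import http.cookiejar
import time
import urllib.request
from pathlib import Path


def make_session():
    """An opener that stores cookies and sends them back, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some sites reject Python's default User-Agent; this string is a placeholder
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (research script)")]
    return opener, jar


def fetch_to_file(opener, url, out_path, timeout=60):
    """Submit a URL with the session's cookies and save the response body unchanged."""
    with opener.open(url, timeout=timeout) as response:
        Path(out_path).write_bytes(response.read())


def run_queries(urls, out_dir, pause_seconds=5.0, suffix=".html"):
    """Run queries one at a time, pausing between them so the server is not hammered."""
    opener, _jar = make_session()  # one session shared by all queries
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        # Hypothetical naming scheme; the query URL returns a results page
        out_path = out_dir / f"query_{i:04d}{suffix}"
        fetch_to_file(opener, url, out_path)
        saved.append(out_path)
        time.sleep(pause_seconds)
    return saved
```

Usage would look like `run_queries(urls, "downloads")`, where `urls` is the list of query URLs you generated. Because the cookie jar lives in one opener, any cookie the site sets on the first request is sent back on later ones, which is most of what "keeping a browser session" means here.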
Please post comments.