I use non-Stata editors for do files because they look better and integrate better into my workflow. One editor I use is Sublime Text 2.
WHY I USE SUBLIME TEXT 2
I use it because other programmers told me to. It’s great for HTML, Ruby, Python, etc. It’s not free, but there is a free evaluation version; it just bugs you with increasing frequency if you don’t register.
The killer feature for me is “Find in Files” CTRL+SHIFT+F. It lets me search all my do files for a command, gives me the results in a color coded file, and then I can double click to go to that file.
I also like that it is white text on a black background. My eyes don’t like black text on white. In fact, I think my preference is for yellow on black (old IBM) or cyan on black (old DOS days).
SYNTAX HIGHLIGHTING, HOWTO
Out of the box, Sublime Text 2 does not do syntax highlighting for Stata, so you have to add it yourself.
The guide to do it is sorta here: http://bylr.net/3/2010/10/stata-bundle-for-textmate/
Except that’s for TextMate. Luckily, Sublime Text 2 can use TextMate’s syntax definitions. (I love programmers and code reuse.)
So here are the steps:
- download the file for textmate from http://bylr.net/3/2010/10/stata-bundle-for-textmate/ and unzip
- navigate to Syntaxes\stata.tmLanguage
- Save this file in your user directory. For example, on Windows, my user directory is here: “C:\Users\howardchong\AppData\Roaming\Sublime Text 2\Packages\User”
- NOTE: If you can’t find your user directory, go to Sublime Text, hit CTRL+` to open the console, and type: “sublime.packages_path()”. This gives you almost the right path; go to the “User” subdirectory and that’s the right path.
Syntax highlighting should now be enabled.
(NOTE: If this doesn’t work, you may also have to add package development. Follow the instructions here: http://sublimetext.info/docs/en/extensibility/syntaxdefs.html)
Here are some tips for running large Stata jobs in Linux/Unix.
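For example, one common approach (my sketch, with a placeholder do-file name) is to run Stata in batch mode so the job keeps going after you log out:

```shell
# -b runs the do file in batch mode and writes output to bigjob.log;
# nohup keeps the job alive after logout, & puts it in the background.
nohup stata -b do bigjob.do &
```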
I gathered shape files for California and produced a spatial correspondence table between:
- CEC (California Energy Commission) Climate Zones (numbering 16)
- CA counties (polygons)
I built it using the ArcToolbox tool called INTERSECT.
I don’t have a great way of uploading the data, so if you want this data, help me upload it and it will be available to you (and everyone) for free.
Use the info here to check whether a file exists in your Stata code. It uses CAPTURE and the return code (_rc).
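A minimal sketch of that pattern (the filename is just a placeholder):

```stata
* CAPTURE swallows the error from CONFIRM; _rc holds the return code.
* _rc == 0 means the file exists; 601 means "file not found".
capture confirm file "mydata.dta"
if _rc == 0 {
    use "mydata.dta", clear
}
else {
    display "mydata.dta not found, _rc = " _rc
}
```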
So, I’m having lots of “fun” with GIS right now.
I’m having to map zip9 (AKA zip+4, zip 9, zip5+4) in California to census block groups (and census tracts) (CBG / CT) and then to latitude and longitude.
Use of “robust” after regress in Stata seems to be automatic. From the textbooks, robust standard errors remain valid under heteroskedasticity, and there is only a small efficiency hit if the errors turn out to be homoskedastic.
There is one problem I am encountering, and that is in small samples. If your regressor of interest has very little variation, be careful, especially when measuring treatment effects where you have very few or very many treated observations.
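A quick sketch of the syntax, using Stata’s shipped auto dataset:

```stata
* Compare classical and heteroskedasticity-robust standard errors.
sysuse auto, clear
regress price mpg weight            // classical SEs
regress price mpg weight, robust    // Huber-White robust SEs
```

In small samples, the robust standard errors themselves can be unreliable, which is the caveat above.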
I do not like the built-in Stata editor. It makes reading Stata do files a chore. I come a bit from the programming world, which shows commented lines and blocks in a different color and highlights reserved words. I looked for an alternative Stata text editor / do file editor and like Notepad++.
Notepad++ is a good alternative. You can still run blocks of code (like control-D) and the whole do file (like control-R) if you set it up. Plus it’s free.
I hear the phrase “what does it look like when we weight the data” a lot. It confused me for a while, but I figured it out: it could mean two things, so the response should be, which of the two do you want?
Weighted Least Squares and weighted average are opposite concepts, in a sense.
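A sketch of the two meanings in Stata (the variable names here are made up):

```stata
* Meaning 1: weighted average / weighted regression, where weights
* make the sample representative of a population (e.g. by size).
summarize income [aweight = popsize]
regress income educ [aweight = popsize]

* Meaning 2: weighted least squares, where weights downweight the
* noisier observations (inverse-variance weights).
regress income educ [aweight = 1/noise_var]
```

Same syntax, opposite logic: in one case the weight says “this observation stands for many,” in the other it says “this observation is measured more precisely.”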
I was looking for a list of power plants in Europe in 2008. I didn’t find one. You know why? It just got created in late 2008, and I just found it in 2009.
More beta below the bump.
I’m trying to figure out which open source statistical/computation package to use. I used to use Matlab. It’s good, but expensive, and it has WAY more features than I need.
I know I should be running things on Unix, but right now I’m on Windows XP. I sometimes putty into a Unix server and run things.
R looks very good. That’s my next language to learn.
Octave is pretty good. It provides syntax almost identical to Matlab. In 3.0, it now has support for Multidimensional Cell Arrays. These are arrays that can hold any data type. Most common for me is an array of strings. If you load data that is mixed text and numeric, then your data will probably be read as a cell-array.
One thing I have noticed is that the cell-arrays are really quite slow.
I had a ~10000 x 10 csv file.
Column 1 had mixed numeric and string data: 6-character codes, about 2/3 of which had no alphabetical characters. I needed to convert these to strings, and then do a sort and some other processing. I basically had to traverse each element of the first column and do the datatype change individually.
The process was VERY slow. In fact, I think Excel would be better at such tasks.
Here are a few tips:
- If you can, remove all strings from your CSV file.
- If you read a large dataset as a large cell array, separate each column into its own variable. Then pack the numeric data together into a matrix (if needed).
- Stata has an “encode” routine that converts strings into numerically stored records. For example, if your variable is car makes, it will give each make a number and also generate a lookup table where you can decipher what the numbers mean.
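A sketch of encode, using Stata’s shipped auto dataset:

```stata
* encode turns the string variable make into the numeric make_id,
* attaching value labels that serve as the lookup table.
sysuse auto, clear
encode make, generate(make_id)
label list make_id    // decipher which number means which make
```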
Also check out this page that benchmarks the math/science packages with a set of standard routines: