Economics should be open

December 16, 2008

Data cleaning, excel

Filed under: coding, Excel, Open Source, Uncategorized — Tags: , , , — howardchong @ 10:17 pm

Ever have data with commas between the thousand and million marks? spaces at the end of numbers? text footnotes appended to numbers, spaces in numbers?

Cleaning this by hand is a complete pain, so I wrote an Excel Macro.

To install, record a new macro and then stick this in the module.

There’s even a “careful” mode. (Change the line of careful=false to careful=true) which prompts you for each change that I felt might be wrong.

Please write a comment if you find it helpful.



The code:

IMPORTANT: click “view source” and copy the text from there. WordPress is cutting off some of the lines.

Sub Convert_to_numbers_blankednums()
    lb = Chr$(13) & Chr$(10) ' linebreak

    countchanges = 0
    For Each mycell In Selection
        nowvalue = mycell.Value
        ' mycell.Value = CInt(Substitute(mycell.Value, " ", ""))
        'get rid of space

        'Set the pattern by using the Pattern property.
        Pattern = " "
        nowvalue = Replace(nowvalue, Pattern, "")

        nowvalue1 = nowvalue

        Pattern = ".."
        nowvalue = Replace(nowvalue, Pattern, "")

        nowvalue2 = nowvalue
        founddots = nowvalue2 <> nowvalue1

        'get rid of numbers in parentheses (footnotes)
        Pattern = "("
        parenindex = InStr(nowvalue, Pattern)
        If parenindex <> 0 Then nowvalue = Left(nowvalue, parenindex - 1)

        'get rid of alpha characters
        Dim runningval, str
        runningval = ""

        For i = 1 To Len(nowvalue)
          mychar = Mid(nowvalue, i, 1)
          If IsNumeric(mychar) Or mychar = "." Then
            runningval = runningval + mychar
          End If
        Next i
        nowvalue = runningval

        careful = False
        If careful = True Then
            If nowvalue <> nowvalue1 Then
                response = MsgBox("Replace " & lb & nowvalue1 & lb & " with " & lb & nowvalue, vbYesNoCancel)
                If response = vbYes Then
                    mycell.Value = nowvalue
                    countchanges = countchanges + 1
                ElseIf response = vbNo Then
                    mycell.Value = nowvalue1
                    mycell.Interior.Pattern = xl
                ElseIf response = vbCancel Then Exit For
                End If
                mycell.Value = nowvalue
            End If
            mycell.Value = nowvalue
            If nowvalue <> nowvalue1 Then countchanges = countchanges + 1
        End If

        If founddots Then
            mycell.Value = "."
        End If


    MsgBox ("Done!" & lb & "Parens and alpha changes made: " & countchanges)

End Sub

1 Comment »

  1. Anyone know of a better way to include code in a blog post? I tried the or preformatted tag, but the code was cut off due to the narrow columns.

    Comment by howardchong — December 16, 2008 @ 10:22 pm

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at

%d bloggers like this: