Douglas McKee: C tips for economists

C tips for economists

CAVEAT: I am a Unix/Linux guy and have done very little programming under Windows. A lot of what I say below will still apply if you want to program in C on a Windows machine, but some of it won't. If you want to be able to run your code on a remote server or cluster (and you probably will), it's worth investing in learning how to program in a Unix/Linux environment. That said, I've heard that the Microsoft Visual C++ compiler and development environment are OK if you're happy just running programs on your own computer.

Why use C?

The main reason I use C is that it's a great language for writing very fast programs. It's also a small enough language that you can fit the whole thing in your brain and don't have to spend a lot of time combing through manuals. C is also a general purpose language and you might find it useful for doing non-scientific programming one day.

What about C++?

I might recommend learning C++ if you don't already know C, but the learning curve will be a bit steeper. At the end of the day, your code might be a little cleaner and it might be a little faster to write and debug, but you might also pay a performance penalty.

What about Fortran?

If the only kind of programming you will ever do is scientific programming, I might recommend learning Fortran, even though the numerical libraries available for C have pretty much caught up to the Fortran ones. Modern variants (e.g., Fortran 95) are quite nice (from what I've heard), and they have some interesting support for exploiting multiple-CPU systems. What I DON'T recommend is writing new code in Fortran 77. That language belongs in a museum.

Buy the book.

If you haven't already, buy yourself a copy of The C Programming Language by Brian Kernighan and Dennis Ritchie. If you buy it used, make sure to get the second edition, as it describes ANSI C. This is the standard reference and it's remarkably concise and well-written.

Learn your compiler.

First, use the best compiler you can on a given machine. The Intel compilers are very good on both Intel and AMD hardware. Intel gives their compilers away free for non-commercial use, but unfortunately, academic use doesn't count. They do discount heavily for academics and even more for students. If you've got the money, Pathscale also makes very good C and Fortran compilers for Linux. I think it's worth at least skimming the compiler manual to get an idea of what things you might try to speed up your code. Finally, keep your code portable. Pretty often, I run the same code on several architectures and use lots of different compilers. It sure is easier when the code is portable.

Exploit parallelism.

If you just can't get your program to run fast enough on a single computer, there are often some basic ways to parallelize it and run it on a cluster. For example, suppose you're using a derivative-based optimizer and you have K parameters. Computing the gradient numerically usually requires at least K+1 evaluations of the objective function: one at the current point plus one for each perturbed parameter. You'll need 2K if you want more accurate central-difference derivatives. These evaluations don't depend on each other, so they can be run in parallel. If your objective function isn't smooth enough for a derivative-based algorithm, I've had good luck with the APPSpack parallel optimizer.
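
To make the K+1 count concrete, here is a minimal sketch of a forward-difference gradient. The names objective_fn, fd_gradient, and sum_of_squares are made up for illustration and don't come from any library. Each pass of the loop works on its own copy of the parameters, so the passes are independent and could be split across cores or nodes (OpenMP and MPI are two common ways to do that; neither is used here).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical objective signature: k parameters in, one number out. */
typedef double (*objective_fn) (const double *theta, size_t k);

/* Forward-difference gradient: one baseline evaluation plus one per
 * parameter, K+1 in total. Every pass of the loop uses a private copy
 * of theta, so the passes do not depend on each other.
 * Returns 0 on success, -1 if a scratch buffer could not be allocated. */
int
fd_gradient (objective_fn f, const double *theta, size_t k, double h,
             double *grad)
{
    double f0;
    size_t i;
    int status = 0;

    assert(h > 0.0);
    f0 = f(theta, k);                     /* the "+1" evaluation */

    for (i = 0; i < k; i++) {             /* K independent evaluations */
        double *bumped = malloc(k * sizeof *bumped);
        if (bumped == NULL) {
            status = -1;
            continue;
        }
        memcpy(bumped, theta, k * sizeof *bumped);
        bumped[i] += h;                   /* perturb one parameter */
        grad[i] = (f(bumped, k) - f0) / h;
        free(bumped);
    }
    return status;
}

/* Toy objective for illustration: sum of squares, true gradient 2*theta. */
double
sum_of_squares (const double *theta, size_t k)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < k; i++)
        s += theta[i] * theta[i];
    return s;
}
```

Calling fd_gradient(sum_of_squares, theta, K, 1e-6, grad) fills grad with values close to 2*theta[i]. A central-difference version would evaluate at both theta[i]+h and theta[i]-h, which is where the 2K count comes from.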

Learn and use a good scientific library.

I like the GNU Scientific Library quite a bit. It's free, very well-documented, and has lots of very useful stuff including vectors, matrices, random numbers, and optimizers. It's way easier to use their stuff than to roll your own.

Use make.

If your program is more than a couple hundred lines, it almost always makes sense to divide it into multiple files. Make is a great tool for managing the compilation of your program (or programs). Ben Yoshino has written a nice tutorial for the uninitiated.

Use a debugger.

I still tend to put lots of print statements in my programs when I'm debugging them, but using an actual debugger can often get to the bottom of a problem much faster. gdb is simple and works great.

Use a profiler.

Before you dive into optimizing your code, save yourself some time and use a profiler first. You may think you know what parts of your code are using up all the CPU time, but often you'll be wrong. I'm kind of old school, so I use GNU gprof, and it does what I want. But I'm sure there's even better stuff out there.

Use a platform-specific math library.

This is very important if you do a lot of matrix math. The standard library interface for doing simple matrix and vector calculations is called BLAS and for more complex matrix manipulation, it's LAPACK. Intel sells a very fast library called MKL that implements BLAS and LAPACK (and more), and AMD has a similar math library called ACML. If you can't get ahold of either of these, there's a good free implementation called ATLAS. The GNU Scientific Library has a wrapper around BLAS so you can call it on GSL matrices and just link to whatever BLAS library is best for you.

Use integer arithmetic if possible.

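It's much faster than working with doubles. The example below assumes x is a non-negative int: integer division truncates toward zero while floor() rounds down, so the two versions disagree for negative x.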

BAD:    int a = floor(x/((double) XGRIDSIZE))*XGRIDSIZE;
BETTER:   int a = (x/XGRIDSIZE) * XGRIDSIZE;

Pre-compute as much as possible.

Even something that doesn't seem like it would take much time to compute can really add up if you're doing it 30 million times.

BAD:
inline int
choice_sector (int choice)
{
    if (choice==C_NW) {
        return 0;
    } else if (choice==C_W1PT || choice==C_W1FT || choice==C_W1OT || choice==C_W1) {
        return 1;
    } else if (choice==C_W2PT || choice==C_W2FT || choice==C_W2OT || choice==C_W2) {
        return 2;
    } else {
        return 3;
    }
}

BETTER:
int G_choice_sectors[] = {0,1,1,1,2,2,2,3,3,3,1,2,3};
#define choice_sector(c) (G_choice_sectors[(c)])
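
A lookup table like this is only right if its entries line up with the numeric values of the choice codes. Here is a sketch of one way to make that explicit with C99 designated initializers. It assumes the codes are consecutive integers starting at 0 in the order the table implies; the enum itself and the C_W3* names are my guesses, since the original definitions aren't shown.

```c
#include <assert.h>

/* Hypothetical choice codes, assumed consecutive from 0. */
enum choice {
    C_NW,
    C_W1PT, C_W1FT, C_W1OT,
    C_W2PT, C_W2FT, C_W2OT,
    C_W3PT, C_W3FT, C_W3OT,
    C_W1, C_W2, C_W3,
    N_CHOICES
};

/* Designated initializers keep each sector next to the code it belongs
 * to, so reordering the enum cannot silently scramble the table. */
static const int G_choice_sectors[N_CHOICES] = {
    [C_NW]   = 0,
    [C_W1PT] = 1, [C_W1FT] = 1, [C_W1OT] = 1, [C_W1] = 1,
    [C_W2PT] = 2, [C_W2FT] = 2, [C_W2OT] = 2, [C_W2] = 2,
    [C_W3PT] = 3, [C_W3FT] = 3, [C_W3OT] = 3, [C_W3] = 3
};

/* Function form of the lookup. The assert catches out-of-range codes in
 * debug builds and disappears when compiled with -DNDEBUG. */
static inline int
choice_sector (int c)
{
    assert(c >= 0 && c < N_CHOICES);
    return G_choice_sectors[c];
}
```

Under these assumptions the table comes out as {0,1,1,1,2,2,2,3,3,3,1,2,3}, the same as the hand-written one.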

Don't use `pow()` to square or cube something.

It's way way faster to just multiply the number by itself. Save pow() for when you need non-integer powers.

BAD:
x2 = pow(x,2.0);
x3 = pow(x,3.0);

BETTER:
#define SQUARE(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))

x2 = SQUARE(x);
x3 = CUBE(x);
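
One caution with the macro versions: the argument is pasted in once per multiplication, so an argument that calls a function gets evaluated more than once, and an argument with a side effect like i++ is undefined behavior. Here is a sketch of the same idea using small inline functions instead. The names square, cube, next_value, and G_calls are made up for illustration.

```c
#define SQUARE(x) ((x)*(x))

/* Function versions: the argument is evaluated exactly once, and a
 * compiler with optimization on will typically inline the multiply. */
static inline double
square (double x)
{
    return x * x;
}

static inline double
cube (double x)
{
    return x * x * x;
}

/* Hypothetical helper that counts how many times it gets called. */
static int G_calls = 0;

static double
next_value (void)
{
    G_calls++;
    return 3.0;
}
```

SQUARE(next_value()) calls next_value() twice, while square(next_value()) calls it once. Both give 9.0 here, but if the function were expensive or returned a different number each time (a random draw, say), the macro would be slower or just plain wrong.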