C tips for economists
CAVEAT: I am a Unix/Linux guy and have done very little programming
under Windows. A lot of what I say below will still apply if you want
to program in C on a Windows machine, but some of it won't. If you want
to be able to run your code on a remote server or cluster (and you
probably will), it's worth investing in learning how to program in a
Unix/Linux environment. That said, I've heard that the Microsoft
Visual C++ compiler and development environment are OK if you're
happy just running programs on your own computer.
Why use C?
The main reason I use C is that it's a great language for writing very fast
programs. It's also a small enough language that you can fit the whole
thing in your brain and don't have to spend a lot of time combing
through manuals. C is also a general-purpose language, so you might
find it useful for non-scientific programming one day.
What about C++?
I might recommend learning C++ if you don't already know C, but the
learning curve will be a bit steeper. At the end of the day, your code
might be a little cleaner and it might be a little faster to write and
debug, but you might also pay a performance penalty.
What about Fortran?
And if the only kind of programming you will ever do is scientific
programming, I might recommend learning Fortran, even though the
numerical libraries available for C have pretty much caught up to the
Fortran ones. Modern variants (e.g., Fortran 95) are quite nice (from
what I've heard) and they've got some interesting support for exploiting
multi-CPU systems. What I DON'T recommend is writing new code in
Fortran 77. That language belongs in a museum.
Buy the book.
If you haven't already, buy yourself a copy of The C Programming
Language by Brian Kernighan and Dennis Ritchie. If you buy it
used, make sure to get the second edition, as it describes ANSI C. This
is the standard reference and it's remarkably concise and
well-written.
Learn your compiler.
First, use the best compiler you can on a given machine. The Intel
compilers are very good on both Intel and AMD hardware. Intel gives
their compilers away free for non-commercial use, but unfortunately,
academic use doesn't count. They do discount heavily for academics and
even more for students. If you've got the money, Pathscale also makes
very good C and Fortran compilers for Linux.
I think it's worth at least skimming the compiler manual and getting an
idea of what things you might try to speed up your code.
Finally, keep your code portable. Pretty often, I run the same code on
several architectures and use lots of different compilers. It sure is
easier when the code is portable.
Exploit parallelism.
If you just can't get your program to run fast enough on a single
computer, there are often some basic ways to parallelize it and run it
on a cluster.
For example, suppose you're using a derivative-based optimizer
and you have K parameters. Computing the gradient numerically usually
requires at least K+1 evaluations of the objective function (forward
differences), and 2K if you want more accurate derivatives (central
differences). These evaluations are independent of one another, so they
can be run in parallel.
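Here is a minimal sketch of that idea, not a definitive implementation. It computes a central-difference gradient (2K evaluations) and marks the loop over parameters with an OpenMP pragma, so the evaluations run on several threads of one machine rather than across a cluster; spreading them over cluster nodes would need something like MPI instead. The names objective_fn, numeric_gradient, and example_objective are made up for illustration, and the choice of step size h is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical objective type: maps a K-vector of parameters to a scalar. */
typedef double (*objective_fn)(const double *theta, size_t k);

/*
 * Central-difference gradient: 2K evaluations of f, all independent.
 * Each iteration perturbs its own private copy of theta, so iterations
 * can safely run at the same time. The pragma is ignored (and the loop
 * runs serially) if the compiler is not asked to enable OpenMP.
 * Returns 0 on success, -1 if a memory allocation failed.
 */
int
numeric_gradient (objective_fn f, const double *theta, size_t k,
                  double h, double *grad)
{
    int failed = 0;

    assert(h > 0.0);

    #pragma omp parallel for reduction(|:failed)
    for (long i = 0; i < (long) k; i++) {
        double *t = malloc(k * sizeof *t);
        if (t == NULL) {
            failed = 1;
            continue;
        }
        memcpy(t, theta, k * sizeof *t);

        t[i] = theta[i] + h;
        double f_plus = f(t, k);
        t[i] = theta[i] - h;
        double f_minus = f(t, k);

        grad[i] = (f_plus - f_minus) / (2.0 * h);
        free(t);
    }
    return failed ? -1 : 0;
}

/* Toy objective for illustration: f(theta) = sum_j (j+1) * theta_j^2. */
double
example_objective (const double *theta, size_t k)
{
    double s = 0.0;
    for (size_t j = 0; j < k; j++)
        s += (double) (j + 1) * theta[j] * theta[j];
    return s;
}
```

With GCC, compiling with -fopenmp turns the pragma on; without that flag the loop just runs serially and gives the same answer. This only pays off when a single evaluation of the objective is expensive compared to the cost of starting threads.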
If your objective function isn't smooth enough for a
derivative-based algorithm, I've had good luck with the APPSpack parallel optimizer.
Learn and use a good scientific library.
I like the GNU Scientific
Library quite a bit. It's free, very well-documented, and has lots
of very useful stuff including vectors, matrices, random numbers, and
optimizers. It's way easier to use their stuff than to roll your own.
Use make.
If your program is more than a couple hundred lines, it almost always
makes sense to divide it into multiple files. Make is a great tool for
managing the compilation of your program (or programs). Ben Yoshino has
written a nice tutorial for
the uninitiated.
Use a debugger.
I still tend to put lots of print statements in my programs when I'm
debugging them, but using an actual debugger can often get to the bottom
of a problem much faster. gdb
is simple and works great.
Use a profiler.
Before you dive into optimizing your code, save yourself some time and
use a profiler first. You may think you know what parts of your code
are using up all the CPU time, but often, you'll be wrong. I'm kind of
old school, so I use GNU
gprof, and it does what I want. But I'm sure there's even better
stuff out there.
Use a platform-specific math library.
This is very important if you do a lot of matrix math. The standard
library interface for doing simple matrix and vector calculations is
called BLAS and for more complex matrix manipulation, it's LAPACK.
Intel sells a very fast library called MKL that implements BLAS and
LAPACK (and more), and AMD has a similar math library called ACML. If you can't get
hold of either of these, there's a good free implementation called ATLAS.
The GNU Scientific Library has a wrapper around BLAS so you can call it
on GSL matrices and just link to whatever BLAS library is best for you.
Use integer arithmetic if possible.
It's much faster than working with doubles. In the example below, x is
an int and is assumed to be non-negative.
BAD: int a = floor(x/((double) XGRIDSIZE))*XGRIDSIZE;
BETTER: int a = (x/XGRIDSIZE) * XGRIDSIZE;
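One caveat about the integer version: C99 integer division truncates toward zero, while floor() rounds toward negative infinity, so the two only agree when x is non-negative. The sketch below wraps both cases in small helpers; the names snap_down and snap_down_signed are hypothetical, and it assumes x and the grid size are plain ints.

```c
#include <assert.h>

/* Round a non-negative x down to the nearest multiple of grid using only
 * integer arithmetic. Matches floor() only because x >= 0. */
static inline int
snap_down (int x, int grid)
{
    assert(x >= 0 && grid > 0);
    return (x / grid) * grid;
}

/* Variant that also handles negative x, matching floor() semantics. */
static inline int
snap_down_signed (int x, int grid)
{
    int r;

    assert(grid > 0);
    r = x % grid;   /* in C99 the remainder is zero or has the sign of x */
    return (r < 0) ? x - r - grid : x - r;
}
```

For example, snap_down(17, 5) gives 15, and snap_down_signed(-7, 5) gives -10, where plain truncating division would have given -5.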
Pre-compute as much as possible.
Even something that doesn't seem like it would take much time to
compute can really add up if you're doing it 30 million times.
BAD:
/* static, so a plain C99 "inline" doesn't leave an undefined reference at link time */
static inline int
choice_sector (int choice)
{
    if (choice==C_NW) {
        return 0;
    } else if (choice==C_W1PT || choice==C_W1FT || choice==C_W1OT || choice==C_W1) {
        return 1;
    } else if (choice==C_W2PT || choice==C_W2FT || choice==C_W2OT || choice==C_W2) {
        return 2;
    } else {
        return 3;
    }
}
BETTER:
/* Indexed by choice code; entries must line up with the numeric values of the C_* constants. */
int G_choice_sectors[] = {0,1,1,1,2,2,2,3,3,3,1,2,3};
#define choice_sector(c) (G_choice_sectors[(c)])
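A bare list of numbers like this silently goes wrong if the choice codes are ever renumbered. One way to guard against that, sketched here under assumptions, is to build the table with C99 designated initializers so each entry is tied to its constant by name. The enum below is hypothetical: the real values of the C_* codes aren't shown here, and C_OTHER and N_CHOICES are invented for the illustration.

```c
#include <assert.h>

/* Hypothetical choice codes; the actual numbering is an assumption. */
enum choice {
    C_NW,
    C_W1PT, C_W1FT, C_W1OT,
    C_W2PT, C_W2FT, C_W2OT,
    C_W1, C_W2,
    C_OTHER,      /* invented stand-in for every remaining choice */
    N_CHOICES     /* number of codes, used to size the table */
};

/* Sector for each choice, filled in once at compile time. */
static const int G_choice_sectors[N_CHOICES] = {
    [C_NW]    = 0,
    [C_W1PT]  = 1, [C_W1FT] = 1, [C_W1OT] = 1, [C_W1] = 1,
    [C_W2PT]  = 2, [C_W2FT] = 2, [C_W2OT] = 2, [C_W2] = 2,
    [C_OTHER] = 3
};

/* Lookup with a bounds check that disappears when NDEBUG is defined. */
static inline int
choice_sector (int c)
{
    assert(c >= 0 && c < N_CHOICES);
    return G_choice_sectors[c];
}
```

The lookup is still a single array read, so it should cost about the same as the macro, but reordering the enum no longer breaks the mapping.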
Don't use pow() to square or cube something.
It's way way faster to just multiply the number by itself. Save
pow() for when you need non-integer powers.
BAD:
x2 = pow(x,2.0);
x3 = pow(x,3.0);
BETTER:
#define SQUARE(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))
x2 = SQUARE(x);
x3 = CUBE(x);
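One thing to watch with the macros: they paste their argument in two or three times, so SQUARE(i++) or SQUARE(slow_function(y)) will increment or call more than once. A possible alternative, shown as a sketch with made-up names square and cube, is a pair of static inline functions, which evaluate the argument exactly once and which most compilers will inline when optimizing.

```c
/* Each function evaluates its argument once, unlike the macro versions. */
static inline double
square (double x)
{
    return x * x;
}

static inline double
cube (double x)
{
    return x * x * x;
}
```

These are used the same way as the macros, e.g. x2 = square(x); and x3 = cube(x);, but they only accept doubles (ints are converted on the way in).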