While working on my Google Summer of Code project today, I came across a bug that pretty much halted my productivity for the day.
Early on in my project, I decided that working with Unicode is hard, among other things. Since I was restricted to using C, I had to find a way to easily manipulate Unicode stuff, and I came across GLib (I’m not even entirely sure how, I think I just remember other projects using it, and decided to look it up.)
Not only did it have Unicode handling stuff, it also provides a bunch of convenient things like a linked list implementation, memory allocation, etc. All in a way intended to be cross-platform compatible, since this is the thing that’s used to power Gtk.
I’m not entirely sure how it differs from Apache’s Portable Runtime (APR); maybe it’s even a Not Invented Here syndrome. In any case, I, not suffering from that particular affliction, decided to be lazy and re-use existing code.
For some reason, GLib’s g_slice_alloc() call was failing. For those of you that don’t know what this does, it is similar to malloc() in standard C – it allocates a chunk of memory and returns it to you, so that you can make use of dynamic memory allocation, rather than everything just being auto variables. In particular, it means you can be flexible and allocate as much or as little memory as you need.
So I spent the entire day trying to figure out why my program was segfaulting. Looking at the output of gdb (the GNU Debugger), the backtrace showed that it was crashing at the allocation statement. No way, I thought, that test is so well-tested, it must be a problem with the way I’m using it.
I changed the code to use malloc() instead of g_slice_alloc(), and the program started crashing right away, rather than after four or five executions with g_slice_alloc(). Not exactly useful for debugging.
I mentioned my frustration with C on the Debian Perl Packager Group channel, and a friend from the group, Ryan Niebur took a look at the code (accessible via a public repository). After a bit of tinkering, he determined that the problem was that I was using g_slice_alloc instead of g_slice_alloc0, which automatically zeroes memory before returning it.
It stopped the crashing and my program works as intended. I’m still left totally puzzled as to why this bug was happening, and I’m not sure if malloc isn’t supposed to be used with structs, or some other limitation like that.
But thanks the magic of open source and social coding/debugging, the expertise of many contribute to the success of a single project. It’s such a beautiful thing.
Update: There were a lot of questions and comments, mainly relating to the fact that malloc and friends return chunks of memory that may not have been zeroed.
Indeed, this was the first thing I considered, but the line it happened to be crashing on was a line that pretty much just did g_slice_alloc, rather than any of the statements after that.
For those that are curious, all of the code is visible in the public repository.
I do realize that the fixes that have been made are pretty much temporary and that they are probably just masking a bigger problem. However, I’m at a loss for the issue is. Hopefully the magic of open source will work for me again, and one of the many people who have commented will discover the issue.