While working on my Google Summer of Code project today, I came across a bug that pretty much halted my productivity for the day.
Early on in my project, I decided that working with Unicode is hard, among other things. Since I was restricted to using C, I had to find a way to easily manipulate Unicode stuff, and I came across GLib (I’m not even entirely sure how, I think I just remember other projects using it, and decided to look it up.)
Not only did it have Unicode handling stuff, it also provides a bunch of convenient things like a linked list implementation, memory allocation, etc. All in a way intended to be cross-platform compatible, since this is the thing that’s used to power Gtk.
I’m not entirely sure how it differs from Apache’s Portable Runtime (APR); maybe it’s even a Not Invented Here syndrome. In any case, I, not suffering from that particular affliction, decided to be lazy and re-use existing code.
For some reason, GLib’s g_slice_alloc() call was failing. For those of you that don’t know what this does, it is similar to malloc() in standard C – it allocates a chunk of memory and returns it to you, so that you can make use of dynamic memory allocation, rather than everything just being auto variables. In particular, it means you can be flexible and allocate as much or as little memory as you need.
So I spent the entire day trying to figure out why my program was segfaulting. Looking at the output of gdb (the GNU Debugger), the backtrace showed that it was crashing at the allocation statement. No way, I thought, that test is so well-tested, it must be a problem with the way I’m using it.
I changed the code to use malloc() instead of g_slice_alloc(), and the program started crashing right away, rather than after four or five executions with g_slice_alloc(). Not exactly useful for debugging.
I mentioned my frustration with C on the Debian Perl Packager Group channel, and a friend from the group, Ryan Niebur took a look at the code (accessible via a public repository). After a bit of tinkering, he determined that the problem was that I was using g_slice_alloc instead of g_slice_alloc0, which automatically zeroes memory before returning it.
It stopped the crashing and my program works as intended. I’m still left totally puzzled as to why this bug was happening, and I’m not sure if malloc isn’t supposed to be used with structs, or some other limitation like that.
But thanks the magic of open source and social coding/debugging, the expertise of many contribute to the success of a single project. It’s such a beautiful thing.
Update: There were a lot of questions and comments, mainly relating to the fact that malloc and friends return chunks of memory that may not have been zeroed.
Indeed, this was the first thing I considered, but the line it happened to be crashing on was a line that pretty much just did g_slice_alloc, rather than any of the statements after that.
For those that are curious, all of the code is visible in the public repository.
I do realize that the fixes that have been made are pretty much temporary and that they are probably just masking a bigger problem. However, I’m at a loss for the issue is. Hopefully the magic of open source will work for me again, and one of the many people who have commented will discover the issue.
malloc works with structs. The sentence “After a bit of tinkering, he determined that the problem was that I was using g_slice_alloc instead of g_slice_alloc0 , which automatically zeroes memory before returning it.”
gives a strong hint that the problem was not in library functions but in your code. If you want initialized memory, you have to either initialize it yourself after malloc or use calloc.
Indeed. I know that it must be something weird with memory, and that it’s probably due to how I’m using it – I just can’t figure out how.
Not to make an opinion out of thin air but if your problem is that your memory was not being zeroed out on allocation (what calloc is to malloc on standard C) it is very likely the issue is actually that you are using the structure before populating it.
The code that uses it (and glib itself probably) do check for null pointers so if you zero out the memory first everything works OK. But if you don’t, when the allocated structure has garbage that points out of your segment you get the dreaded segfault.
Why did it fail with malloc right away? I have no idea but I assume you can try duplicating the issue running the whole thing with valgrind and you’ll figure it out soon enough (changing the memory policy routines can normally help spot these kind of issues)
Good luck with your GSoC!
I did run my code through valgrind (memcheck); there were some issues with memory leaks due to forgetting to free memory after allocating. I was able to fix those, though there is still 1 piece of memory definitely lost for which I have been unable to determine the source.
On the other hand, I need to work on completing more feature additions, so the real memory debugging will need to wait until later. Hopefully my program isn’t too complex by then!
Thanks for your comment!
Hi,
if your program was crashing and it magically went away after initializing memory with zero, chances are that you accessed a value of the struct and treated it as a pointer or something like that.
Generally, you should use calloc instead of malloc when you expect the data to be initialized with zeros.
And, no, malloc has no limitations for use with structs. Consider the following examples, which are perfectly valid:
struct foo {
char *bar;
}
struct foo *test1 = malloc(sizeof(struct foo));
/* undefined, will probably crash sometimes */
printf(“test1 = %s\n”, test1->bar);
struct foo *test2 = calloc(1, sizeof(struct foo));
/* prints test2 = (null) */
printf(“test2 = %s\n”, test2->bar);
struct foo *test3 = malloc(sizeof(struct foo));
test3->bar = NULL;
/* equivalent to test2 */
printf(“test3 = %s\n”, test3->bar);
struct foo *test4 = malloc(sizeof(struct foo));
test4->bar = strdup(“foobar”);
/* here, initializing with zero is not necessary */
printf(“test4 = %s\n”, test4->bar);
Best regards,
Michael
Sounds like your code uses some memory before initialising it. GCC should warn about that.
Unfortunately, despite turning on all warnings (-Wall) and a few other flags trying to make the gcc as pedantic as possible, it still didn’t emit any useful warnings.
Well, I guess that you were using the returned memory in some way, interpreting it. Now if the freshly allocated memory was not explicitly zeroed, you (or one of the libs/functions you used) might have interpreted some of its content as a pointer. Not having seen you code or the library you use (I’m not exactly any active programmer though I know most concepts and many pitfalls in using C), I can’t really tell. But the problem I describe is one of the most common ones with dynamic allocation.
BTW: Where is your code? You mention it is in a public repository, but not where that is.
malloc() doesn’t set allocated memory to zeroes, just allocates it. You can use calloc() for that, which will internally malloc() and then fill all the allocated memory with zero-bytes. Don’t know why you say malloc() is not meant to be used with structs… of course you can use it, but remember that if you use it then you may need to initialize the contents of the struct.
Also, g_slice_alloc() can be forced to use malloc() internally by exporting G_SLICE_ALLOC=always-malloc before running your program. If you don’t do that, g_slice_alloc() and g_slice_alloc0 will use Glib’s slice-allocator, as default.
Cheers
Can you point me to your code repository..I can take a look..
Hi! Thanks for the offer. I’ve updated my article with the subversion repository url, so feel free to take a look at it.
Without seeing your code, I can’t tell you what the bug is, but in C, the contents of newly allocated memory (whether on the stack or malloced) are undefined. They can be anything. It’s up to you to initialize all of the data fields in your struct.
This is also true of C++, where you write a constructor to initialize all of the data fields, and that constructor is called automatically. Only in C, it’s not called automatically — you have to write your own function and call it yourself.
g_slice_alloc0 is probably the wrong solution to your problem. It’s probably hiding the real bug.
If you allocate a struct with malloc(), the memory is not guaranteed to be all zero. So if you allocate a struct, then fill in only some fields – don’t forget, some fields might be private and undocumented – then the ones you didn’t fill in could be set to arbitrary values, which can then cause arbitrary segfaults in whatever piece of library code happens to use those values. It’s okay to malloc() and then memset() the struct to 0.
Sorry if I’m totally off base ;) This happened to me a while back with aio_read(), though.
It seems to me that you’re reading some memory before writing it. That’s why zeroing it with g_slice_alloc0 makes a difference. Maybe it’s a pointer that you check for null, but with malloc it has some value. Have you checked that?
It seems to me that you’re reading some memory before writing it. That’s why zeroing it with g_slice_alloc0 makes a difference. Maybe it’s a pointer that you check for null, but with malloc it has some value. Have you checked that?
I feel it likely that you are dereferencing a pointer from your allocated structure without previously initializing it. Since it is uninitialized data it may contain any invalid or valid memory address and dereferencing it generates a segfault when it goes out of bounds. By switching to the alloc0 version this pointer is getting set to all zeros and your code is recognizing it as a null pointer and avoiding the dereference and masking the issue for that special case.
Memory problems can be quite hard. But with some persistence the issues can be found and addressed. Good luck!