Feeds:
Posts
Comments

Posts Tagged ‘Computer Science’

Last year, I had a great time participating in the Google Summer of Code with the Debian project. I had a neat project with some rather interesting implications for helping developers to package and maintain their work. It’s still a work-in-progress, of course, as many projects in open source are, but I was able to accomplish quite a bit and am proud of my work. I learned quite a bit about coding in C, working with Debian and met some very intelligent people.

My student peers were also very intelligent and great to learn from. I enjoyed meeting them virtually and discussing our various projects on the IRC channel as the summer progressed and the Summer of Code kicked into full swing. The Debian project in particular also helps arrange travel grants for students to attend the Debian Conference (this year, DebConf10 is being held in New York City!). DebConf provides a great venue to learn from other developers (both in official talks but also unofficial hacking sessions). As the social aspect is particularly important to Debian, DebConf helps people meet those with whom they are working with the most, thereby creating lifelong friendships and making open source fun.

I have had several interviews for internships, and the bit of my work experience most asked about is my time doing the Google Summer of Code. I really enjoyed seeing a project go from the proposal stage, setting a reasonable timeline with my mentor, exploring the state of the art, and most importantly, developing the software. I think this is the sort of indispensible industry-type experience we often lack in our undergrad education. We might have an honours thesis or presentation, but much of the work in the Google Summer of Code actually gets used “in the field.”

Developing software for people rather than for marks is significant in a number of ways, but most importantly it means there are real stakeholders that must be considered at all stages. Proposing brilliant new ideas is important, however, without highlighting the benefits they can have for various users, the reality is that it simply will not gain traction. Learning how to write proposals effectively is an important skill and working with my prospective mentor (at the time – he later mentored my project once it was accepted) to develop mine was tremendously useful for my future endeavours.

The way I see the Google Summer of Code is, in many ways, similar to an academic grant (and the stipend is about the same as well). It provides a modest salary (this year it’s US$5000) but more importantly, personal contact with a mentor. Mentors are typically veterans in software development or the Debian project and act in the same role as supervisors for post-graduate work: they help monitor your progress and propose new ideas to keep you on track.

The Debian Project is looking for more students and proposals. We have a list of ideas as well as application instructions available on our Wiki. As I will be going on internship starting May, I have offered to be a mentor this year. I look forward to seeing your submissions (some really interesting ones have already begun to filter in as the deadline approaches).

Advertisements

Read Full Post »

One of the most often overlooked–yet arguably most important–issues in software development is copyright and licensing of works. In particular, I will discuss how this affects the open source software community with relevance to the Debian project.

As with any artistic or creative works, software is protected by copyright and its use is often governed by some sort of license. Please note that I am not a lawyer and I am not qualified to give legal advice, so take my suggestions with a grain of salt and please do leave a comment if you know something that I don’t.

A license is a legal contract that permits end users use of software under agreed-upon guidelines. In the open source community, licenses protect the integrity of free software by ensuring that they continue to remain freely available. For example, the GNU General Public License (GPL) stipulates that any derivative works of GPL-licensed code must distribute source code back to the community, which enables a two-way sharing of information between the originating software developers and the others who benefit from their work. Other licenses, such as the BSD License, are more liberal and do not have this restriction, but do have a disclaimer of warranties which shields authors from unintended legal consequences of their work.

Though licensing is probably the most important document detailing the relationship of the supplier (software developer or team) and other users, it cannot mean anything without copyright. In general, it is most useful to provide a copyright statement somewhere in resulting packages. A copyright statement is what allows authors to assert a particular license in the first place.

Moreover, license terms can only be changed when all copyright holders agree to the change. Unless you are explicit in your copyright conditions in the beginning, this can lock your project in to an undesirable license.

To make matters even more complicated, the Berne Convention for the Protection of Literary and Artistic Works (or simply Berne Convention as it’s most often called) describes a mechanism by which copyright is automatically in force upon creation of a work, even if the author does not explicitly assert it. For software, this effectively means that anyone who contributes any code is automatically the copyright holder on their contribution, which means that things quickly get complicated when there are many authors and contributors involved.

In Debian, we cannot and do not distribute software without knowing copyright information (including years of copyright, names, e-mail addresses where people can be reached, or a web site in the case of an incorporated entity). This is pursuant to the Debian Free Software Guidelines (DFSG), which require that we distribute only “free” software in our main repository–it’s part of our Social Contract.

In this regard, I would make the following recommendations:

  1. When beginning any project (open source or not), include a copyright statement immediately. It will eventually become a force of habit; this is a very good thing, and will pay dividends in the future.
  2. Establish a policy whereby contributors are asked to assign you copyright of their work; make a note of this somewhere in your documentation. Better yet, if you are part of an incorporated entity, assign copyright to that entity.
  3. Be explicit about your licensing terms and make sure to include copies of the license with your software. This helps to resolve ambiguities where there are several derivatives of a license (occasionally, developers license software under the BSD License without specifying which version they mean)
  4. Be wary of the “Public Domain” — this is an even more contentious issue than choosing an appropriate license. It is probably preferable to use a non-restrictive license such as the aforementioned BSD License (and its variants) or the MIT/X11 license, which is even more permissive.

Read Full Post »

Okay, so this is a long-awaited follow-up to my first post on the topic of  Debian Perl Packaging. Some of you might note I was pretty extreme in the first post, which is partially because people only really ever respond to extremes when they’re new to things. When you first begin programming, the advice you get is “hey, never use goto statements” — but as your progress in your ability and your understanding of how it works, what it’s actually doing in the compiler — then it might not be so bad after all. In fact, I hear the Linux kernel uses it extensively to provide Exceptions in C. The Wikipedia page on exception handling in various languages shows how to implement exceptions in C using setjmp/longjmp (which is essentially a goto statement). But I digress.

Back to the main point of this writeup. Previously I couldn’t really think of cases where packaging your own modules is really all that useful, especially when packaging them for Debian means that you benefit many communities — Debian, Ubuntu, and all of the distributions that are based on those.

During a discussion with Hans Dieter Piercey after his article providing a nice comparison between dh-make-perl and cpan2dist. (Aside: I feel like he was slightly biased toward cpan2dist in his writeup, but I’m myself biased toward dh-make-perl, so he might be right, even though I won’t admit it.)

I’m really glad for that article and the ensuing dialog, because it really got people talking about what they use Debian Perl packages for, and where it is useful to make your own.

Firstly, if you’ve got an application that depends on some Perl module that isn’t managed by Debian, but you need it yesterday, then you can either install that module via CPAN or roll your own Debian package. The idea here is to make and install the package so you can use it, but also file a Request For Package bug at the same time — see the reportbug command in Debian, or use LaunchPad if you’re on Ubuntu. This way, when the package is officially released and supported, you can move to that instead, and thus get the benefits of automatic upgrades of those packages.

Secondly, if you’ve got an application that depends on some internally-developed modules, then they probably wouldn’t exist on CPAN (some call this Perl code part of the DarkPAN), except in the rare case that a company open sources their work. But corporations will never open source all of their work, even if they consider the implications of providing some of it to the open source community, so at some point or another you’ll need to deal with internal packages. Previously, the best way to handle this was to construct your own CPAN local mirror, and have other machines install and upgrade from it — thus your internal code is easily distributed via the usual mechanism.

One of the advantages of using CPAN to distribute things is that it’s available on most platforms, builds things and runs tests automatically on many platforms. CPANPLUS will even let you remove packages, which was one of the main reasons I am so pro-Debian packages anyway. However, it does mean you’ll need to rebuild the package on other systems, which is prone to failures that cost time and money to track down and fix. CPAN and CPANPLUS are the Perl tradition of distributing packages.

If you are using an environment mostly with Debian systems, however, you may benefit from using a local Debian repository. This way, you only need to upgrade packages in your repository, and they’ll be automatically upgraded along with the rest of your operating system (you do run update and upgrade periodically right?). There is even the fantastic aptcron program to automate this, so there’s really no excuse not to automatically update.

In either case, creating a local package means you will be able to easily remove anything you no longer need via the normal package management tools. You can also distribute the binary packages between machines — though it sometimes depends on the platform (for modules that incorporate C or other platform-specific code that needs to be rebuilt). Generally, most Perl modules are Pure Perl, and thus you can compile and test it once, on one machine, and distribute it to other ones simply by installing the .deb package on other machines. You can copy packages to machines and use dpkg to install them, or better yet, create a local Debian mirror so it’s done automatically and via the usual mechanism (aptitude, etc.)

In conclusion: if you’re going to make your own Debian packages, do so with caution, and be aware of all the consequences (positive and negative) of what you’re doing. As always, a real understanding of everything is necessary.

Read Full Post »

Okay, so allow me to explain my personal development platform. I use Windows XP Professional as my primary operating system for various reasons, but I develop software for Linux (mostly because I like Linux, but also because I intend for a lot of my software to be used on my own servers, all of which can be considered Unix-like).

Because I prefer Windows for daily use, I’ve built various tools around the operating system and have become quite attached to them. I really like TortoiseSVN and Notepad++, for example. I’ve just gotten used to running Windows.

Anyway, I wanted to see if I could transition into using Linux for everything, and because I work in Debian and have servers running Debian, I installed Debian. I had some trouble at first, prompting me to change to Kubuntu, but later came back to Debian after realizing that the packages in unstable and testing that I’d come to love were under an entirely different process in Ubuntu, which is currently unknown to me.

So long Ubuntu, I’ve re-installed Debian and love it so far. One problem I’ve noticed is that Debian would not boot inside VMware Workstation under Windows, while Kubuntu was able to do so out-of-the-box. Instead, I got an error like this:

WARNING bootdevice may be renamed. Try root=/dev/hda2
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/sda2 does not exist. Dropping to a shell!

Looking into it further, it turned out I could edit the command line options to use a different device name–the debug output was right, I had to use /dev/hda2 instead of /dev/sda2–and things would work. The system would boot up normally, and everything was good. I guess it had to do with my drives actually being SATA (which I suppose are treated like SCSI drives); whereas VMware Workstation (in my setup) presents them as simple IDE drives.

But then I got to wondering how Ubuntu totally got around this problem, and it turns out that Ubuntu uses UUID identifiers to point to a hard drive. After searching on Google, it turns out that using UUIDs is convenient because it can handle cases where drives are removed; especially with external drives, or even re-arranging the drives inside your computer for whatever reason.

Using the command “blkid” from Debian (do this as root), you can get the UUID of all the partitions. You can either reboot into Linux using your normal dual-boot method (selecting the option from GRUB, etc), or you can edit the boot options from GRUB explictly to use /dev/hda2 temporarily. Once you boot into your Debian installation, you can edit /boot/grub/menu.lst to use drive UUIDs instead of /dev names.

Instead of:

# kopt=root=/dev/hda1 ro

or similar, you can change this to:

# kopt=root=UUID=<info from blkid for /dev/hda1> ro

Then you can use ‘update grub’ to update grub’s automagic menu.lst information, so that the new root is recognized. This way, however your disk is installed, the UUID will be detected and the appropriate drive mounted/booted. I haven’t tried this, but I suppose this would help if you have an operating system on a bootable USB drive or something, to help you select those partitions (though I don’t know if GRUB will even detect them on a boot. I’m hoping so since most BIOSes these days support booting from USB Mass Storage Media).

Good luck, I hope this helps somebody out there :-)

Read Full Post »

One thing that makes Perl different from many other languages is that it has a rather small collection of core commands. There are only a few hundred commands in Perl itself, so the rest of its functionality comes from its rich collection of modules,  many of which are distributed via the Comprehensive Perl Archive Network (CPAN).

When CPAN first came on the scene, it preceded many modern package management systems, including Debian’s Advanced Packaging Tool (APT) and Ruby’s gem system, among others. As a consequence of its rich history, the CPAN Shell is relatively simplistic by today’s standards, yet still continues to get the job done quite well.

Unfortunately, there are two issues with CPAN:

  1. Packages are distributed as source code which is built on individual machines when installing or upgrading packages.
    • Since packages must be re-built on every machine that installs it, the system is prone to breaking and wastes CPU time and other resources. (The CPAN Testers system is a great way module authors can try to mitigate this risk, though.)
    • Due to wide variation in packages, many packages cause problems with the host operating system in terms of where they install files, or expect them to be installed. This is because CPAN does not (and cannot) know every environment that packages will be installed on.
  2. It does not integrate nicely with package managers
    • The standard CPAN Shell is not designed to remove modules, only install them. Removals need to be done manually, which is prone to human error such as forgetting to clean up certain files, or breaking other installs in the process.
    • It cannot possibly know the policies that govern the various Linux flavours or Unices. This means that packages might be installed where users do not expect, which violates the Principle of Least Surprise.
    • It is a separate ecosystem to maintain. When packages are updated via the normal means (eg, APT), packages installed via CPAN will be left alone (ie, not upgraded).

Here is the real problem: packages installed via CPAN will be left alone. This means that if new releases come out, your system will retain an old copy of packages, until you get into the CPAN Shell and upgrade it manually. If you’re administrating your own system, this isn’t a big problem — but it has significant implications for collections of production systems. If you are managing thousands of servers, then you will need to run the upgrade on each server, and hope that the build doesn’t break (thus requiring your, or somebody else’s, intervention).

One of the biggest reasons to select Debian is because of one of its primary design goal: to be a Universal Operating System. What this means is that the operating system should run on as many different platforms and architectures as possible, while providing the same rich environment to each of them to the greatest extent possible. So, whether I’m using Debian GNU/Linux x86 or Debian GNU/kFreeBSD x64, I have access to the same applications, including the same Perl packages. Debian has automated tools to build and test packages on every architecture we support.

The first thing I’m going to say is: if you are a Debian user, or a user of its derivatives, there is absolutely no need for you to create your own packages. None. Just don’t do it; it’s bad. Avoid it like the goto statement, mmkay?

If you come across a great CPAN package that you’d really like to see packaged for Debian, then contact the Debian Perl Packagers (pkg-perl) team, and let us know that you’d like a package. We currently maintain well over a thousand Perl packages for Debian, though we are by no means the only maintainers of Perl packages in Debian. You can do this easily by filing a Request For Package (RFP) bug using the command: reportbug wnpp.

On-screen prompting will walk you through the rest, and we’ll try to package the module as quickly as possible. When we’re done, you’ll receive a nice e-mail letting you know that your package has been created, thus closing the bug. A few days of waiting, but you will have a package in perfect working condition as soon as we can create it for you. Moreover, you’re helping the next person that seeks such a module, since it will already be available in Debian (and in due time it will propagate to its derivatives, like Ubuntu).

All 25,000+ Debian packages meet the rigorous requirements of Debian Policy. The majority of them meet the Debian Free Software Guidelines (DFSG), too; the ones which are not considered DFSG-free are placed in their own repository, separate from the rest of packages. A current work in progress is machine-parseable copyright control files, which will hopefully provide a way for administrators to quickly review licensing terms of all the software you install. This is especially important for small- and medium-sized businesses without their own intellectual property legal departments to review open source software, which is something that continues to drive many businesses away from using open source.

For the impatient, note this well: packages which are not maintained by Debian are not supported by Debian. This means that if you install something using a packaging tool (we’ll discuss these later) or via CPAN, then your package is necessarily your own responsibility. In the unlikely event that you totally break your system installing a custom package, it’s totally your fault, and it may mean you will have to restore an earlier backup or re-install your system completely. Be very careful if you decide to go this route. A few days waiting to ensure that your package will work on every platform you’re likely to encounter is worth the couple days of waiting for a package to be pushed through the normal channels.

The Debian Perl Packaging group offers its services freely to the public for the benefit of our users. It is much better to ask the volunteers (preferably politely) to get your package in Debian, so that it passes through the normal testing channels. You really should avoid making your own packages in a vacuum; the group is always open to new members, and it means your package will be reviewed (and hopefully uploaded into Debian) by our sponsors.

But the thing about all rules is that there are always exceptions. There are, in fact, some reasons when you might want to produce your own packages. I was discussing this with Hans Dieter Pearcey the other day, and he has written a great follow-up blog post about the primary differences between dh-make-perl and cpan2dist, two packaging tools with a similar purpose but very different design goals. Another article is to follow this one, where I will discuss the differences between the two.

Read Full Post »

While working on my Google Summer of Code project today, I came across a bug that pretty much halted my productivity for the day.

Early on in my project, I decided that working with Unicode is hard, among other things. Since I was restricted to using C, I had to find a way to easily manipulate Unicode stuff, and I came across GLib (I’m not even entirely sure how, I think I just remember other projects using it, and decided to look it up.)

Not only did it have Unicode handling stuff, it also provides a bunch of convenient things like a linked list implementation, memory allocation, etc. All in a way intended to be cross-platform compatible, since this is the thing that’s used to power Gtk.

I’m not entirely sure how it differs from Apache’s Portable Runtime (APR); maybe it’s even a Not Invented Here syndrome. In any case, I, not suffering from that particular affliction, decided to be lazy and re-use existing code.

For some reason, GLib’s g_slice_alloc() call was failing. For those of you that don’t know what this does, it is similar to malloc() in standard C – it allocates a chunk of memory and returns it to you, so that you can make use of dynamic memory allocation, rather than everything just being auto variables. In particular, it means you can be flexible and allocate as much or as little memory as you need.

So I spent the entire day trying to figure out why my program was segfaulting. Looking at the output of gdb (the GNU Debugger), the backtrace showed that it was crashing at the allocation statement. No way, I thought, that test is so well-tested, it must be a problem with the way I’m using it.

I changed the code to use malloc() instead of g_slice_alloc(), and the program started crashing right away, rather than after four or five executions with g_slice_alloc(). Not exactly useful for debugging.

I mentioned my frustration with C on the Debian Perl Packager Group channel, and a friend from the group, Ryan Niebur took a look at the code (accessible via a public repository). After a bit of tinkering, he determined that the problem was that I was using g_slice_alloc instead of g_slice_alloc0, which automatically zeroes memory before returning it.

It stopped the crashing and my program works as intended. I’m still left totally puzzled as to why this bug was happening, and I’m not sure if malloc isn’t supposed to be used with structs, or some other limitation like that.

But thanks the magic of open source and social coding/debugging, the expertise of many contribute to the success of a single project. It’s such a beautiful thing.

Update: There were a lot of questions and comments, mainly relating to the fact that malloc and friends return chunks of memory that may not have been zeroed.

Indeed, this was the first thing I considered, but the line it happened to be crashing on was a line that pretty much just did g_slice_alloc, rather than any of the statements after that.

For those that are curious, all of the code is visible in the public repository.

I do realize that the fixes that have been made are pretty much temporary and that they are probably just masking a bigger problem. However, I’m at a loss for the issue is. Hopefully the magic of open source will work for me again, and one of the many people who have commented will discover the issue.

Read Full Post »

They say that premature optimization is the root of all evil, and they are right. Most likely, this code performs as well as, or potentially even worse than, checking that strcmp(s1, s2) == 0.

Nonetheless, while working on my Google Summer of Code project, I needed to test that two strings are equal while ignoring the case. As a result I wrote a simple algorithm called strieq(s1,s2) which simply returns 1 if the strings are equal, and 0 otherwise. This differs from strcmp because strcmp provides the stuff necessary for an actual lexicographical comparison, which my particular application didn’t require.

So here I publish my code for your viewing pleasure. Possibly you will be able to use it, and while I’ve tested it and tried to make it safe, I’m not an extremely experienced C coder, so I may have missed something. If there is a bug, please report it (via e-mail or in the comments section) so that I can update it.

int strieq(const char *, const char *);
static inline char lower(char);

/* strieq(const char *str1, const char *str2)
 *
 * This function takes two ASCII strings and compares them for equality in a
 * case-insensitive manner. It doesn't work with Unicode strings -- that is,
 * it will only fix case for normal ASCII strings. Everything else is passed
 * through as-is.
 */
int strieq(const char *str1, const char *str2)
{
  while (TRUE)
  {
    if ((*str1 == '' && *str2 != '') || (*str1 != '' && *str2 == ''))
      break;

    /* From the above test we know that they are either both non-null or both
     * strings are at the end. The latter case means we have a match.
     */
    else if (*str1 == '')
      return TRUE;

    /* Check if the lowercased version of the characters match */
    if (lower(*str1) != lower(*str2))
      break;

    str1++; str2++;
  }
  return FALSE;
}

/* lower(char ch)
 *
 * This function takes a single alphabetic ASCII character and makes it
 * lowercase, useful for comparison.
 */
static inline char lower(char ch)
{
  /* Check that ch is between A and Z */
  if (ch >= 'A' && ch <= 'Z')
    return (ch + ('a' - 'A'));
  /* Otherwise return it as-is */
  return ch;
}

In case this is an issue:

I, the copyright holder of the above snippet for stricmp and lower, hereby release the entire contents therein into the public domain. This applies worldwide, to the extent that it is permissible by law.

In case this is not legally possible, I grant any entity the right to use this work for any purpose, without any conditions, unless such conditions are required by law.

Update: Small fixes to documentation and code; WordPress must have stripped characters when I first inserted the code.

Read Full Post »

Older Posts »