Archive for June, 2009

Recently there was a thread on the Google Summer of Code students’ list discussing gender dynamics in open source and, more broadly, interactions between people of different genders. (The discussion was mostly simplified to one about sexes, which I think demonstrates a lack of understanding of the difference between gender and sex, but I suppose that’s a blog post for another day.)

It was noted that many of the women on the list have blog addresses and other details that quickly self-identify the authors as female. There was discussion about whether this is a good thing or not, and the possible reasons behind it.

Here is what I wrote:

I think what you mention about yourself shows the world what you think about yourself, and what you consider yourself.

If first and foremost you associate your identity with being female (or male) or straight (or not)… then I guess that’s your prerogative.

But I, for one, am not /just/ an Asian male. I’m not just a Computer Science student. I’m not just a coder. I’m not just an Engineering student. I’m not just 20 years old. I’m not just a blogger. I’m not just an Open Source contributor. I’m not just an advocate of strange and often unpopular ideas.

I am a human being, with many dimensions. And I don’t try to simplify it by putting myself in a box and categorizing myself as anything.

I think that the key is just to understand everyone for who they are, and part of that is being somewhat ambiguous. As Leslie [Hawthorne] somewhat alluded to, it’s about managing people’s preconceptions about you.

I do not actively try to hide that I am male, or that I am Asian (you might guess that from my last name). There are all sorts of preconceptions people might have about things, and there are lots of -isms we should seek to avoid. (I’m Asian – maybe that means I’m a bad driver, and that I can’t pronounce Rs. I’m male – maybe I’m violent. I’m in Computer Science, presumably that means I play Dungeons & Dragons with my classmates on the weekends. I’m in Engineering, maybe that means I’m sexist.)

The reality is: none of these things should matter, nor should they define you.

Just be yourself. You show to the world what you consider relevant about yourself.

And for what it’s worth, I found out the other day that someone I respect and admire in the open source community is a teenager. Somewhere around 15 years old. It’s impressive, really. I look up to him, because he’s a really smart guy. But that wasn’t something he brought up right away; his nickname wasn’t “smartdude15” or anything like that. That’s the magic of open source, and the Internet: I judged him purely on his knowledge. And once I did find out, I thought to myself… Wow, would I have thought the same thing of him if I knew his age right away? Would I have even given him a chance, or would I just dismiss everything he said as something an immature teenager might say?

I think along with sexism there are tons of other issues to worry about, like racism and homophobia (consider how difficult it is in some cultures, and even in Western culture, to be really accepted if you are gay, lesbian, transgender, bisexual, two-spirited, asexual, intersex…). In fact, being gay was considered a disease until relatively recently.

I’m glad for all the progress women have made in the past several decades. Not everyone has reached a point where they are accepted in mainstream society, and not everyone feels comfortable announcing certain details about themselves.

If *all* you are is a woman in a male-dominated world, then I feel sorry for you. I truly, truly do. Because none of the women I respect and admire are that. They are, first, talented Engineers, Scientists and Programmers, who are only incidentally female. Being female isn’t something that really identifies them any more than the colour of their skin, hair or eyes. No, no, they are talented, and that is, in the end, all I care about, and that is one reason I am grateful for Open Source — because you oftentimes don’t meet the people you are working with all the time in real life, so you cannot judge them on anything other than their ability.


Okay, so allow me to explain my personal development platform. I use Windows XP Professional as my primary operating system for various reasons, but I develop software for Linux (mostly because I like Linux, but also because I intend for a lot of my software to be used on my own servers, all of which can be considered Unix-like).

Because I prefer Windows for daily use, I’ve built various tools around the operating system and have become quite attached to them. I really like TortoiseSVN and Notepad++, for example. I’ve just gotten used to running Windows.

Anyway, I wanted to see if I could transition into using Linux for everything, and because I work in Debian and have servers running Debian, I installed Debian. I had some trouble at first, which prompted me to switch to Kubuntu, but I later came back to Debian after realizing that the packages in unstable and testing that I’d come to love go through an entirely different process in Ubuntu, one I’m not yet familiar with.

So long, Ubuntu: I’ve re-installed Debian and love it so far. One problem I’ve noticed is that Debian would not boot inside VMware Workstation under Windows, while Kubuntu did so out-of-the-box. Instead, I got an error like this:

WARNING bootdevice may be renamed. Try root=/dev/hda2
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/sda2 does not exist. Dropping to a shell!

Looking into it further, it turned out I could edit the kernel command line options to use a different device name (the debug output was right: I had to use /dev/hda2 instead of /dev/sda2) and things would work. The system would boot up normally, and everything was good. I guess it had to do with my drives actually being SATA (which I suppose are treated like SCSI drives), whereas VMware Workstation (in my setup) presents them as simple IDE drives.

But then I got to wondering how Ubuntu got around this problem entirely, and it turns out that Ubuntu uses UUIDs to identify partitions. After some searching on Google, I learned that UUIDs are convenient because they keep working when drives are removed (especially external drives) or re-arranged inside your computer for whatever reason.

Using the command “blkid” from Debian (do this as root), you can get the UUIDs of all the partitions. To get into Debian in the first place, you can either boot using your normal dual-boot method (selecting the option from GRUB, etc.), or you can edit the boot options from GRUB explicitly to use /dev/hda2 temporarily. Once you boot into your Debian installation, you can edit /boot/grub/menu.lst to use drive UUIDs instead of /dev names.

Instead of:

# kopt=root=/dev/hda1 ro

or similar, you can change this to:

# kopt=root=UUID=<info from blkid for /dev/hda1> ro

Then you can run ‘update-grub’ to regenerate the automagic section of menu.lst, so that the new root is recognized. This way, however your disk is attached, the UUID will be detected and the appropriate partition mounted/booted. I haven’t tried this, but I suppose this would also help if you have an operating system on a bootable USB drive or something, to help you select those partitions (though I don’t know if GRUB will even detect them on boot; I’m hoping so, since most BIOSes these days support booting from USB mass storage devices).
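As a rough sketch of the edit described above, here is how the substitution could be scripted. This is Python purely for illustration: the helper names and the sample UUID are made up, and the blkid line is only assumed to have the usual device: UUID="..." TYPE="..." shape.

```python
import re

def uuid_from_blkid_line(line):
    """Extract the UUID value from one line of blkid output, or None if absent."""
    match = re.search(r'\bUUID="([^"]+)"', line)
    return match.group(1) if match else None

def use_uuid_root(menu_lst_text, uuid):
    """Rewrite the '# kopt=root=...' line so the root device is given by UUID."""
    # Keep the '# kopt=root=' prefix and anything after the device (e.g. ' ro')
    pattern = re.compile(r"^(# kopt=root=)\S+(.*)$", re.MULTILINE)
    return pattern.sub(
        lambda m: f"{m.group(1)}UUID={uuid}{m.group(2)}", menu_lst_text
    )

# Sample inputs; the UUID here is a made-up placeholder, not real blkid output
blkid_line = '/dev/hda1: UUID="01234567-89ab-cdef-0123-456789abcdef" TYPE="ext3"'
menu_lst = "# kopt=root=/dev/hda1 ro\n"

print(use_uuid_root(menu_lst, uuid_from_blkid_line(blkid_line)), end="")
```

The sketch only transforms text in memory. You would still need to write the result back to /boot/grub/menu.lst as root and run update-grub afterwards.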

Good luck, I hope this helps somebody out there :-)


For my Google Summer of Code project, I have been working with PerlQt4 bindings, which requires that I have Qt4 installed. While this is technically possible under a Win32 environment, I decided to try setting it up under Linux. Lots of people in the free software community vehemently oppose Windows, but while it has its flaws, I think overall its hardware support is still much better than Linux’s. True, this is because of Microsoft’s shady business practices, and because many companies keep their driver source code closed. I’m still using Windows XP Professional and am quite happy with it, stability-wise and feature-wise.

As an Engineer, I regularly use many applications that are simply not available on Linux. They’re simply not replaceable with the current state of open source software, though there is some great stuff out there. We’re still far from a point where engineers in general can switch to Linux: application support is as important to an operating system as the kernel. Linux would be nothing without GNU’s binutils, for example.

I tried to install Debian first, as this is an environment I’m very familiar with. I use Debian on my development server, and it has worked wonders there. But everything I do on that server is command-line stuff. When trying to install a desktop environment, I followed the KDE Configuration Wizard, which isn’t too bad, but it expects an Internet connection throughout the process. The problem was that I didn’t have enough Ethernet cables to have both the desktop computer and my laptop plugged in at the same time, even though I had a wireless router set up, which meant I had to unplug the computer while updating packages, etc. Some of the updates took quite a bit of time, which was inconvenient for everyone else.

I eventually got the system to install, and told tasksel to set up a desktop environment. It installed some stuff, and then I typed ‘apt-get install kde’ and assumed everything would Just Work. That installed a whole bunch of stuff (including a local install of mysqld, on a desktop machine?! It turns out this was due to one of KDE’s recommended packages; it starts with an A, I forget which). Anyway, the environment didn’t “just work” as I had expected. Upon booting up my system, it just dropped me to a command line prompt. Fine, I thought, I’ll just use startx. But that was broken too. So after another few hours of fiddling I just gave up altogether.

When I tried Ubuntu again (the last time I had done so was probably in version 7 or so), I downloaded a recent image of Kubuntu 9.04, the Ubuntu flavour using KDE as its default desktop environment. I was surprised by how much progress there has been in Ubuntu and Linux in general. Driver support is much better than it used to be: it now detects my network card, a Broadcom 43xx chip, and does everything it needs to do. For the most part, my operating system “Just Works.” Great. This looks like something I might be able to slowly transition toward, completely replacing Windows except inside WINE or a Virtual Machine container.

Have Debian and Ubuntu made lots of progress? Sure. I can definitely see that Ubuntu is geared a lot more to the average user, while Debian provides bleeding-edge features to the power user. Unfortunately, despite being involved in packaging Perl modules for Debian, I fall into the former category. I’d really just like my desktop system to work. Oh, and dual monitor support out-of-the-box would be nice too; I hear the new KDE and Gnome support this.

One thing Windows handles rather well is changing hardware profiles – when my computer is connected to its docking station, a ton of peripherals are attached. When I undock, they’re gone. Windows handles this rather gracefully. In Kubuntu, I got lots of notification boxes repeatedly telling me that eth2 was disconnected, etc. This sort of thing is undecipherable for the average user, so I’d really just like for these operating systems to be more human-friendly before they are ready for prime time on the desktop.


One thing that makes Perl different from many other languages is that it has a rather small collection of core functions. There are only a few hundred built-in functions in Perl itself, so the rest of its functionality comes from its rich collection of modules, many of which are distributed via the Comprehensive Perl Archive Network (CPAN).

CPAN predates many modern package management systems, including Debian’s Advanced Package Tool (APT) and Ruby’s gem system, among others. As a consequence of its long history, the CPAN Shell is relatively simplistic by today’s standards, yet it still gets the job done quite well.

Unfortunately, there are two issues with CPAN:

  1. Packages are distributed as source code which is built on individual machines when installing or upgrading packages.
    • Since packages must be re-built on every machine that installs them, the system is prone to breaking and wastes CPU time and other resources. (The CPAN Testers system is a great way module authors can try to mitigate this risk, though.)
    • Due to wide variation in packages, many packages cause problems with the host operating system in terms of where they install files, or expect them to be installed. This is because CPAN does not (and cannot) know every environment that packages will be installed on.
  2. It does not integrate nicely with package managers.
    • The standard CPAN Shell is not designed to remove modules, only install them. Removals need to be done manually, which is prone to human error such as forgetting to clean up certain files, or breaking other installs in the process.
    • It cannot possibly know the policies that govern the various Linux flavours or Unices. This means that packages might be installed where users do not expect, which violates the Principle of Least Surprise.
    • It is a separate ecosystem to maintain. When packages are updated via the normal means (eg, APT), packages installed via CPAN will be left alone (ie, not upgraded).

Here is the real problem: packages installed via CPAN will be left alone. This means that if new releases come out, your system will retain old copies of packages until you get into the CPAN Shell and upgrade them manually. If you’re administering your own system, this isn’t a big problem, but it has significant implications for collections of production systems. If you are managing thousands of servers, then you will need to run the upgrade on each server, and hope that the build doesn’t break (thus requiring your, or somebody else’s, intervention).

One of the biggest reasons to select Debian is one of its primary design goals: to be a Universal Operating System. What this means is that the operating system should run on as many different platforms and architectures as possible, while providing the same rich environment to each of them to the greatest extent possible. So, whether I’m using Debian GNU/Linux x86 or Debian GNU/kFreeBSD x64, I have access to the same applications, including the same Perl packages. Debian has automated tools to build and test packages on every architecture we support.

The first thing I’m going to say is: if you are a Debian user, or a user of its derivatives, there is absolutely no need for you to create your own packages. None. Just don’t do it; it’s bad. Avoid it like the goto statement, mmkay?

If you come across a great CPAN package that you’d really like to see packaged for Debian, then contact the Debian Perl Packagers (pkg-perl) team, and let us know that you’d like a package. We currently maintain well over a thousand Perl packages for Debian, though we are by no means the only maintainers of Perl packages in Debian. You can do this easily by filing a Request For Package (RFP) bug using the command: reportbug wnpp.

On-screen prompting will walk you through the rest, and we’ll try to package the module as quickly as possible. When we’re done, you’ll receive a nice e-mail letting you know that your package has been created, thus closing the bug. It takes a few days of waiting, but you will have a package in perfect working condition as soon as we can create it for you. Moreover, you’re helping the next person that seeks such a module, since it will already be available in Debian (and in due time it will propagate to its derivatives, like Ubuntu).

All 25,000+ Debian packages meet the rigorous requirements of Debian Policy. The majority of them meet the Debian Free Software Guidelines (DFSG), too; the ones which are not considered DFSG-free are placed in their own repository, separate from the rest of the packages. A current work in progress is machine-parseable copyright control files, which will hopefully provide a way for administrators to quickly review the licensing terms of all the software they install. This is especially important for small- and medium-sized businesses without their own intellectual property legal departments to review open source software, which is something that continues to drive many businesses away from using open source.

For the impatient, note this well: packages which are not maintained by Debian are not supported by Debian. This means that if you install something using a packaging tool (we’ll discuss these later) or via CPAN, then your package is necessarily your own responsibility. In the unlikely event that you totally break your system installing a custom package, it’s totally your fault, and it may mean you will have to restore an earlier backup or re-install your system completely. Be very careful if you decide to go this route. Ensuring that your package will work on every platform you’re likely to encounter is worth the few days of waiting for a package to be pushed through the normal channels.

The Debian Perl Packaging group offers its services freely to the public for the benefit of our users. It is much better to ask the volunteers (preferably politely) to get your package in Debian, so that it passes through the normal testing channels. You really should avoid making your own packages in a vacuum; the group is always open to new members, and it means your package will be reviewed (and hopefully uploaded into Debian) by our sponsors.

But the thing about all rules is that there are always exceptions. There are, in fact, some cases where you might want to produce your own packages. I was discussing this with Hans Dieter Pearcey the other day, and he has written a great follow-up blog post about the primary differences between dh-make-perl and cpan2dist, two packaging tools with a similar purpose but very different design goals. Another article will follow this one, where I will discuss the differences between the two.


When working on some code, I came across a random number generation algorithm (a pseudorandom number generator to be exact, since the numbers aren’t truly random but only designed to “look” like it to the casual observer) called ISAAC. The name stands for “Indirection, Shift, Accumulate, Add, and Count,” which is essentially what it does to a set of state variables, in that sequence.

The algorithm allows sequences of 32-bit numbers to be generated extremely quickly, since the operations it uses are relatively fast on modern processors; in fact, the author, Bob Jenkins, claims it only requires (on average) 18.75 machine cycles to generate each 32-bit value.

Even so, its output is designed to be as uniformly distributed as possible.

I decided to test how fast this would be on a relatively modern multiprocessor system. The following is a benchmark on an amd64 machine with 2GB of memory. The benchmark script is included in the Math::Random::ISAAC package, available via CPAN or its SVN Repository.

                Rate  ISAAC::PP     MT  ISAAC::XS  Math::Random  TT800   Core
ISAAC::PP      134/s         --   -54%       -79%          -82%   -90%   -98%
MT             292/s       118%     --       -53%          -61%   -78%   -95%
ISAAC::XS      626/s       367%   114%         --          -16%   -54%   -90%
Math::Random   748/s       458%   156%        19%            --   -45%   -88%
TT800         1350/s       907%   362%       116%           80%     --   -78%
Core          6211/s      4534%  2024%       892%          730%   360%     --

So as we can see, Perl’s built-in rand() function is evidently really, really fast: quicker than every other algorithm studied here, by a factor of anywhere from about 4.6 (TT800) to about 46 (ISAAC::PP). What remains to be seen, then, is whether the quality of the random numbers actually generated is any good. Ideally, most random number generators are designed to produce numbers that are uniformly distributed; that is, every number is equally likely to occur.
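To make the percentages in the benchmark table easier to read, here is a small sketch (in Python, for illustration only) of how each entry appears to follow from the rates: the row’s rate divided by the column’s rate, minus one. The function name is my own, and because the rates in the table are already rounded, a recomputed figure can differ from the printed one by a point or so.

```python
# Rates (iterations per second) copied from the benchmark table
rates = {
    "ISAAC::PP": 134,
    "MT": 292,
    "ISAAC::XS": 626,
    "Math::Random": 748,
    "TT800": 1350,
    "Core": 6211,
}

def relative_speed(row, col):
    """Percentage by which `row` is faster (+) or slower (-) than `col`."""
    return round((rates[row] / rates[col] - 1) * 100)

# How many times faster the core rand() is than each alternative
for name, rate in rates.items():
    if name != "Core":
        print(f"Core is {rates['Core'] / rate:.1f}x as fast as {name}")
```

For example, relative_speed("MT", "ISAAC::PP") reproduces the 118% entry in the MT row.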

Uniform distributions often do not occur in real-life “random” samples – for example, things like height or marks in a class tend to follow more of a normal distribution (or “bell curve”) – the majority of the sample population is near the mean, and it drops off as you go to higher or lower figures. Nonetheless, these types of random numbers are great for uses in computer science — where you want each number to be equally likely, thus ensuring that your sequence is as unpredictable as possible.

In this case, all of the algorithms produce relatively uniform distributions, though TT800 and Perl’s core rand() function produce somewhat jagged ones. The distributions for ISAAC, Math::Random and the Mersenne Twister are smoother, and so less interesting to look at.
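One simple way to put a number on how “jagged” a distribution is, sketched here in Python rather than Perl, is to sort the samples into equal-width buckets and compute a Pearson chi-square statistic against the flat expected count. This is only an illustration and not the method used for the comparison above; it uses Python’s own random module (which happens to be a Mersenne Twister), and the seed, sample size and bucket count are arbitrary choices of mine.

```python
import random

def chi_square_uniform(samples, buckets):
    """Pearson chi-square statistic for samples in [0, 1) against a uniform distribution."""
    counts = [0] * buckets
    for x in samples:
        # Clamp the index so a value extremely close to 1.0 cannot overflow
        counts[min(int(x * buckets), buckets - 1)] += 1
    expected = len(samples) / buckets
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(2009)  # fixed seed so the run is repeatable
samples = [rng.random() for _ in range(100_000)]
stat = chi_square_uniform(samples, buckets=10)
print(f"chi-square statistic over 10 buckets: {stat:.2f}")
```

A perfectly flat histogram gives a statistic of zero; the more jagged the histogram, the larger the value.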
