Feeds:
Posts
Comments

Archive for June, 2009

One thing that makes Perl different from many other languages is that it has a rather small collection of core commands. There are only a few hundred commands in Perl itself, so the rest of its functionality comes from its rich collection of modules,  many of which are distributed via the Comprehensive Perl Archive Network (CPAN).

When CPAN first came on the scene, it preceded many modern package management systems, including Debian’s Advanced Packaging Tool (APT) and Ruby’s gem system, among others. As a consequence of its rich history, the CPAN Shell is relatively simplistic by today’s standards, yet still continues to get the job done quite well.

Unfortunately, there are two issues with CPAN:

  1. Packages are distributed as source code which is built on individual machines when installing or upgrading packages.
    • Since packages must be re-built on every machine that installs it, the system is prone to breaking and wastes CPU time and other resources. (The CPAN Testers system is a great way module authors can try to mitigate this risk, though.)
    • Due to wide variation in packages, many packages cause problems with the host operating system in terms of where they install files, or expect them to be installed. This is because CPAN does not (and cannot) know every environment that packages will be installed on.
  2. It does not integrate nicely with package managers
    • The standard CPAN Shell is not designed to remove modules, only install them. Removals need to be done manually, which is prone to human error such as forgetting to clean up certain files, or breaking other installs in the process.
    • It cannot possibly know the policies that govern the various Linux flavours or Unices. This means that packages might be installed where users do not expect, which violates the Principle of Least Surprise.
    • It is a separate ecosystem to maintain. When packages are updated via the normal means (eg, APT), packages installed via CPAN will be left alone (ie, not upgraded).

Here is the real problem: packages installed via CPAN will be left alone. This means that if new releases come out, your system will retain an old copy of packages, until you get into the CPAN Shell and upgrade it manually. If you’re administrating your own system, this isn’t a big problem — but it has significant implications for collections of production systems. If you are managing thousands of servers, then you will need to run the upgrade on each server, and hope that the build doesn’t break (thus requiring your, or somebody else’s, intervention).

One of the biggest reasons to select Debian is because of one of its primary design goal: to be a Universal Operating System. What this means is that the operating system should run on as many different platforms and architectures as possible, while providing the same rich environment to each of them to the greatest extent possible. So, whether I’m using Debian GNU/Linux x86 or Debian GNU/kFreeBSD x64, I have access to the same applications, including the same Perl packages. Debian has automated tools to build and test packages on every architecture we support.

The first thing I’m going to say is: if you are a Debian user, or a user of its derivatives, there is absolutely no need for you to create your own packages. None. Just don’t do it; it’s bad. Avoid it like the goto statement, mmkay?

If you come across a great CPAN package that you’d really like to see packaged for Debian, then contact the Debian Perl Packagers (pkg-perl) team, and let us know that you’d like a package. We currently maintain well over a thousand Perl packages for Debian, though we are by no means the only maintainers of Perl packages in Debian. You can do this easily by filing a Request For Package (RFP) bug using the command: reportbug wnpp.

On-screen prompting will walk you through the rest, and we’ll try to package the module as quickly as possible. When we’re done, you’ll receive a nice e-mail letting you know that your package has been created, thus closing the bug. A few days of waiting, but you will have a package in perfect working condition as soon as we can create it for you. Moreover, you’re helping the next person that seeks such a module, since it will already be available in Debian (and in due time it will propagate to its derivatives, like Ubuntu).

All 25,000+ Debian packages meet the rigorous requirements of Debian Policy. The majority of them meet the Debian Free Software Guidelines (DFSG), too; the ones which are not considered DFSG-free are placed in their own repository, separate from the rest of packages. A current work in progress is machine-parseable copyright control files, which will hopefully provide a way for administrators to quickly review licensing terms of all the software you install. This is especially important for small- and medium-sized businesses without their own intellectual property legal departments to review open source software, which is something that continues to drive many businesses away from using open source.

For the impatient, note this well: packages which are not maintained by Debian are not supported by Debian. This means that if you install something using a packaging tool (we’ll discuss these later) or via CPAN, then your package is necessarily your own responsibility. In the unlikely event that you totally break your system installing a custom package, it’s totally your fault, and it may mean you will have to restore an earlier backup or re-install your system completely. Be very careful if you decide to go this route. A few days waiting to ensure that your package will work on every platform you’re likely to encounter is worth the couple days of waiting for a package to be pushed through the normal channels.

The Debian Perl Packaging group offers its services freely to the public for the benefit of our users. It is much better to ask the volunteers (preferably politely) to get your package in Debian, so that it passes through the normal testing channels. You really should avoid making your own packages in a vacuum; the group is always open to new members, and it means your package will be reviewed (and hopefully uploaded into Debian) by our sponsors.

But the thing about all rules is that there are always exceptions. There are, in fact, some reasons when you might want to produce your own packages. I was discussing this with Hans Dieter Pearcey the other day, and he has written a great follow-up blog post about the primary differences between dh-make-perl and cpan2dist, two packaging tools with a similar purpose but very different design goals. Another article is to follow this one, where I will discuss the differences between the two.

Read Full Post »

When working on some code, I came across a random number generation algorithm (a pseudorandom number generator to be exact, since the numbers aren’t truly random but only designed to “look” like it to the casual observer) called ISAAC. The name stands for “Indirection, Shift, Accumulate, Add, and Count,” which is essentially what it does to a set of state variables, in that sequence.

The algorithm allows sequences of 32-bit numbers to be generated extremely quickly, since the operations it uses are relatively fast on modern processors; in fact, the author, Bob Jenkins, claims it only requires (on average) 18.75 machine processor cycles to generate each 32-bit value.

Even so, the algorithm is designed to be as uniformly distributed as possible.

I decided to test how fast this would be on a relatively modern multiprocessor system. The following is a benchmark on an amd64 machine with 2GB of memory. The benchmark script is available as part of the Math::Random::ISAAC package available via CPAN or its SVN Repository.

Rate ISAAC
PP
MT ISAAC
XS
Math::Random TT800 Core
ISAAC::PP 134/s -54% -79% -82% -90% -98%
MT 292/s 118% -53% -61% -78% -95%
ISAAC::XS 626/s 367% 114% -16% -54% -90%
Math::Random 748/s 458% 156% 19% -45% -88%
TT800 1350/s 907% 362% 116% 80% -78%
Core 6211/s 4534% 2024% 892% 730% 360%

So as we can see, Perl’s built in rand() function is evidently really, really fast; quicker than every other algorithm studied here by a few orders of magnitude. What remains to be seen, then, is whether the quality of random numbers actually generated is any good. Ideally, most random number generators are designed to produce distributions of numbers that are uniformly distributed — that is, every number is equally likely to occur.

Uniform distributions often do not occur in real-life “random” samples – for example, things like height or marks in a class tend to follow more of a normal distribution (or “bell curve”) – the majority of the sample population is near the mean, and it drops off as you go to higher or lower figures. Nonetheless, these types of random numbers are great for uses in computer science — where you want each number to be equally likely, thus ensuring that your sequence is as unpredictable as possible.

In this case, all of the algorithms produce relatively uniform distributions, though the TT800 and Perl rand() core function produce a somewhat jagged distribution. Less interesting distributions are those for ISAAC, Math::Random and the Mersenne Twister.

Read Full Post »

« Newer Posts