
Posts Tagged ‘TIMTOWDI’

They say that premature optimization is the root of all evil, and they are right. This code most likely performs no better, and possibly even worse, than simply checking that strcmp(s1, s2) == 0.

Nonetheless, while working on my Google Summer of Code project, I needed to test whether two strings are equal while ignoring case, so I wrote a simple function, strieq(s1, s2), which returns 1 if the strings are equal and 0 otherwise. It differs from strcmp in that strcmp performs a full lexicographical comparison (returning a negative, zero, or positive value), which my particular application didn’t require.

So here I publish my code for your viewing pleasure. Perhaps you will be able to use it. While I’ve tested it and tried to make it safe, I’m not an extremely experienced C coder, so I may have missed something; if there is a bug, please report it (via e-mail or in the comments section) so that I can update it.

/* TRUE and FALSE, in case they aren't already defined elsewhere */
#define TRUE  1
#define FALSE 0

int strieq(const char *, const char *);
static inline char lower(char);

/* strieq(const char *str1, const char *str2)
 *
 * This function takes two ASCII strings and compares them for equality in a
 * case-insensitive manner. It doesn't work with Unicode strings -- that is,
 * it will only fix case for normal ASCII strings. Everything else is passed
 * through as-is.
 */
int strieq(const char *str1, const char *str2)
{
  while (TRUE)
  {
    if ((*str1 == '\0' && *str2 != '\0') || (*str1 != '\0' && *str2 == '\0'))
      break;

    /* From the above test we know that they are either both non-null or both
     * strings are at the end. The latter case means we have a match.
     */
    else if (*str1 == '\0')
      return TRUE;

    /* Check if the lowercased version of the characters match */
    if (lower(*str1) != lower(*str2))
      break;

    str1++; str2++;
  }
  return FALSE;
}

/* lower(char ch)
 *
 * This function takes a single alphabetic ASCII character and makes it
 * lowercase, useful for comparison.
 */
static inline char lower(char ch)
{
  /* Check that ch is between A and Z */
  if (ch >= 'A' && ch <= 'Z')
    return (ch + ('a' - 'A'));
  /* Otherwise return it as-is */
  return ch;
}
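
And a quick example of how it might be used, assuming it is compiled in the same file as the code above (the sample strings and the expected results in the comments are just illustrative):

#include <stdio.h>

int main(void)
{
  /* Each call prints 1 if the strings match ignoring case, 0 otherwise */
  printf("%d\n", strieq("Hello", "hELLo"));  /* 1: differs only in case */
  printf("%d\n", strieq("abc", "abd"));      /* 0: different characters */
  printf("%d\n", strieq("abc", "abcd"));     /* 0: different lengths    */
  printf("%d\n", strieq("", ""));            /* 1: two empty strings    */
  return 0;
}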

In case this is an issue:

I, the copyright holder of the above snippet for strieq and lower, hereby release the entire contents therein into the public domain. This applies worldwide, to the extent that it is permissible by law.

In case this is not legally possible, I grant any entity the right to use this work for any purpose, without any conditions, unless such conditions are required by law.

Update: Small fixes to documentation and code; WordPress must have stripped characters when I first inserted the code.


The long and short answer? Both.

Recently an article crawled up Proggit talking about whether we should be using Content Management Systems, or rather, more generic frameworks. The main contention of the author seems to be that generic frameworks are more useful, because programmers end up spending a lot of time undoing “features” offered by CMS packages.

Coming from the Perl philosophy, I think TIMTOWDI (pronounced Tim-Toady) – There Is More Than One Way To Do It – applies here. Don’t use a nail when a thumbtack will suffice. Don’t use a framework when a ready-made open-source or cheap CMS will suffice.

The reason that solutions like SharePoint and WordPress and numerous others exist, and why they are popular, is because they solve a particular set of problems. WordPress lets you get a blog up pretty quickly, but it was never designed for creating full-fledged web sites like Recovery.gov or what-have-you.

I suspect that these solutions come from smaller shops or individuals that need something “close enough” to what they really want. Certainly, it’s cheaper to get up and running by spending a few minutes downloading and installing WordPress (or better yet, using the hosted service at WordPress.com).

Many people and groups cannot afford, or do not wish to incur, the expense of hiring a web programmer to do the job for them. Content Management Systems still serve an important role, especially since they are general enough that every installation benefits from the same upstream updates. So when WordPress releases a new version of its flagship product, you get new features that are (pick one):

  • Most requested by its users
  • Required internally by WordPress for its own uses
  • Added by its contributors, or backported from forks of the software
  • Useful for increasing the security of the product

You lose a lot of these benefits with web frameworks, but in exchange you gain control over, and flexibility in, your software.

My point? Reuse things if they are available; do what is necessary to get the job done, but don’t lose sight of the problem in favour of some perceived “elegant” solution. Sometimes even paying $40,000 for software up front is cheaper than paying a programmer $60,000 a year for several years to maintain an in-house product. If you go the open-source route, consider contacting the software’s authors to see if they would be willing to write additional features for your company on a contractual basis. Many authors offer this, and many more would probably be receptive to the idea. And since most changes are likely to be pretty minor, it doesn’t have to cost much, either.


One of my Year II-level Computer Science courses is called Data Structures and Algorithms. In a somewhat rigorous and primarily mathematical way, we determine the optimality of different ways of solving the most common problems in computer science.

Not every method of accomplishing a task is equal, so it goes without saying that, for a given application, some will be better than others. Because we spend a lot of time solving a few common but complex problems, general solutions are indispensable. But comparing algorithms directly is like comparing apples and oranges. Every day we compare disparate objects by finding a common measure – perhaps weight, or size and volume – and algorithms are no different.

In computer science, one such measure is Big Oh time complexity, which gives the so-called “order” of a function or algorithm. In simple terms, Big Oh puts an upper bound on the number of basic operations (additions, subtractions, “print” statements) required to accomplish a task, as a function of the input size. Our class is concerned with the asymptotic behaviour of this order – that is, what happens as the input grows very large (“approaches infinity”) – for large structures and large sets of data. There is also Big Oh space complexity, which I won’t discuss here.

Given the years of rigorous work already done by mathematicians, software engineers and computer scientists alike, you might ask why it is important to understand algorithm analysis at all. Surely an “optimal” solution to every common problem already exists, right?

But that’s only half of the picture. The definition of optimality depends largely on the domain you’re working in, and this makes a lot of sense – your ability to use these general solutions depends on external constraints. For example, embedded systems pose problems for developers because they offer slower processors and far more constrained memory.

However, the scientific method tells us that we must hold all variables constant except the one under inspection. To compare two algorithms properly, then, we must inspect them along the same dimension (time or space) and ignore the other factors.

The measure we focus on in Data Structures and Algorithms is the “worst-case time complexity” of algorithms – that is, the number of primitive operations that must be performed to accomplish a given task in the worst case. What “worst case” means varies from algorithm to algorithm, and pinning it down requires a deeper understanding of the algorithm in question; linear search, sketched below, is a simple illustration.
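
As a quick example of my own (not from the course notes), consider a linear search through an unsorted array: the worst case occurs when the key isn’t present at all, because then every one of the n elements has to be examined.

#include <stddef.h>

/* Return the index of key in arr, or -1 if it is not present.
 * Worst case: key is absent, so the loop runs all n iterations,
 * which is what makes linear search O(n) in the worst case.
 */
int linear_search(const int *arr, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
  {
    if (arr[i] == key)
      return (int)i;
  }
  return -1;
}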

If there is an algorithm which always requires T(n) = n + 5 primitive operations to complete, for an input of size n, then the order of T(n) is O(n). Why? The strictly mathematical definition is a bit convoluted (and can be found elsewhere), but the essential idea is that Big Oh notation provides an upper bound on the amount of time an algorithm will take to complete in the worst conceivable case, for data sets approaching infinite size. It’s not difficult to see why the constant term “5” is dropped, then: for large data sets, n becomes much larger than the constant shift factor “5”, so the constant becomes less and less significant.
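
For the mathematically inclined, here is a rough sketch of how that argument is usually formalized (my paraphrase of the standard definition, not our course’s exact wording):

\[
T(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 1 \ \text{such that}\ T(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0.
\]
\[
T(n) = n + 5 \le 2n \ \text{for all}\ n \ge 5, \quad \text{so}\ T(n) \in O(n) \ \text{with}\ c = 2,\ n_0 = 5.
\]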

Now we have a way of comparing algorithms. Unfortunately, while this measure is simple to learn and sound in a basic mathematical sense, it neglects the impact of the underlying hardware and technology on real-world performance. Some algorithms are specifically designed by computer scientists to exploit these more complicated mechanisms, which is why benchmarking on your target systems is always important.

Modern processors come with all sorts of nifty additions, including technologies like hyper-threading, multiple cores, extra L1/L2/L3 cache and different instruction set designs like RISC or CISC. Understanding these requires an understanding of the physical properties of the hardware, and I hope to spend some time explaining them in greater depth in later articles.
