Feeds:
Posts
Comments

Posts Tagged ‘Databases’

A specialized storage system known as a Round Robin Database allows one to store large amounts of time series information such as temperatures, network bandwidth and stock prices with a constant disk footprint. It does this by taking advantage of changing needs for precision. As we will see later, the “round robin” part comes from the basic data structure used to store data points: circular lists.

In the short term, each data point is significant: we want an accurate picture of every event that has occurred in the last 24 hours, which might include small transient spikes in disk usage or network bandwidth (which could indicate an attack). However, in the long term, only general trends are necessary.

For example, if we sample a signal at 5-minute intervals, then a 24-hour period will have 288 data points (24hrs*60mins/hr divided by 5 minutes per sample). Considering each data point is probably1 only 4 (float), 8 (double), 16 (quad) bytes, it’s not problematic to store roughly three hundred data points. However, if we continue to store each sample, a year would require about 105120 (365*288) data points; multiplied over many different signals, this can become quite significant.

To save space, we can compact the older data using a Consolidation Function (CF), which performs some computation on many data points to combine it into a single point over a longer period. Imagine that we take an average of those 288 samples at the conclusion of every 24 hour period; in that case, we would only need 365 data points to store data for an entire year, albeit at an irrecoverable loss of precision. Though we have lost precision (we no longer know what happened at exactly 5:05pm on the first Tuesday three months ago), the data is still tremendously useful for demonstrating general trends over time.

Though perhaps not the easiest to learn, RRDtool seems to have the majority of market share (without having done any research, I’d estimate somewhere between 90% and 98%, to account for those who create their own solutions in-house), and for good reason: it gets the job done quickly, provides appealing and highly customizable charts and is free and open source software (licensed under the GNU General Public License).

In a recent project, I learned to use RRDTool::OO to maintain a database and produce some interesting graphs. Since I was sampling my signal once every five minutes, I decided to replicate the archiving parameters used by MRTG, notably:

  • 600 samples store 2 days and 2 hours of data (at full resolution)
  • 700 samples store 14 days and 12 hours of data (where six samples become a 30-minute average)
  • 775 samples store 64 days and 12 hours of data (2-hour average)
  • 797 samples store 797 days of data (24-hour average)

F0r those interested, the following code snippet (which may be rather easily adapted for languages other than Perl) constructs the appropriate database:

archive => {
 rows    => 600,
 cpoints => 1,
 cfunc   => 'AVERAGE',
},
archive => {
 rows    => 700,
 cpoints => 6,
 cfunc   => 'AVERAGE',
},
archive => {
 rows    => 775,
 cpoints => 24,
 cfunc   => 'AVERAGE',
},
archive => {
 rows    => 797,
 cpoints => 288,
 cfunc   => 'AVERAGE',
},

There are also plenty of other examples of this technique in action, mainly related to computing. However, there are also some interesting applications such as monitoring voltage (for an uninterruptible power supply) or indoor/outdoor temperature (using an IP-enabled thermostat).

Footnotes

1. This may, of course, vary depending on the particular architecture

Advertisements

Read Full Post »

Strawberry Perl, by default, does not include many Database Drivers. While it does a great job installing most modules, some CPAN authors simply overlooked Win32 as a target platform, so the build/installation scripts get confused. Among these is the DBD::Pg driver (PostgreSQL database driver), which is really just a thin layer providing access to the C library, libpq.

In terms of working with PostgreSQL databases under Windows, this effectively leaves people with a few options:

  1. Try to install the Perl Package Manager (ppm) version of DBD-Pg. This didn’t work for me. I suppose that’s because the installer was expecting an ActivePerl-like environment, and I was using Strawberry Perl’s ppm tool.
  2. Compile the DBD::Pg drivers from scratch using Microsoft Visual Studio. This wasn’t a possibility for me because I don’t want to purchase Visual Studio. My school provides licenses via the MSDN Academic Alliance, but I wanted to use something more open-source if possible. Also, the Visual Studio suite is pretty large and takes a significant amount of time to install. It also clutters your machine with an SQL database, among other things.
  3. Install DBD::PgPP, a Pure Perl version of the PostgreSQL API. The problem with this is that there are lots of outstanding bugs, and so far it does not behave in exactly the same way as DBD::Pg.
  4. Install a specialized Perl package like Camelbox (one of Camelbox’s design goals was to provide DBI and popular DBD support out of the box). I didn’t like the idea of this because I’m so far a pretty big fan of Strawberry Perl and its sister project, Vanilla Perl. Together they seem like the most effective way to solve the Perl-on-Win32 dilemma.

As it turns out, there’s another option. Taking the package stuff meant for Camelbox and dropping it into your Strawberry Perl installation. It works flawlessly, and I’m very grateful to Brian Manning for his work on the project.

Here’s the quick and dirty:

  1. Download the postgresql-bin package from the Camelbox downloads area.
  2. Download the perl-DBD-Pg package from the same place.
  3. Open the lzma files using your favourite archiver program. I love 7-zip and it worked beautifully for extracting those files.
  4. Under the perl-DBD-Pg package, there should be a bunch of subdirectories; these correspond to those under C:\Strawberry\perl (or the perl subdirectory of wherever you installed Strawberry Perl). Extract all the files there.
  5. In the postgresql-bin package, there is a bin directory that contains a single file, libpq.dll. This one is really important for connecting to the database as it does all of the real work; the Perl stuff just binds to the library functions. Extract this one into C:\Strawberry\perl\site\lib\auto\DBD\Pg (or wherever your Pg.dll is installed)

Alternatively, it might be less stressful to just install Camelbox instead of Strawberry Perl; but this is entirely up to you.

Read Full Post »