WEBlog -- Wouter's Eclectic Blog

Wed, 28 Dec 2011

Rsync'ing over a newer subversion (fsfs) repository

So there are two servers; let's call them srv1 and srv2. Srv1 contains a bunch of subversion repositories, but these are to be migrated to srv2. Since the repositories are not (just) used for ascii-only files, they're fairly big (several tens of gigabytes, altogether), so copying them from one server to the other would take a while. In order to make sure this would happen quickly, we had already copied them over to the new server, so that the final switch would be quick (an rsync that would copy over just the revisions committed since that first copy).

That final switch was today. Only I didn't know that instead of just testing, the customer had already started using one of the repositories (and they'd forgotten to remind me), so the subversion repository suddenly jumped backwards in time. In addition, the new server wasn't being backed up yet (at least not for the subversion bits), so restoring from backups wasn't an option. Oops.

Luckily the solution is fairly simple. You see, fsfs stores each revision in a unique file; that means that as long as nobody has committed to the repository yet (which they couldn't, since system users aren't the same on both servers, so the webserver didn't have write permissions on the repository after the rsync), nothing is lost. One only needs to manually change the repository so that whatever subversion thinks is the latest commit, actually is the latest commit.

That information is stored in a file called db/current inside the repository. What's in that file depends on the repository version, which is stored in a file called db/format in the repository. For versions 1 and 2, the format is a single line with three space-separated values, of which the first is the last revision number used in the repository. The other two are counters that are used to give transactions unique names; and they, too, need to be up-to-date. For version 3 and above, the file contains only the revision number; there, the other two are derived from that instead of having their own unique number.

Figuring out the last used revision number in an fsfs repository is ridiculously easy:

ls -v db/revs|tail -n1

So if you've got a repository of fsfs version 3 or above, just change the revision number in the db/current file (after taking backups and making sure nobody can access the repository while you're doing this, of course), and you're all set.
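The check described above can be sketched as a few shell functions. This is only a sketch: the function names are made up here, it assumes that revision files are plain, numerically named files somewhere under db/revs (directly, or inside numbered subdirectories), and it does not look inside packed revision files. It only reports what it finds; it never writes to the repository.

```sh
#!/bin/sh
# Print the highest revision number that has a revision file under db/revs.
# Assumption: revision files are named after their revision number.
fsfs_youngest_rev() {
    find "$1/db/revs" -type f -name '[0-9]*' |
        sed 's|.*/||' | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# Print the fsfs format number, i.e. the first line of db/format.
fsfs_format() {
    head -n 1 "$1/db/format"
}

# Compare what db/current claims with what is actually on disk.
fsfs_check_current() {
    repo=$1
    fmt=$(fsfs_format "$repo")
    youngest=$(fsfs_youngest_rev "$repo")
    # The revision number is the first (or only) field of db/current.
    current=$(cut -d ' ' -f 1 "$repo/db/current")
    echo "format=$fmt current=$current youngest-on-disk=$youngest"
    if [ "$fmt" -ge 3 ] && [ "$current" != "$youngest" ]; then
        echo "db/current could be set to: $youngest"
    fi
}
```

Something like 'fsfs_check_current /path/to/repository' would then show whether db/current lags behind the revision files, and for format 3 and above, which number it could be changed to.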

Unfortunately, in my case, the repository was still in fsfs version 2, which meant I could not change just the revision number and not expect trouble. I suppose it should've been possible to figure out what the last transaction numbers are, somehow, so that I could fix the current file completely, but I reasoned that upgrading to a newer repository format might have other advantages too, so I just dumped the repository and reloaded it, and everything worked at that point.

Fri, 23 Dec 2011

Static program analysis with LLVM and clang

"Static program analysis" is a technique whereby a program is verified for errors without actually running it. Finding bugs manually with a debugger and one's brain is tedious, so every shortcut that can help you avoid having to do so is great.

There are some commercial tools available to do such analysis, some of which are rather expensive; but there are also some open source tools available to do similar things. One of these is built into clang, the C compiler of the LLVM project.

Using it is fairly simple. Instead of compiling something with 'make', compile it with 'scan-build make'. This will set the CC (and similar) environment variable(s) so that before the compiler is run, the clang static-analysis checker is run over the very same source code. The output of this checker is an HTML file with your source, but with comments added to explain the bugs which the tool found.

What does that mean? Well, let's look at an example, shall we?

The iframe above contains one out of three reports produced by a scan-build run over the nbd source code (if there's no iframe, someone scrubbed some HTML in your RSS reader. Just follow the link just above instead). The other two are 'dead assignments', which might mean that I'm currently depending on some undefined behaviour (which would be bad), or it might mean that I'm being overly cautious (which makes my code future-proof, which would be good), or it might mean something else—I still need to investigate. But this one is pretty interesting.

In the example, there are eight, numbered, comments in the source code. The first seven show the code path which scan-build took through my code before getting at the eighth comment; and the eighth is where things go bad. In this case, when going through the function as shown, we have a NULL pointer dereference.

When looking at the scan-build output, it's important to realize a few things. First, the code path shown may be just one of a number of possible code paths. For instance, the null pointer dereference would still happen if the phase function parameter did not contain the NEG_INIT bit, with the client pointer set to NULL. However, clang does not show these other code paths; this is presumably an optimization ("if we've already shown that this kind of bug is possible at a particular location through one code path, don't bother recording future instances of that very same bug at the exact same location through another code path"). This means that sometimes, some of the branches shown may be completely irrelevant to the bug at hand. In this particular case, it's possible to show the bug with just the eighth comment; the first seven are totally irrelevant.

Second, the fact that the clang static analyser found a bug does not mean that it's possible to crash the application. Yet. In this particular case, the negotiate function will never be called with the NEG_INIT or NEG_MODERN bits not set, and with the client parameter set to NULL. That's an implicit assertion; there are a few ways in which this function may be called, but the client parameter may only be NULL if NEG_INIT and NEG_MODERN are both set at the same time.

Since nbd-server doesn't currently call the negotiate function in that way, it is not currently possible to crash the server by exploiting this bug. But that doesn't mean it won't ever be possible, nor that there isn't a bug in the code. We may assume that the above rules are true, but we never check it. Adding an assertion to that effect should make sure that no future change to the code will accidentally introduce that error and cause a NULL pointer dereference.

Is this a silly and useless precautionary measure? Not really. Usually, bugs happen in code not because someone wasn't thinking straight, but because there's so much going on inside a piece of software that it can be too much for any programmer to remember. If a function assumes that its parameters are within a given subset of all possible states, but does not check that this is in fact true, then when (not if) some future change incorrectly introduces a state that is outside of the assumed states, things will break. And that's Bad(TM).

Thu, 15 Dec 2011

On beards and politics.

I'd almost forget...

Before:

Me with beard, two weeks ago

During:

After:

Me with less beard, exactly one week later

They finally made it.

Mon, 07 Nov 2011

git-annex awesomeness

So a few days ago, there was this:

21:24 < wouter> hum.
21:24 < wouter> Anyone know of a tool to manage scanned documents?
21:25 < wouter> the idea being that I can tell this tool "here's a bunch of newly-scanned documents", and it will upload them to a server
21:25 < wouter> and it should allow me to easily find a specific file later on
21:25 < wouter> and I'd also like version control there
21:26 < wouter> and I do _not_ want to download the entire repository of scanned documents on my laptop (that's why I have a server)
21:26 < wouter> and perhaps I'd also like a pony to go with that.
21:29 < wouter> oh, yes, and I do _not_ want a webbrowser as the primary interface (that might be okay to look things up, but not to store stuff)

The answer, as it turned out, was git-annex: a tool to manage files with git, without checking them into git.

What, I hear you say? Yes, that sounds a little weird, doesn't it?

Perhaps it's easiest to explain with a little example.

$ git annex add 2011-11-07-belgacom.pdf
$ ls -l 2011-11-07-belgacom.pdf
lrwxrwxrwx 1 wouter wouter 191 nov  7 14:46 2011-11-07-belgacom.pdf ->
../.git/annex/objects/xx/3F/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf/SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf

The file is now known to git-annex, and I can have it do all kinds of useful things with it now:

$ git annex drop 2011-11-07-belgacom.pdf
drop 2011-11-07-belgacom.pdf (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies

  No other repository is known to contain the file.

  (Use --force to override this check, or adjust annex.numcopies.)
failed
git-annex: drop: 1 failed

Oops, we hadn't copied it to anywhere else yet. We don't want to lose our data!

$ git annex move --to server 2011-11-07-belgacom.pdf
move 2011-11-07-belgacom.pdf (checking server...) (to server...)
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
     1537334 100%    9.22MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 1537668 bytes  1025132.00 bytes/sec
total size is 1537334  speedup is 1.00
ok
$

What just happened? git-annex copied the file to a git remote called "server", and then dropped it from my local copy. It's no longer here! The symlink in my local directory is now a dead link; I cannot open it anymore.

But, no worries! If we ever need it again, it's just a single command away.

$ git annex get 2011-11-07-belgacom.pdf
get 2011-11-07-belgacom.pdf (from server...) 
SHA256-s1537334--c44e1a057e247bfe7c196ac146c8a0ca32096c0b10df6c18fd3f1c2e99ecddbf
     1537334 100%    9.58MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 1537668 bytes  3075396.00 bytes/sec
total size is 1537334  speedup is 1.00
ok

This allows me to save space on my local laptop while not having to care where the files are -- they're just there. And it gets more awesome if you know that git-annex can store multiple copies of each file (so you have automatic distributed backups, as with regular git), where you can enforce the minimum number of copies. Also, git-annex supports multiple backends -- you can store your data in Amazon S3, or on an encrypted USB drive, or whatever, and have git-annex manage it transparently for you.

I said this already on IRC, but: Joey, I owe you beer.

Fri, 04 Nov 2011

Things not to do when validating email addresses

ERROR: WOUTER+TIMHOTEL@GREP.BE is not a valid email address

True, because I entered it in lowercase.

  1. Email address local parts (the bit before the @) are case sensitive. That means you can't just convert it to all caps and assume it will arrive.
  2. local parts can be composed out of any character from the following list:
    • alphanumeric characters (a-z, A-Z, 0-9), and
    • !#$%&'*+-/=?^_`{}|~.

This means that if you're trying to validate an email address with a regular expression that does more than check whether there's exactly one @ in the address, you're almost always wrong.
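As a minimal sketch of the one check that is described above as acceptable: the function below (its name is made up for this example) succeeds when the address contains exactly one @, and deliberately looks at nothing else. In particular, it does not change the case of anything.

```sh
#!/bin/sh
# Succeed if the argument contains exactly one "@"; check nothing else.
plausible_email() {
    case $1 in
        *@*@*) return 1 ;;   # more than one @
        *@*)   return 0 ;;   # exactly one @
        *)     return 1 ;;   # no @ at all
    esac
}
```

Used as 'plausible_email "$address" || echo "that does not look like an email address"', this accepts a local part containing a +, which is rather the point.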

Next time you try to tell me my mail address is invalid, go read the RFC first.

Morons.

Tue, 25 Oct 2011

diff6

There are six servers. They're meant to be used for the same purpose, and are mostly the same too, but not quite entirely. That is mostly because of how they were maintained in the past: with cssh, mostly, and that way it's inevitable that eventually there will be differences.

So we're in the process of migrating them to puppet now. One of the things I need to do while doing that, however, is figure out what the differences are, and get rid of them where applicable. Differences such as, say, in the installed packages. Some of these servers have extra things installed, for a reason that isn't clear anymore, that others don't.

So how does one figure out what the difference is? If this were two servers, I'd create a list of installed packages and use 'diff' on them. If there were three, I'd use 'diff3' instead. But if there are 6?

Turns out that isn't too hard. First, of course, you need a list of packages installed on each individual host. Both rpm and dpkg can do this with a simple command line invocation. Assuming that's done, and the output is stored, sorted (comm needs sorted input), in files called installed-host1 through installed-host6, the procedure is:

$ comm -12 installed-host1 installed-host2 > installed-common
$ for i in $(seq 3 6)
> do
>   comm -12 installed-common installed-host$i > installed-tmp
>   mv installed-tmp installed-common
> done

And there, you now have a file "installed-common" containing everything which is installed everywhere. If you use "comm" or "diff" to compare that file against the installed-hostX files, you can easily see what the difference is.
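That last comparison step could be sketched like this. The function name is made up for this example; the two commands in the comments are one possible way of producing the per-host lists, not necessarily the ones used here.

```sh
#!/bin/sh
# One possible way to produce the sorted per-host lists:
#   dpkg-query -W -f='${Package}\n' | sort > installed-host1
#   rpm -qa --qf '%{NAME}\n' | sort > installed-host1

# For every host file given, print the packages it has on top of the
# common set. comm -13 keeps only the lines unique to the second file.
show_extras() {
    common=$1
    shift
    for f in "$@"; do
        echo "== $f"
        comm -13 "$common" "$f"
    done
}
```

Running 'show_extras installed-common installed-host?' then prints, per host, exactly those packages that are not installed everywhere.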

Granted, that doesn't give you the overview in just one file, but usually that's not really necessary. The 'installed-common' file contains the largest common denominator, which really is everything you need.

Sat, 08 Oct 2011

Re: 8bitmime

Would people please stop using and/or deploying MTAs that are not 8-bit safe? I mean, it's the 2010s for crying out loud; not speaking 8BITMIME is very much frowned upon (page 16, paragraph 2, second sentence).

Sun, 02 Oct 2011

Test suites considered good

Yesterday, I uploaded NBD 2.9.24-1 to Debian. The big new feature in that release is that it now has an 'includedir' global configuration variable, with which it supports a conf.d-style directory. This feature was implemented by request of Vagrant Cascadian, of LTSP fame.

Only it didn't build on kFreeBSD. The reason? The particular test that I added for that new feature failed.

It turned out to be a simple bug in the code, which just didn't trigger on Linux but which was a bug nonetheless. I hadn't noticed, since my main development machine runs Linux; this is why I added a test suite to the code.

Bug fixed, new package is in incoming. Whee.

Sat, 01 Oct 2011

Bugs filed versus the phase of the moon.

One of the more classic jokes about not yet understood bugs is that the phase of the moon is somehow involved in causing them.

Being bored, I decided to spend some time to see whether "date of bug filed" could somehow be correlated with "phase of the moon" for a given source package.

Fast forward an hour of perl experimenting, and here we are:

#!/usr/bin/perl -w

use strict;
use warnings;

use constant PI => 3.1415926535;

use feature "say";

use SOAP::Lite;
use Astro::Coord::ECI::Moon;

my $soap = SOAP::Lite->uri('Debbugs/SOAP')->proxy('http://bugs.debian.org/cgi-bin/soap.cgi');

if (!defined($ARGV[0])) {
	die "E: must have a source package!\n";
}

my @bugs = $soap->get_bugs(src=>$ARGV[0])->result();

my $bugsdata = $soap->get_status(@bugs)->result();
my $moon = Astro::Coord::ECI::Moon->new();
my %count = ( 'new' => 0, 'first' => 0, 'full' => 0, 'last' => 0);

# Sort every bug's filing date into one of four 90-degree buckets.
# The phase is taken to be in radians, with the new moon at 0.
foreach my $bug (keys %$bugsdata) {
	my $time = $$bugsdata{$bug}->{date};
	my $phase = $moon->phase($time);
	if ($phase <= 45 * PI / 180 || $phase > 315 * PI / 180) {
		$count{'new'}++;
	} elsif ($phase <= 135 * PI / 180 && $phase > 45 * PI / 180) {
		$count{'first'}++;
	} elsif ($phase <= 225 * PI / 180 && $phase > 135 * PI / 180) {
		$count{'full'}++;
	} elsif ($phase <= 315 * PI / 180 && $phase > 225 * PI / 180) {
		$count{'last'}++;
	}
}

say "Number of bug submissions during new moon     : " . $count{new};
say "Number of bug submissions during first quarter: " . $count{first};
say "Number of bug submissions during full moon    : " . $count{full};
say "Number of bug submissions during last quarter : " . $count{last};

This uses the (not packaged in Debian) Astro::Coord::ECI::Moon perl module.

Use like so:

wouter@carillon:~code/perl$ ./debbugsmoon nbd
Number of bug submissions during new moon     : 2
Number of bug submissions during first quarter: 10
Number of bug submissions during full moon    : 1
Number of bug submissions during last quarter : 3

Apparently there's a reasonable correlation between 'the moon is in the first quarter' and 'people file bugs on nbd'.

Note: no, the above is probably not very scientific. That's not the point.

Thu, 29 Sep 2011

Youtube music

There's loads of music on YouTube. However, when I want to listen to music, I'm not necessarily interested in slideshows. And yes, that's a euphemism.

Luckily, extracting the video and audio from a youtube clip isn't very hard.
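Roughly, and only as a sketch: the function names, the file names and the DRY_RUN switch below are all made up for this example, and the clip is assumed to have been downloaded to a local file already. Which extension to give the output file depends on the audio codec that the first step reports.

```sh
#!/bin/sh
# Show which audio codec the downloaded clip uses, without playing it.
probe_audio_codec() {
    mplayer -vo null -ao null -frames 0 -identify "$1" 2>/dev/null |
        grep '^ID_AUDIO_CODEC='
}

# Store the audio stream, unmodified, in a new container file.
# If DRY_RUN is set, only print the ffmpeg command instead of running it.
extract_audio() {
    ${DRY_RUN:+echo} ffmpeg -i "$1" -vn -acodec copy "$2"
}
```

For instance, 'probe_audio_codec clip.flv' followed by 'extract_audio clip.flv clip.m4a', if the clip turns out to contain AAC audio.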

The critical bit is the '-vn' part, which tells ffmpeg not to encode the video to the output file. The 'mplayer' step isn't strictly required; but since youtube only uses lossy audio codecs, re-encoding the audio to a different codec will cost you audio quality. You may not want to do that; hence the mplayer step (to figure out which codec the source file is using), and the '-acodec copy' argument to ffmpeg's command line (which tells ffmpeg to not re-encode the audio, but just store the unmodified original audio stream in a different container file).

Tadaa, music without video.