Samba v2

samba.grep.be.

This machine has been my trusty publicly accessible server for about four years now. It runs my website, including this blog, is my primary MX, and hosts my Subversion repositories and my gallery. For slightly less than the first year of Planet Grep, it was the only machine running that site (which has since been migrated to two machines in two different data centers, to cope with its ever-increasing bandwidth usage). It runs the bacula-director for our backup system, and it contains the modem behind our fax number, allowing us both to receive faxes and to log in to our local network using PPP. Most importantly, I've been using it as an SSH jumphost and IRC box, which lets me connect to other machines whenever I'm behind an overly paranoid firewall.

But now it's time to retire the machine. The high number of services, combined with what are by today's standards rather meager system resources, is beginning to be a problem. A few times already, I've had to reboot the machine because a load spike was making it unresponsive. So it has to go.

Many people will be shocked to find out that the machine powering samba up to now was an IBM SurePos 500. No, really. The reasons are long and complex, but suffice it to say that at one point I wanted to set up a server for myself, and I had an idle, never-to-be-used-again machine, priced at over €2000, standing by. So I commandeered it as a server, and have given it near-100% uptime ever since. If you want to know just how near-100% that is:

  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       31170
[...]
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       99

Or, in plain English: according to SMART, the hard disk has seen just 99 power cycles in four years.
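
For a sense of scale, the two raw SMART values can be converted into more familiar units. This is only a back-of-the-envelope sketch: the numbers are copied from the output above, while the variable names and the 365.25-day year are assumptions made for the illustration.

```python
# Raw values copied from the SMART attributes quoted above.
power_on_hours = 31170
power_cycles = 99

days_on = power_on_hours / 24
years_on = days_on / 365.25            # assumption: average year length
hours_per_cycle = power_on_hours / power_cycles

print(f"powered on for {days_on:.0f} days (~{years_on:.1f} years)")
print(f"about {hours_per_cycle / 24:.0f} days between power cycles on average")
```

That works out to roughly three and a half years of power-on time, with an average of about thirteen days between power cycles.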

This machine was never designed to do that. Side note: yes, that is a touch screen on top of that box.

The new machine is much, much better. We're going from this:

model name	: Intel(R) Celeron(TM) CPU                1200MHz

to this:

model name	: Dual-Core AMD Opteron(tm) Processor 1210

which is much, much better. The box actually containing this processor is also much, much better.

However, there was one problem: the old processor was 32-bit only, requiring Debian's i386 port, whereas the new one (obviously) can run the amd64 port, and I did intend to use that. Additionally, I wanted to migrate from a simple "root-on-partitions" system to a "root-on-LVM" system, with Xen somewhere in between. As such, simply rsync'ing the entire hard disk of the old system over to the new one (my usual way to migrate a non-critical server) wasn't going to work.

Luckily, Debian isn't too hard to migrate from one machine to another. The plan:

  • Installed Debian on the new machine, giving it the same hostname (samba) as the old one, but with root on LVM.
  • Ran 'dpkg --get-selections' on the old machine, feeding the output to 'dpkg --set-selections' on the new one. Installed packages.
  • Rsynced over /home
  • Rsynced /etc over to a separate directory on the new box, and copied some files (such as fstab) from the live /etc on the new box into the separate directory.
  • Rsynced /srv over
  • Rsynced /var over to a separate LVM volume.
  • Created a snapshot volume of that /var LVM volume, moved most of the live /var somewhere else, and added the original LVM volume (i.e., not the snapshot) to fstab. Obvious exceptions were things such as databases, which usually have an architecture-specific on-disk format, and the dpkg directory.
  • Brought the original system to runlevel 1.
  • Rsynced /home, /srv, and /var over again to get the last-minute changes in.
  • Rsynced (with --delete) the separate /etc over the live /etc.
  • Brought the original system down.
  • Rebooted, checked which services died because their on-disk format also differed between i386 and amd64, utterly and completely killed those, copied their files back from the original amd64 /var, and made some type of plain-text dump that I then imported into the service in question.
  • Rinse, repeat until all those services work. Then removed the /var snapshot (no need for that backup anymore at that point), and voilà.
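
The rsync passes in the list above can be sketched as a small script that only prints the commands instead of running them. This is a rough illustration, not the exact commands that were used: the host name, the staging paths and the -a flag are all assumptions, and only the use of rsync and of --delete for the final /etc step comes from the steps themselves.

```python
import shlex

OLD_HOST = "old-samba"             # hypothetical SSH name of the old machine
STAGING_ETC = "/root/etc-staging"  # hypothetical separate directory for /etc
STAGING_VAR = "/mnt/var-new"       # hypothetical mount point of the /var LVM volume


def remote(path):
    """Return an rsync source spec for a path on the old machine."""
    return f"root@{OLD_HOST}:{path}"


def rsync(src, dest, delete=False):
    """Build (but do not run) an rsync command line as an argv list."""
    cmd = ["rsync", "-a"]          # -a (archive mode) is an assumption
    if delete:
        cmd.append("--delete")
    return cmd + [src, dest]


def migration_plan():
    """Return the rsync commands for both passes, in order."""
    first_pass = [
        rsync(remote("/home/"), "/home/"),
        rsync(remote("/etc/"), STAGING_ETC + "/"),
        rsync(remote("/srv/"), "/srv/"),
        rsync(remote("/var/"), STAGING_VAR + "/"),
    ]
    # Second pass: run once the old machine is in runlevel 1, so that
    # nothing changes underneath the copy anymore.
    final_pass = [
        rsync(remote("/home/"), "/home/"),
        rsync(remote("/srv/"), "/srv/"),
        rsync(remote("/var/"), STAGING_VAR + "/"),
        rsync(STAGING_ETC + "/", "/etc/", delete=True),
    ]
    return first_pass + final_pass


for command in migration_plan():
    print(shlex.join(command))
```

Printing the plan first makes it easy to review the trailing slashes, which decide whether rsync copies a directory itself or only its contents, before anything touches the live /etc.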

Or, well, that was the idea. Because I forgot about one silly detail: for everything to work out correctly, I had to keep UID numbers in sync between the two machines. I didn't, and as a result a number of files were created with the wrong owner. This made all kinds of things fail horribly. Had I stopped to think about it, I would have copied the old passwd file over from the old installation, added some extra lines for users that were on the old server but were not yet created on the new one, and left it at that; everything would have been fixed. Instead, I started to change ownership of files all over the filesystem, making a mess of things.
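
A check along the following lines, run before the first rsync, would have flagged the problem. It is a minimal sketch that assumes the standard colon-separated passwd format (name, password, UID, ...); the function names and the sample accounts are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> numeric UID from passwd-format text."""
    uids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue                      # skip malformed lines
        uids[fields[0]] = int(fields[2])  # third field is the UID
    return uids


def compare_uids(old, new):
    """Return (mismatched, missing), seen from the old machine.

    mismatched: name -> (old UID, new UID) for users present on both
    missing:    sorted names that exist only on the old machine
    """
    mismatched = {
        name: (uid, new[name])
        for name, uid in old.items()
        if name in new and new[name] != uid
    }
    missing = sorted(name for name in old if name not in new)
    return mismatched, missing


# Hypothetical sample data standing in for the two /etc/passwd files.
OLD_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/bash
"""
NEW_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001::/home/alice:/bin/bash
"""

mismatched, missing = compare_uids(parse_passwd(OLD_PASSWD), parse_passwd(NEW_PASSWD))
print("UID differs:", mismatched)
print("missing on new machine:", missing)
```

With the sample data, alice shows up with a different UID on each side and bob is missing on the new machine, which are exactly the two cases that lead to files ending up with the wrong owner.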

Silly me.

But, well, in the end I did get everything to work correctly; even if it took me longer than expected. No worries.