What would you do if you had to erase all the files securely on a server thousands of miles away?
In many ways, I feel sorry for people stuck with proprietary operating systems. When something goes wrong or they have a problem to solve, the solution is either obvious, requires buying special software or is impossible. With Linux, I've always felt that I was limited only by my own programming and problem-solving abilities, no matter what problem presented itself. Throughout the years that Linux has been my primary OS, I've run into quite a few challenging and strange problems, such as how to hot-migrate from a two-disk RAID 1 to a three-disk RAID 5, or more often, how to somehow repair a system I had horribly broken.
Recently, I ran into an interesting challenge when I had to decommission an old server. The server had quite a bit of sensitive data on it, so I also had to erase everything on the machine securely. Finally, when I was done completely wiping away all traces of data, I had to power off the machine. This is a relatively simple request when the server is under your desk: boot a rescue disk, use a tool like shred to wipe the data on all the hard drives, then press the power button. When the server is in a remote data center, it's a little more challenging: use a remote console to reboot into a rescue disk, wipe the server, then remotely pull the power using some networked PDU. When, like me, you have to wipe a server thousands of miles away with no remote console, no remote power, no remote help and only an SSH connection, you start scratching your head.
At this point, some of you might be asking: “Why would you ever need to do this?” It turns out there are a few different reasons, both legitimate and shady:
You have broken hardware. This could be a server with a broken video card, a malfunctioning KVM or remote serial console, or some other problem where physical hardware access just doesn't work.
You are locked out from your server. This could happen, for instance, if you colocate your server in a data center but stop paying your bills or somehow have a falling out with the provider. They revoke your physical access to your server, but you need to remove all the sensitive files while the machine is still available over the network.
You have a bad consulting client. Perhaps you are a responsible and talented sysadmin who sets up a server for a client in good faith only to have that client refuse to pay you once the server is on-line. You want to remove your work securely, the client won't return your calls, yet you still have SSH access to the machine.
You bought a cloud server with inadequate tools. It is very popular these days to host your server environment in the cloud; however, one downside is that many cloud providers cut costs by giving you limited access to management of your cloud instance. Do you really trust that when you terminate a server instance it is securely erased? Do you get access to tools that would let you boot a rescue disk on your cloud instance? In some cases, about the only remote management you have for a cloud server might be your SSH connection.
You are an evil, malicious hacker who wants to cover his tracks. Yes, this is the least legitimate and most shady reason to wipe a server remotely, but I figured I should mention it in the interest of completeness.
It's a challenge. Some people climb mountains, others run marathons, still others try to wipe servers remotely over SSH. You could just be a person who likes to push things to the limit, and this sounds like an interesting challenge.
Now that you have worked through the reasons you might need to know how to wipe a server remotely, let's talk about how you actually would do it. First, and most important, there are no redos! When you write random bits to a raw disk device, especially over SSH, you have only one shot to get it right. When I was preparing this process, I tested my procedure multiple times on virtual machines to make sure my steps were sound. I'm glad I did, as it took a few times to get all the steps right, confirm my assumptions and get the commands in the correct order.
What makes this challenge tricky is the fact that you will write randomly over the very filesystem you are logged in to. What happens if you overwrite the sshd and shred files while you are running shred and logged in over SSH? More important, what happens when you overwrite the kernel? The main principle that will make this procedure work is the fact that Linux likes to cache files to RAM whenever it can. As long as you can make sure everything you need is stored in RAM, you can overwrite the filesystem as much as you want. The trick is just identifying everything you need to store in RAM.
So, I mentioned there was no redo to this procedure, but that doesn't mean you can't set up some sort of safety net for yourself. Although I knew that once I launched the shred command it would run completely from RAM, what I had to figure out was what commands I would need to run after shred. Even commands like ls won't work if there's no filesystem to read. So that I would have some sort of backup plan, I took advantage of the /dev/shm ramdisk that all modern Linux systems make available. This is a directory that any user on the system can write to, and all files will be stored completely in RAM.
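If you want to double-check that /dev/shm really is a RAM-backed tmpfs on your particular system before you stake your session on it, it shows up like any other mount:

$ mount | grep /dev/shm
$ df -h /dev/shm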
Because I wasn't sure whether commands like echo (which I would need later) would work after I had shredded the hard drive, I copied it to /dev/shm along with any other files I thought I would need. If you have the space, why not copy all of /bin, /sbin and /lib? Finally, I knew I would need access to the /proc filesystem to power off the server. I assumed I still would have access to /proc even if I had overwritten the root filesystem, but I wasn't 100% certain, so just to be safe, I became root (you can't assume sudo will work later) and mounted an extra copy of /proc under /dev/shm:
$ sudo -s
# mkdir /dev/shm/proc
# mount -t proc proc /dev/shm/proc
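The copies themselves are just ordinary cp commands. Here's a rough sketch of what I mean, assuming /dev/shm has enough free space for whatever you grab (by default, tmpfs is capped at about half of your RAM):

# cp /bin/echo /dev/shm/
# cp -a /bin /sbin /lib /dev/shm/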
It turns out I ultimately didn't need any of these precautions, but it doesn't hurt to be prepared.
Now is the point of no return. Just to be safe, I changed to the /dev/shm directory so my current working directory would be on a ramdisk. Then, I unmounted any unnecessary mountpoints (like /home) and ran the shred command below on every nonroot drive on the system. In my case, I used software RAID, so I also took the extra step of hot-removing all but one drive from any RAID array and shredded them separately. Finally, I was left with just my root filesystem stored on /dev/sda, so I took a deep breath and typed the following command:
# shred -n2 -z -v /dev/sda
This command writes random bits to /dev/sda for two complete passes (-n2), then does a final pass with zeros so the drive looks perfectly clean (-z), with verbose output so I can see what's going on (-v). Of course, adjust the -n argument to your particular level of paranoia—two passes was fine for me. I have to admit, there's something satisfying and strange about overwriting the root filesystem while you are still logged in.
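As for the preparatory steps on the nonroot drives, the commands look roughly like the following. Treat this only as a sketch: the mountpoint, array and partition names (/dev/md0, /dev/sdb1 and so on) are placeholders for whatever your system actually uses:

# umount /home
# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# shred -n2 -z -v /dev/sdb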
Once the shred process completed, I had a completely empty filesystem. It was weird—commands like ls gave odd errors, and I knew I was isolated in my /dev/shm jail. All that was left was to shut down the server, but how do you do that when /sbin/shutdown is erased? No problem, you might say, just kill PID 1, since if you kill init, it will halt the system. That would work if, say, the kill program were still around. In this case, the only way I had left to shut down the system was via the /proc interface. The /proc directory is a special filesystem that gives you access to process and kernel information, and it resides entirely in RAM, so my little shred stunt didn't wipe it out. To halt the machine, just enable the sysrq interface in the kernel, and then send the right command to sysrq:
# echo 1 > /proc/sys/kernel/sysrq
# echo o > /proc/sysrq-trigger
If the halt command doesn't work, or if you just want to reboot the machine instead, you would type:
# echo b > /proc/sysrq-trigger
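While you have the sysrq trigger in front of you, a couple of its other single-letter commands are worth knowing, even though there's nothing left worth syncing at this point in the procedure:

# echo s > /proc/sysrq-trigger
# echo u > /proc/sysrq-trigger

The s command syncs all mounted filesystems, and u remounts them all read-only.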
Now you might be asking yourself, didn't I overwrite the echo command? After all, /bin/echo is on the root filesystem. It turns out I didn't even have to rely on my copy of the command under /dev/shm: echo is one of the commands built in to the bash shell. When you run echo, bash executes its own built-in version, and because I was already inside a bash shell, it ran from RAM. Once you run the echo o command, the kernel halts instantly. Any remote pings or other commands will stop, and the system will be powered off.
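If you want to confirm ahead of time which commands your shell can provide on its own, bash will tell you with its type builtin (the exact path shown for external programs may differ on your system):

$ type echo
echo is a shell builtin
$ type shred
shred is /usr/bin/shred

Anything reported as a shell builtin will keep working inside your existing session; anything that points to a path on the root filesystem is liable not to survive the wipe.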
As a final note, I want to say that even if you don't think you'll ever need to go to such lengths to wipe a server, I think this procedure is such fun that you should at least try it in a disposable virtual machine. Shred the system and see which commands still work and which ones don't. As an extra challenge, see if you can get commands to run within /dev/shm.