02 January 2012

File server WoL

A while ago I decided to upgrade my file server hardware but didn't want it to run 24/7 anymore.
Obviously I only use it for a small portion of the day, so it shouldn't be running all the time.
Saves money and energy, right?
I quickly decided to look into Wake-on-LAN (WoL) so I could wake it up using an app on my phone.
The minor inconvenience of having to push a button on my phone and wait a couple of seconds would be offset by the money saved.

There are plenty of WoL apps on the Android market, but I settled on "Wol Wake on Lan Wan", mainly because of its simple, straightforward widget.
Push the widget and it sends the magic packet, that's it. Works like a dream.
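For the curious: the "magic packet" such an app sends is simple. As far as I know the standard format is 6 bytes of 0xFF followed by the target NIC's MAC address repeated 16 times, usually sent as a UDP broadcast (port 9 is a common choice). Below is a minimal bash sketch that only builds that payload as a hex string; the function name magic_packet_hex and the MAC address in the usage note are made up for illustration.

```bash
#!/bin/bash
# Build a Wake-on-LAN magic packet payload as a hex string:
# 6 bytes of 0xFF followed by the target MAC address repeated 16 times.
# (Hypothetical helper, not part of my actual setup.)
magic_packet_hex() {
    local mac=${1//[:-]/}    # strip ':' or '-' separators
    mac=${mac,,}             # lowercase (bash 4+)
    if [[ ! $mac =~ ^[0-9a-f]{12}$ ]]; then
        echo "invalid MAC address: $1" >&2
        return 1
    fi
    local payload=ffffffffffff i
    for ((i = 0; i < 16; i++)); do
        payload+=$mac
    done
    printf '%s\n' "$payload"
}
```

Assuming xxd and socat are installed, something like magic_packet_hex 00:11:22:33:44:55 | xxd -r -p | socat - UDP-DATAGRAM:255.255.255.255:9,broadcast should put it on the wire. Ready-made tools such as etherwake or wakeonlan do the same job.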

Server side, there were a couple of problems.
I use encryption for the storage volumes, so simply booting the server each time wasn't an option.
Having to enter the key at every boot is impractical for a file server.
That left me with a couple of options:

  • Use a USB device with the decryption key on it.
    • Insecure: physical access means access to the encrypted data.
  • Don't use encryption.
    • Insecure, obviously.
  • suspend-to-disk / hibernate / S4.
    • Insecure: physical access means access to the encrypted data, because hibernation writes the complete contents of memory (decryption key included) to disk and then completely shuts down the server. The next boot simply loads that content back into memory.
    • Relatively slow.
  • suspend-to-ram / sleep / S3.
    • More secure. S3 doesn't write anything to disk; the system goes into a power-saving state where only a few components (most importantly the RAM) stay powered, so it can return to a fully operational state quickly.
    • Saves less energy than the other options.
    • Wakes up very quickly (a matter of seconds).
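To see which of these states a given kernel actually offers, /sys/power/state lists them: "mem" is suspend-to-RAM (normally S3) and "disk" is suspend-to-disk (S4). A small sketch; supports_sleep_state is a hypothetical helper name, and the optional second argument only exists so it can be pointed at a different file.

```bash
#!/bin/bash
# Check whether the kernel lists a sleep state in /sys/power/state,
# e.g. "freeze mem disk". mem = suspend-to-RAM, disk = hibernate.
# Returns 0 if supported, 1 if not, 2 if the file can't be read.
# (Hypothetical helper, not part of my actual setup.)
supports_sleep_state() {
    local state=$1 file=${2:-/sys/power/state}
    [ -r "$file" ] || return 2
    grep -qw -- "$state" "$file"
}
```

For example, supports_sleep_state mem && echo "S3 available". Writing mem to /sys/power/state as root suspends directly, while pm-suspend (which my script uses) additionally runs the pm-utils hooks.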

In the end I went for S3: it's secure enough for my needs, the power savings are still a lot better than running it 24/7, and the usability is exactly what I wanted.

For some reason suspend wouldn't work correctly with a 2.6 kernel: the server would go to sleep and either never wake up, or work fine a couple of times and then stop working.
After troubleshooting the hardware, I figured it had to be the kernel or its loadable kernel modules (LKMs), so I tried the latest and greatest, installed 3.0.0 through apt-get, and it worked fine.
Changes in the Linux ACPI implementation? Beats me, and tbh I've been too lazy to check what exactly went wrong. Linux 3.0 works, so I'll be using it.

Moving on: the main goal of this project was an easy way to serve files to my PS3 through MediaTomb.
I first thought about a setup where a packet sniffer would constantly watch the network for connection attempts to the file server and wake it up as soon as it saw one.
But that would require another machine running all the time, and I wanted as few devices running 24/7 as possible.
So then I thought it'd be pretty cool to have the file server itself check whether anything was connected to it, and go into S3 once nothing had connected for a certain amount of time.
This is the script I'm using right now to do just that:
--------------------------------------------------
#!/bin/bash
# Suspend the file server to RAM (S3) after a period without connections.
LOG=/var/log/night
IFACE=eth2            # NIC used for Wake-on-LAN
GATEWAY=192.168.1.1   # pinged to check that networking is up
timer=0
timeout=4
interval=60

log() {
    echo "$(date) - $*" >> "$LOG"
}

while true; do
    sync
    log "MARK"
    # If we're coming out of S3, we want to kill these
    killall s2ram 2>/dev/null
    killall pm-suspend 2>/dev/null
    # Sometimes resyncing gets stuck on Debian
    if grep -q resync=PENDING /proc/mdstat 2>/dev/null; then
        log "Setting md0 r/w"
        mdadm --readwrite /dev/md0
    fi
    # Check our networking status
    if ! ping -c 1 "$GATEWAY" >/dev/null 2>&1; then
        log "Couldn't reach gateway, restarting networking"
        /etc/init.d/networking restart
    fi
    # Check the WoL setting ("g" = wake on magic packet)
    if ! /sbin/ethtool "$IFACE" | grep -q "Wake-on: g"; then
        log "WoL setting wrong, correcting"
        /sbin/ethtool -s "$IFACE" wol g
    fi
    connections=$(netstat -natp | grep -c ESTABLISHED)
    udpconnections=0   # placeholder: UDP traffic isn't checked
    if [ "$connections" -eq 0 ] && [ "$udpconnections" -eq 0 ]; then
        # Ok, no connection, let's increase the timer value
        log "No connections, pass $timer of $timeout"
        timer=$((timer + 1))
        sleep 10
    else
        # Oops, there's a connection, resetting the timer
        log "Connections detected"
        timer=0
    fi
    # It's time to go to sleep
    if [ "$timer" -ge "$timeout" ]; then
        log "Going night night"
        timer=0
        pm-suspend &
        sleep 30
    fi
    log "Lingering for $interval seconds"
    sleep "$interval"
done
---------------------------------------

Of course, this script applies to my situation.
If you want to use it, you'll have to adjust some settings like the network interface, the timeout you want, the gateway's IP address and so on.
But it works pretty well and manages to avoid some pitfalls I encountered along the way.
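Since the timeout is one of those settings, it helps to know what it means in wall-clock time. Each idle pass of the script sleeps 10 seconds and then $interval seconds, and the suspend fires right after the 10-second sleep of the pass that brings the timer up to $timeout. Ignoring how long the commands themselves take, that works out to (timeout - 1) * (interval + 10) + 10 seconds after the first idle check, plus up to $interval seconds between the last connection closing and that first idle check. A sketch of the arithmetic; the function name is made up:

```bash
#!/bin/bash
# Seconds from the first idle check until pm-suspend is called,
# assuming every idle pass takes exactly 10 + interval seconds.
# (Hypothetical helper, not part of my actual setup.)
idle_seconds_before_suspend() {
    local timeout=$1 interval=$2
    echo $(( (timeout - 1) * (interval + 10) + 10 ))
}
```

With my values, idle_seconds_before_suspend 4 60 gives 220, so a bit under four minutes.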

Right, so the script was ready and worked as it should.
Now how could I get it to run "constantly"?
I tried running it as a daemon from a startup script, but for some reason it wouldn't survive S3.
I tried running it through cron, which worked, but then it wouldn't run 'all the time' as I originally wanted.
Even if that had been good enough, it still wouldn't be optimal: the daemon/script could be killed by other processes, and I don't want that. I want it running, all the time.
So the perfect solution was to use init(8) through /etc/inittab:

hy:2345:respawn:/usr/local/sbin/hydragoesnightynight

The respawn action makes init restart the script whenever it's not running, no matter what.
Even if the OOM killer kills it, init will restart it, and if init(8) itself gets killed off there's a far bigger problem going on, so it's perfect for my setup.
Anyway, run init q to make init re-read /etc/inittab and pick up the new line, and it works.
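If you set this up on more than one box, the inittab edit can be scripted. A hedged sketch: inittab entries have the form id:runlevels:action:process and init expects each id to be unique, so the hypothetical helper below appends the line only if it isn't there yet and refuses to reuse an id that's already taken.

```bash
#!/bin/bash
# Append an entry to an inittab-style file unless it's already present.
# Returns 1 (and appends nothing) if the id is taken by a different entry.
# (Hypothetical helper, not part of my actual setup.)
add_inittab_entry() {
    local file=$1 entry=$2
    local id=${entry%%:*}                 # field before the first ':'
    if grep -qxF -- "$entry" "$file"; then
        return 0                          # already installed
    fi
    if grep -q -- "^${id}:" "$file"; then
        echo "id '$id' already used in $file" >&2
        return 1
    fi
    printf '%s\n' "$entry" >> "$file"
}
```

As root, something like add_inittab_entry /etc/inittab 'hy:2345:respawn:/usr/local/sbin/hydragoesnightynight' && init q would do both steps.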

The setup has been running for a couple months now and it suits my needs, YMMV of course.

Edit: see the posts labeled wurmd for an easy way of waking the box back up.

For completeness' sake:
I'm running a combination of Debian testing and sid.
Linux: 3.0.0-1-486 (486 as I used the same disks as in my previous x86 compatible hardware).
NIC: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12).