Some operating systems have a reputation for needing regular reboots to keep them happy. Thankfully, a Linux server rarely needs rebooting, but occasionally one does get so stuck that only force will do. Let’s look at three methods of regaining order, in increasing order of desperation.
Of course a reboot will usually mean a service interruption. That can be mitigated by using high-availability techniques, so that even emergency maintenance for one server doesn’t compromise the service overall.
The gentle way: requesting a reboot
Most Linux systems are configured so that the key combination Ctrl+Alt+Del triggers a shutdown and restart procedure. Just like in a shutdown, applications get a small amount of time to clean up and make sure their data is safe before being closed. If an application isn’t responding, it gets stopped forcefully after a timeout.
This doesn’t work if you don’t have physical access to the server and a keyboard plugged into it, though. The best you can do in a remote situation is shutdown -r now or reboot, which both assume that the system is responsive enough to acknowledge the command.
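As a minimal sketch of the remote case, the helper below wraps the shutdown -r now command in an SSH call. The function name remote_reboot, the DRY_RUN variable, and the hostname in the usage note are all hypothetical, and the sketch assumes that the remote account is allowed to run sudo without a password prompt. It prints the command by default, so nothing is rebooted unless DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Hypothetical helper: ask a remote host to reboot itself over SSH.
# DRY_RUN defaults to 1, so the command is only printed, not executed.
remote_reboot() {
    host="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run without touching the remote machine.
        echo "ssh $host sudo shutdown -r now"
    else
        # Assumes passwordless sudo on the remote side.
        ssh "$host" sudo shutdown -r now
    fi
}
```

Calling remote_reboot web01.example.com (a placeholder hostname) prints the command for review. Like the commands it wraps, this only helps while the server is still responsive enough to accept an SSH login.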
Another alternative is to have a remote network-accessible unit simulate a keyboard, independent from the problematic server. That allows you to ‘press’ Ctrl+Alt+Del remotely, but it means depending on yet another piece of hardware that can itself fail.
The abrupt way: triggering a hardware reset
You might have wondered what the mysterious SysRq key is used for: it sends instructions directly to the Linux kernel, bypassing the whole of userspace, so a misbehaving application cannot eat the keyboard input.
There are several useful key combinations to get the kernel’s attention; they all start by holding down Alt and SysRq and then simultaneously pressing:
- R: switch the keyboard from raw mode, ready for further instructions
- E: send a graceful TERM signal to all processes except init
- I: send a forceful KILL signal to all processes except init
- S: sync all filesystems to disk
- U: mount all filesystems read-only
- B: trigger a hardware reset
Use the key combinations in this order for an orderly shutdown, but only from the physical console. If you have remote access only, skip the ‘E’, ‘I’, and ‘U’ steps – otherwise your remote shell will be terminated before you reach ‘B’!
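Without a physical keyboard, the same requests can be sent from a root shell by writing the key letter to /proc/sysrq-trigger. The sketch below sends only ‘S’ and then ‘B’, following the advice above for remote sessions; ‘R’ is left out on the assumption that it only matters for a locally attached keyboard. The function name sysrq_reset and the SYSRQ_TRIGGER and SYSRQ_DELAY variables are hypothetical, added so the sequence can be rehearsed against an ordinary file instead of the real trigger.

```shell
#!/bin/sh
# Sketch: sync filesystems, then trigger a hardware reset, via the kernel's
# SysRq trigger file. Must run as root when aimed at the real trigger.
# SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical overrides for safe rehearsal.
sysrq_reset() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    for key in s b; do
        echo "sending SysRq '$key'" >&2
        echo "$key" > "$trigger" || return 1
        # Give the sync a moment before the reset; 'b' never returns on a real system.
        sleep "${SYSRQ_DELAY:-2}"
    done
}
```

Running it with SYSRQ_TRIGGER pointed at a scratch file shows the order of the writes without touching the machine. With the default path, the ‘b’ write resets the hardware immediately and without any further cleanup, so treat this as a last step rather than a routine one.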
The downright rude way: pull the power
If all else fails, unplugging the server is a sure-fire way to get control back. It is also the riskiest option, because the OS has no opportunity to make sure data is safely on disk before the hardware loses power.
Most distributions will automatically try to recover filesystems during the next boot and more often than not they are successful, but it’s never a guarantee.
Remote power control units can be useful if you don’t have physical access, although they share the drawback of remote keyboards: they are one more piece of hardware that has to keep working.