Our frontend web server went down at 6:24 CET this morning. We will be updating this post as we bring the server back up. Here’s what we know right now:
- At 6:24 CET a kernel oops occurred. The alarms at our hosting provider went off, and the server was rebooted.
- Since the file system holding the repositories hasn’t had a full consistency check since August 2012, an fsck was started.
- When the fsck hadn’t completed by 8:00 CET, the server was routinely rebooted, and another fsck process was started at 8:04 CET.
- The last time we ran a full fsck on the file system, it took about 2.5 hours. Since then, however, we have installed dedicated storage for our servers, which has higher I/O capacity than the storage we were running on in August last year.
- 10:06 CET: The server is back up. We will upgrade the kernel and do another reboot, which will hopefully resolve the kernel issue we encountered earlier today. Expect a few minutes of downtime shortly.
- 10:13 CET: All systems are running again, with an updated kernel.