md array with disk gone missing – recovering data

Had a server drop a disk (or the disk went faulty) in a three-disk RAID5 array. On reboot the array wouldn’t start. Output of mdadm --detail /dev/md0:

/dev/md0:
Version : 1.2
Creation Time : Wed May 22 18:17:58 2013
Raid Level : raid5
Used Dev Size : -1
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Sun Jan 3 06:50:05 2021
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : nebula:0 (local to host nebula)
UUID : 962b8ff0:00d88161:5a030e1f:236466af
Events : 31168

Number Major Minor RaidDevice State
4 8 16 0 active sync /dev/sdb
1 0 0 1 removed
3 8 48 2 active sync /dev/sdd
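The State line is the one that matters here. As a minimal sketch, assuming the output format shown above, it can be pulled out in a script (md_state is a made-up helper name, not part of mdadm):

```shell
# Sketch: extract the "State :" value from `mdadm --detail` output on stdin.
# md_state is a hypothetical helper name, not part of mdadm.
md_state() {
    # Strip the label and any trailing whitespace from the matching line.
    sed -n 's/^[[:space:]]*State : \(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

# Usage on a live system (needs root):
#   mdadm --detail /dev/md0 | md_state
```

The header of the member table also contains the word State, but without the " : " separator, so it is not matched.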

Tried to start the array with mdadm --run /dev/md0:

mdadm: failed to run array /dev/md0: Input/output error

As expected, given it didn’t autorun on boot.

Tried mdadm --assemble --run /dev/md0 /dev/sdb /dev/sdd next:

mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdd is busy - skipping

This is because the existing md0 device, although not started, still holds the disks. Stop it with mdadm --stop /dev/md0

then reassemble with mdadm --assemble --run --force /dev/md0 /dev/sd[bd] (the glob expands to /dev/sdb and /dev/sdd):

mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has been started with 2 drives (out of 3).
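The stop and forced-assemble steps above could be wrapped up roughly as follows. This is only a sketch: MD, MEMBERS and DRY_RUN are illustrative variables of my own, and by default it just prints the commands rather than running them, since --force is not something to fire off by accident.

```shell
#!/bin/sh
# Sketch of the stop / forced-assemble sequence described above.
# MD, MEMBERS and DRY_RUN are illustrative variables, not mdadm settings.
MD=${MD:-/dev/md0}
MEMBERS=${MEMBERS:-"/dev/sdb /dev/sdd"}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_array() {
    # Release the member disks held by the not-started array.
    run mdadm --stop "$MD" || return 1
    # Force a degraded start; $MEMBERS is deliberately unquoted so it
    # splits into one argument per disk.
    run mdadm --assemble --run --force "$MD" $MEMBERS || return 1
    # Show the kernel's view of the array afterwards.
    run cat /proc/mdstat
}

recover_array
```

Run it once as-is to check the printed commands name the right devices, then again with DRY_RUN=0 as root.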

Ran mount -a to mount everything in fstab again. Took a few seconds but worked. The array is still degraded with no redundancy left, so copy the data off ASAP!
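For the copy itself, something like the following would do as a minimal sketch. The helper name copy_off and both paths in the usage line are hypothetical; rsync -a is the more usual choice when copying to another machine.

```shell
# Sketch: copy everything under a mount point to a safe location.
# copy_off and both paths in the usage line are hypothetical.
copy_off() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -a preserves permissions and timestamps (and ownership when run as
    # root); "src/." copies the directory contents, dotfiles included.
    cp -a "$src"/. "$dest"/
}

# Usage:
#   copy_off /mnt/raid /mnt/backup/raid-rescue
```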

 

RAID Monitoring: PERC H710 Mini and Debian

Debian 7 works with the PERC H710 Mini in a Dell R520 out of the box. To monitor the controller, however, you need a binary from LSI. Helpfully, this comes in packaged form from http://hwraid.le-vert.net/wiki/DebianPackages

http://blog.mattandanne.org/2012/01/hardware-raid-controllersrequire.html pointed me in the right direction. To install, add the repo in a .list file under /etc/apt/sources.list.d/:

deb http://hwraid.le-vert.net/debian wheezy main

and then add the key:

wget -O - http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | apt-key add -

Update the package lists with apt-get update, and then install the relevant package:

apt-get install megaclisas-status
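Put together, the install steps above look roughly like this. It is a sketch rather than a tested installer: the file name hwraid.list and the DRY_RUN switch are my own choices, and by default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the install steps above as one script.
# The LIST_FILE name and DRY_RUN switch are illustrative choices.
LIST_FILE=${LIST_FILE:-/etc/apt/sources.list.d/hwraid.list}
KEY_URL=http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        # Each step is a single command string so pipes and redirects work.
        sh -c "$*"
    fi
}

install_megaclisas_status() {
    step "echo 'deb http://hwraid.le-vert.net/debian wheezy main' > $LIST_FILE" || return 1
    step "wget -O - $KEY_URL | apt-key add -" || return 1
    step "apt-get update" || return 1
    step "apt-get install megaclisas-status"
}

install_megaclisas_status
```

Set DRY_RUN=0 and run as root to actually apply the steps.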

Running megaclisas-status should then give the status of the array(s). The accompanying megaclisas-statusd daemon is set up by default to check every 10 minutes and email root if there is a problem, with a reminder every two hours while the problem persists. These defaults can be overridden using a defaults file:

/etc/default/megaclisas-statusd

Create this file if it doesn’t already exist. The defaults are:

MAILTO=root # Where to report problems
PERIOD=600 # Seconds between each check (default 10 minutes)
REMIND=7200 # Seconds between each reminder (default 2 hours)
RUN_DAEMON=yes

(these are set in the init script)

/etc/init.d/megaclisas-statusd

If you have the system set up to divert root email to you then it should just work.
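If root mail is not forwarded anywhere, the override file could be written along these lines. This is a sketch that assumes the init script sources the file as shell, as is usual for files under /etc/default; the helper name write_statusd_defaults and the example address are hypothetical.

```shell
# Sketch: write an override file for the monitoring daemon.
# write_statusd_defaults and the example address are hypothetical.
write_statusd_defaults() {
    file=${1:-/etc/default/megaclisas-statusd}
    mailto=${2:-root}
    cat > "$file" <<EOF
MAILTO="$mailto"   # Where to report problems
PERIOD=600         # Seconds between each check
REMIND=7200        # Seconds between each reminder
RUN_DAEMON=yes
EOF
}

# Usage (as root); restart the daemon afterwards so it rereads the file,
# assuming the init script supports the usual restart action:
#   write_statusd_defaults /etc/default/megaclisas-statusd admin@example.com
#   /etc/init.d/megaclisas-statusd restart
```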