6.5 Monitoring RAID arrays
You can run mdadm as a daemon by using its Follow/Monitor mode.
In that mode, mdadm can send e-mail alerts to the system
administrator when arrays encounter errors or fail. Follow mode
can also trigger contingency commands when a disk fails, such as
giving a failed disk a second chance by removing and re-adding it,
so that a non-fatal failure can be resolved automatically.
Let's see a basic example. Running

    mdadm --monitor --mail=root@localhost --delay=1800 /dev/md2

should start an mdadm daemon that monitors /dev/md2.
The --delay parameter sets the polling interval to 1800 seconds,
and the --mail parameter gives the address to which critical events
and fatal errors are e-mailed, here the system administrator.
That's RAID monitoring made easy.
In addition, the --program parameter (or its synonym --alert)
specifies a program to be run whenever an event is detected.
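
As a rough illustration of such a program, here is a minimal sketch of an
event handler. It relies on the calling convention described in the mdadm
manual page: the program receives the event name, the md device and, for
some events, a related component device. The script path
/usr/local/sbin/md-event, the function name describe_event and the choice
of which events count as critical are assumptions made for this example,
not something mdadm prescribes.

```shell
#!/bin/sh
# Hypothetical handler, to be installed as /usr/local/sbin/md-event and used as:
#   mdadm --monitor --program=/usr/local/sbin/md-event --delay=1800 /dev/md2
# mdadm invokes it as: md-event EVENT MD_DEVICE [COMPONENT_DEVICE]

# Print a one-line description of an event.
describe_event() {
    event=$1
    array=$2
    component=${3:-}    # only passed for some events, e.g. a failed disk

    # Which events are treated as critical is an assumption of this sketch.
    case "$event" in
        Fail|FailSpare|DegradedArray) severity=critical ;;
        *)                            severity=info ;;
    esac

    if [ -n "$component" ]; then
        printf '%s: %s on %s (device %s)\n' \
            "$severity" "$event" "$array" "$component"
    else
        printf '%s: %s on %s\n' "$severity" "$event" "$array"
    fi
}

# When called by mdadm with its arguments, send the description to syslog.
if [ "$#" -ge 2 ]; then
    describe_event "$@" | logger -t md-event
fi
```

A handler like this only records the event. Any automatic recovery action,
such as re-adding a disk, would be placed in the case statement and should
be tested carefully before being trusted with real data.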
Note that the mdadm daemon will never exit once it decides that
there are arrays to monitor, so it should normally be run in the
background. Remember that you are running a daemon, not a
shell command.
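
One way to run the monitor in the background is mdadm's own --daemonise
option, which detaches the process from the terminal, together with
--pid-file, which records the daemon's process ID. The following is a
minimal sketch. The wrapper function start_monitor, the MDADM override
variable and the PID file location are assumptions chosen for
illustration.

```shell
# Sketch: start the mdadm monitor detached from the terminal.
MDADM=${MDADM:-mdadm}                        # overridable, e.g. MDADM=echo for a dry run
PIDFILE=${PIDFILE:-/run/mdadm-monitor.pid}   # hypothetical PID file location

# start_monitor ARRAY...   e.g. start_monitor /dev/md2
start_monitor() {
    "$MDADM" --monitor --daemonise --pid-file="$PIDFILE" \
        --mail=root@localhost --delay=1800 "$@"
}
```

Setting MDADM=echo prints the command line instead of running it, which is
a convenient way to check the arguments before starting a real daemon.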
Using mdadm to monitor a RAID array is simple and effective. However,
there is a fundamental problem with that kind of monitoring: what
happens, for example, if the mdadm daemon itself stops? Nobody is
notified, and later failures go unreported. To overcome this problem,
one should look towards "real" monitoring solutions. There are a
number of free software, open source, and commercial solutions
available which can be used for Software RAID monitoring on Linux.
A search on FreshMeat should return a good number of matches.
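
Short of a full monitoring solution, a check that does not depend on the
mdadm daemon can at least be run periodically, for instance from cron.
The sketch below reads /proc/mdstat and prints the names of arrays whose
status field, such as [U_], contains an underscore, which marks a missing
or failed member. The function name degraded_arrays is hypothetical, and
the parsing assumes the usual /proc/mdstat layout, in which the status
line follows the line naming the array.

```shell
# Sketch: list degraded arrays by parsing /proc/mdstat (or a file given as $1).
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md2 : active raid1 ..."
        /^md[^ ]* :/ { name = $1 }
        # A status such as [U_] or [_U] marks a missing or failed member
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Possible use from a cron job:
#   bad=$(degraded_arrays)
#   [ -z "$bad" ] || echo "Degraded arrays: $bad" | mail -s "RAID warning" root
```

Because the check reads the kernel's own status file, it keeps working
even when the mdadm monitor has stopped. It does not replace a real
monitoring system.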