ProLiant ML350 G5 running SLES 11 in a RAID5 configuration.
I've been installing new SAS drives in our array one at a time, waiting for each rebuild to complete successfully before adding the next drive. The last drive I installed resulted in numerous write errors on the swap-device, and services became slow and sluggish. Disabling the swap file for the short term has alleviated the slowdowns. However, the array is still showing a status of "Ready for Rebuild" with no further activity. It's trying to rebuild the boot partition (/dev/cciss/c0d0). HPSMH is showing drive 6 as having 14961 read errors. That number has not increased since turning off swap.
/var/log/messages only seems to report errors on the "swap-device" and not necessarily on any real data area. Those are write errors, whereas HPSMH reports read errors.
The drive POST complained about is in 1I:1:3, but HPSMH reports read errors on the drive in 2I:1:6.
Those are the 2 most recent drive swaps I performed.
I have tried booting from a Linux live CD and using dd to image the drives/partitions. It was going excruciatingly slowly on the boot partition because of the disk errors, so I proceeded to image the other drives/partitions. Once we got into the backup process, it became clear it was going to take 15 hours to complete and then another 15 to restore once the array was reconfigured after removing the faulty drives. That was not feasible, so that plan was scrapped.
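As a rough sanity check on that 15-hour figure, here is a minimal sketch of the average throughput it implies. It assumes the estimate covered all of logical drive 2 (924.9 GB in the hpacucli output further down, treated as GiB here); the amount actually being imaged may have been different, and the function name is just for illustration.

```python
def implied_throughput_mib_s(size_gib, hours):
    """Average rate in MiB/s needed to copy size_gib in the given hours."""
    return size_gib * 1024 / (hours * 3600)

# Assumption: the 15 h estimate covers all of logical drive 2 (924.9 GB,
# treated as GiB here); the real amount imaged may differ.
rate = implied_throughput_mib_s(924.9, 15)
print(f"{rate:.1f} MiB/s")
```

Under that assumption the backup was averaging somewhere around 17 to 18 MiB/s, which is why a faster transfer path looked attractive.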
So at this point I'm kind of stuck. I thought about throwing a USB 3.0 expansion card in to increase the throughput of the backups, but it turns out this server does not respond well to USB 3.0 because of its age.
My question is this: if I remove the drive in bay 6 that HPSMH is reporting as having read errors while logical drive 1 is stuck on "Ready for Rebuild", am I going to destroy my RAID? The same goes for the drive that /var/log/messages reports as having write errors: if I remove that drive and install a fresh one, will I destroy the RAID and/or the boot partition?
I'm not terribly fluent with Linux or RAID configuration, so let me know if I'm not clear with any of this. I have a diagnostic file created with hpacucli if that would be helpful. My other option is to replace the server itself due to its age. I was planning to upgrade servers this year anyway.
---------------------------------------------
# /opt/compaq/hpacucli/bld/hpacucli ctrl all show config detail
Smart Array E200i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: QT83MP3021
Cache Serial Number: P9A3A0BXQUD0PI
RAID 6 (ADG) Status: Disabled
Controller Status: OK
Hardware Revision: A
Firmware Version: 1.86
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Status Details: A cache error was detected. Run a diagnostic report for more information.
Cache Ratio: 50% Read / 50% Write
Drive Write Cache: Disabled
Total Cache Size: 128 MB
Total Cache Memory Available: 96 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: False

Array: A
Interface Type: SAS
Unused Space: 0 MB
Status: OK
Array Type: Data

Logical Drive: 1
Size: 32.0 GB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 8224
Strip Size: 64 KB
Full Stripe Size: 448 KB
Status: Ready for Rebuild
Caching: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001032333720202020200002
Disk Name: /dev/cciss/c0d0
Mount Points: / 30.0 GB
OS Status: LOCKED
Logical Drive Label: AFC929D4QT7BMU0237 4700
Drive Type: Data

Logical Drive: 2
Size: 924.9 GB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 64 KB
Full Stripe Size: 448 KB
Status: OK
Caching: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001032333720202020200003
Disk Name: /dev/cciss/c0d1
Mount Points: None
OS Status: LOCKED
Logical Drive Label: AC2929DFQT7BMU0237 5207
Drive Type: Data

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SE218NT00009038WDUR
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SE1YRLG00009037N4EM
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDE
Serial Number: 6SE1K84Y0000B116KXFY
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 146 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SD25WGZ00009021VSVN
Model: HP DG0146FAMWL
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 2I:1:5
Port: 2I
Box: 1
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPD6
Serial Number: ECA1PCC0JRC31251
Model: HP EG0300FBDSP
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SE21B0E00009038VY45
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 2I:1:7
Port: 2I
Box: 1
Bay: 7
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SE1YGHH00009037KFUJ
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

physicaldrive 2I:1:8
Port: 2I
Box: 1
Bay: 8
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPDF
Serial Number: 3SE200DE00009014LY7R
Model: HP EG0300FAWHV
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown
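
In case it helps anyone reading the dump above, here is a minimal Python sketch that pulls out each logical drive's status and checks the stripe geometry. It only uses the "Key: Value" fields copied from the output above. The helper names (parse_logical_drives, member_count) are made up for illustration and are not part of hpacucli, and the member count assumes that for RAID 5 the full stripe size equals the strip size times the number of data drives, with one extra strip for parity.

```python
# Excerpt of the logical-drive fields from the hpacucli output above.
SAMPLE = """\
Logical Drive: 1
Size: 32.0 GB
Fault Tolerance: RAID 5
Strip Size: 64 KB
Full Stripe Size: 448 KB
Status: Ready for Rebuild
Logical Drive: 2
Size: 924.9 GB
Fault Tolerance: RAID 5
Strip Size: 64 KB
Full Stripe Size: 448 KB
Status: OK
"""

def parse_logical_drives(text):
    """Return {logical drive number: {field: value}} from 'Key: Value' lines."""
    drives = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip lines without a 'Key: Value' shape
        key, value = key.strip(), value.strip()
        if key == "Logical Drive":
            current = drives.setdefault(int(value), {})
        elif current is not None:
            current[key] = value
    return drives

def kb(value):
    """Turn a string like '64 KB' into the integer 64."""
    return int(value.split()[0])

def member_count(fields):
    """Data strips per full stripe plus one parity strip (RAID 5 assumption)."""
    data_strips = kb(fields["Full Stripe Size"]) // kb(fields["Strip Size"])
    return data_strips + 1

drives = parse_logical_drives(SAMPLE)
for number, fields in sorted(drives.items()):
    print(number, fields["Status"], member_count(fields), "drives")
```

With the values above, 448 KB / 64 KB gives 7 data strips, so each logical drive spans 8 drives, which matches the 8 physical drives listed. Both logical drives therefore sit on the same set of disks, and only logical drive 1 is reported as "Ready for Rebuild".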