I'm trying to pinpoint some platform issues, so I'm doing a deep dive into potential disk/RAID controller problems. Checking the controller serial log, I'm seeing a lot of different errors. Some are easy to diagnose (certain KCQ codes). Others, not so much (other KCQs). I've spent the last week digging into SCSI command codes, KCQs, ASC/ASCQs, sense codes, opcodes, etc. I've got a fair handle on most of the errors I'm seeing, except for one.
There's one group of errors that I cannot decipher:
I get a pair of these errors for each SSD attached (16 SSDs, 32 messages). I do not get any of these errors for the SAS disks. I can't find KCQ 1:00:1D documented anywhere.
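For anyone who wants to follow along, here is a minimal sketch of how a KCQ triple like this splits into its fields. It assumes the controller prints KCQ as colon-separated hex in the order sense key, ASC, ASCQ; the function name `parse_kcq` is just something made up for illustration, not part of any tool. The sense key names are the standard ones from the SCSI Primary Commands spec.

```python
# Sense key names as defined in the SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def parse_kcq(kcq: str) -> dict:
    """Split a 'K:ASC:ASCQ' hex triple into its three fields.

    Hypothetical helper: assumes colon-separated hex values in the
    order sense key, ASC, ASCQ, as they appear in the log.
    """
    parts = kcq.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected K:ASC:ASCQ, got {kcq!r}")
    key, asc, ascq = (int(p, 16) for p in parts)
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
    }


if __name__ == "__main__":
    print(parse_kcq("1:00:1D"))
```

Under that assumption, 1:00:1D breaks down to sense key 1 (RECOVERED ERROR) with ASC 0x00 and ASCQ 0x1D. The sense key part is clear enough; it's the ASC/ASCQ pair I can't place.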
These errors correlate with periods of poor performance. It almost seems like there's a bus reset happening, but I would expect that to impact the SAS drives as well. Any ideas?
Thanks in advance.