Server crashes

Hi,

The server is a ProLiant DL165 G7 with a P410/256 internal controller and a P212 external controller attached to a D2600 disk enclosure.

Linux kernel 2.6.18-194.17.4.el5xen, Red Hat 5.5. The P410 has a mirrored pair of 146GB SAS disks for the OS.

This server has crashed about 4 times in the last two weeks. The problem seems to be with the P212 controller and/or the disk enclosure. The symptoms are that the server becomes unresponsive and the console shows hung-task messages like these (big post, sorry):

[419160.430190] INFO: task jbd2/dm-0-8:584 blocked for more than 120 seconds.

[419160.430224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[419160.430250] jbd2/dm-0-8 D ffff88000959e980 0 584 2 0x00000000

[419160.430268] ffff8805e3d53c10 0000000000000246 ffffffff00000000 0000000000015980

[419160.430289] ffff8805e3d53fd8 0000000000015980 ffff8805e3d53fd8 ffff8805e0bbadc0

[419160.430308] 0000000000015980 0000000000015980 ffff8805e3d53fd8 0000000000015980

[419160.430327] Call Trace:

[419160.430350] [<ffffffff8117d510>] ? sync_buffer+0x0/0x50

[419160.430367] [<ffffffff8159c923>] io_schedule+0x73/0xc0

[419160.430379] [<ffffffff8117d555>] sync_buffer+0x45/0x50

[419160.430391] [<ffffffff8159cf9f>] __wait_on_bit+0x5f/0x90

[419160.430402] [<ffffffff8117d510>] ? sync_buffer+0x0/0x50

[419160.430413] [<ffffffff8159d048>] out_of_line_wait_on_bit+0x78/0x90

[419160.430427] [<ffffffff8107f0c0>] ? wake_bit_function+0x0/0x40

[419160.430439] [<ffffffff8117d506>] __wait_on_buffer+0x26/0x30

[419160.430454] [<ffffffff8122874a>] jbd2_journal_commit_transaction+0x97a/0x1350

[419160.430469] [<ffffffff81008696>] ? __switch_to+0x166/0x320

[419160.430483] [<ffffffff8159e87e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30

[419160.430498] [<ffffffff81070933>] ? try_to_del_timer_sync+0x83/0xe0

[419160.430512] [<ffffffff8122d87d>] kjournald2+0xbd/0x220

[419160.430523] [<ffffffff8107f080>] ? autoremove_wake_function+0x0/0x40

[419160.430534] [<ffffffff8122d7c0>] ? kjournald2+0x0/0x220

[419160.430545] [<ffffffff8107eb26>] kthread+0x96/0xa0

[419160.430557] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10

[419160.430570] [<ffffffff8100a313>] ? int_ret_from_sys_call+0x7/0x1b

[419160.430582] [<ffffffff8159ee1d>] ? retint_restore_args+0x5/0x6

[419160.430594] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10

[419160.430627] INFO: task perl:31318 blocked for more than 120 seconds.

[419160.430647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[419160.430667] perl D ffff880009562980 0 31318 31221 0x00000000

[419160.430684] ffff88059db11a58 0000000000000286 ffffffff00000000 0000000000015980

[419160.430702] ffff88059db11fd8 0000000000015980 ffff88059db11fd8 ffff880003ad8000

[419160.430720] 0000000000015980 0000000000015980 ffff88059db11fd8 0000000000015980

[419160.430738] Call Trace:

[419160.430750] [<ffffffff8117d510>] ? sync_buffer+0x0/0x50

[419160.430761] [<ffffffff8159c923>] io_schedule+0x73/0xc0

[419160.430772] [<ffffffff8117d555>] sync_buffer+0x45/0x50

[419160.430783] [<ffffffff8159cf9f>] __wait_on_bit+0x5f/0x90

[419160.430794] [<ffffffff8117d510>] ? sync_buffer+0x0/0x50

[419160.430805] [<ffffffff8159d048>] out_of_line_wait_on_bit+0x78/0x90

[419160.430817] [<ffffffff8107f0c0>] ? wake_bit_function+0x0/0x40

[419160.430828] [<ffffffff8117d506>] __wait_on_buffer+0x26/0x30

[419160.430841] [<ffffffff811f56f2>] ext4_find_entry+0x1a2/0x4a0

[419160.430854] [<ffffffff81168a01>] ? d_delete+0xc1/0x100

[419160.430866] [<ffffffff81168a67>] ? d_alloc+0x27/0x1c0

[419160.430878] [<ffffffff811f5a3d>] ext4_lookup+0x4d/0x110

[419160.430889] [<ffffffff8115f193>] do_lookup+0x1e3/0x280

[419160.430900] [<ffffffff8115fe7d>] link_path_walk+0x4cd/0xab0

[419160.430911] [<ffffffff811605c7>] path_walk+0x67/0xe0

[419160.430922] [<ffffffff8116079b>] do_path_lookup+0x5b/0xa0

[419160.430932] [<ffffffff81161467>] user_path_at+0x57/0xa0

[419160.430943] [<ffffffff810043c6>] ? xen_mc_flush+0x96/0x1c0

[419160.430956] [<ffffffff81006b3d>] ? xen_force_evtchn_callback+0xd/0x10

[419160.430968] [<ffffffff810072d2>] ? check_events+0x12/0x20

[419160.430980] [<ffffffff8115756c>] vfs_fstatat+0x3c/0x80

[419160.430991] [<ffffffff810072d2>] ? check_events+0x12/0x20

[419160.431002] [<ffffffff8115768b>] vfs_stat+0x1b/0x20

[419160.431013] [<ffffffff811576b4>] sys_newstat+0x24/0x50

[419160.431025] [<ffffffff810072bf>] ? xen_restore_fl_direct_end+0x0/0x1

[419160.431035] [<ffffffff810041a1>] ? xen_clts+0x71/0x80

[419160.431046] [<ffffffff8100b322>] ? math_state_restore+0x42/0x60

[419160.431059] [<ffffffff8159f3de>] ? do_device_not_available+0xe/0x10

[419160.431070] [<ffffffff8100b00b>] ? xen_hypervisor_callback+0x1b/0x20

[419160.431082] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b

[419160.431092] INFO: task perl:31320 blocked for more than 120 seconds.

[419160.431111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[419160.431131] perl D ffff880009562980 0 31320 31221 0x00000000

[419160.431148] ffff880563873a58 0000000000000282 ffffffff00000000 0000000000015980

[419160.431164] ffff880563873fd8 0000000000015980 ffff880563873fd8 ffff880003ad96e0

[419160.431184] 0000000000015980 0000000000015980 ffff880563873fd8 0000000000015980

[419160.431200] Call Trace:

[419160.431212] [<ffffffff8117d510>] ? sync_buffer+0x0/0x50

...

....

dm-0 is one of the LVM volumes on the disk enclosure.
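For anyone who wants to summarise a long console capture like the one above, here is a minimal Python sketch that pulls out the "blocked for more than N seconds" reports and counts them per task name. The function names (`blocked_tasks`, `summarize`) and the regular expression are my own illustration, not part of any HP or Red Hat tool, and the pattern assumes the log lines look exactly like the ones quoted here.

```python
import re
from collections import Counter

# Matches hung-task reports such as:
# [419160.430190] INFO: task jbd2/dm-0-8:584 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(log_text):
    """Return (timestamp, task name, pid) for every hung-task report."""
    return [
        (float(m.group("ts")), m.group("name"), int(m.group("pid")))
        for m in HUNG_RE.finditer(log_text)
    ]


def summarize(log_text):
    """Count hung-task reports per task name."""
    return Counter(name for _, name, _ in blocked_tasks(log_text))


# Sample lines copied from the console output in this post.
sample = """\
[419160.430190] INFO: task jbd2/dm-0-8:584 blocked for more than 120 seconds.
[419160.430627] INFO: task perl:31318 blocked for more than 120 seconds.
[419160.431092] INFO: task perl:31320 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for name, count in summarize(sample).items():
        print(name, count)
```

In the excerpt above, the journal thread for dm-0 (jbd2/dm-0-8) and two perl processes are all waiting in io_schedule, which fits I/O to the enclosure volume having stalled.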

At this point I have to power down the server, as nothing I have tried makes it shut down gracefully. When the server comes back up I get a lot of messages like the ones below.

ioctl32(cmaidad:5612): Unknown cmd fd(6) cmd(c058420b){00} arg(ffe64520) on /dev/cciss/c0d0

ioctl32(cmaeventd:5495): Unknown cmd fd(5) cmd(c058420b){00} arg(fff1c460) on /dev/cciss/c0d0

ioctl32(cmaeventd:5495): Unknown cmd fd(5) cmd(c058420b){00} arg(fff1c460) on /dev/cciss/c1d0
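To see at a glance which processes and devices these ioctl32 messages involve, a small Python sketch like the following groups them by process name, command code and device. The helper name `ioctl32_summary` is hypothetical, and the pattern assumes the device paths read /dev/cciss/c0d0 and /dev/cciss/c1d0 once the console line wrapping is removed.

```python
import re
from collections import Counter

# Matches messages such as:
# ioctl32(cmaidad:5612): Unknown cmd fd(6) cmd(c058420b){00} arg(ffe64520) on /dev/cciss/c0d0
IOCTL_RE = re.compile(
    r"ioctl32\((?P<proc>[^:]+):(?P<pid>\d+)\): Unknown cmd "
    r"fd\((?P<fd>\d+)\) cmd\((?P<cmd>[0-9a-f]+)\)\{\w+\} "
    r"arg\((?P<arg>[0-9a-f]+)\) on (?P<dev>\S+)"
)


def ioctl32_summary(log_text):
    """Count messages per (process name, ioctl command, device)."""
    return Counter(
        (m.group("proc"), m.group("cmd"), m.group("dev"))
        for m in IOCTL_RE.finditer(log_text)
    )


# Sample lines copied from this post, with the wrapped paths rejoined.
sample = """\
ioctl32(cmaidad:5612): Unknown cmd fd(6) cmd(c058420b){00} arg(ffe64520) on /dev/cciss/c0d0
ioctl32(cmaeventd:5495): Unknown cmd fd(5) cmd(c058420b){00} arg(fff1c460) on /dev/cciss/c0d0
ioctl32(cmaeventd:5495): Unknown cmd fd(5) cmd(c058420b){00} arg(fff1c460) on /dev/cciss/c1d0
"""

if __name__ == "__main__":
    for (proc, cmd, dev), count in ioctl32_summary(sample).items():
        print(proc, cmd, dev, count)
```

In the three lines quoted here, the same command code (c058420b) is rejected for both cmaidad and cmaeventd, and on both c0d0 and c1d0.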

During the POST, there is a message from the P212 controller about "a controller failure event occurred prior to this power-up, previous lock code..."

I have seen other posts about the ioctl32 error. One poster on the HP forum said the problem was resolved when he changed the cable. I switched the SAS cable but I am still getting the error. Two days ago I applied the latest firmware (5.6) and used the 9.5 firmware DVD to flash all the components on the system. I still get the messages above. The server was not responsive yesterday morning and I had to power it down again.

I do not know if the messages above are related to the crash. I'd like to hear any suggestions at this point because the current situation is losing me sleep.

Thanks in advance.

Dermot
