Thursday, July 23, 2009

Using gdb for debugging kernel modules

This is a very simple and maybe trivial post.. and nothing novel ! But something that has helped me a lot in debugging issues with my kernel modules and kernel panics in general. So, when you get a panic originating in a kernel module, you normally get it in the following format:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffff8023262c>] mmput+0x11/0xb0
PGD 22bcf6067 PUD 211839067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/block/sdb/size
Modules linked in: <a routine in my module> ehci_hcd uhci_hcd [last unloaded: <my module>]
Pid: 26830, comm: <my kernel thread> Not tainted 2.6.28-anki #13
RIP: 0010:[<ffffffff8023262c>] [<ffffffff8023262c>] mmput+0x11/0xb0
RSP: 0000:ffff8802279fbed0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000045
RDX: ffff88022e150300 RSI: 0000000000000282 RDI: 0000000000000000
RBP: ffff8802279fbee0 R08: 0000000000000018 R09: ffff88022e150300
R10: ffff88022e150300 R11: 00007fffe6dfefff R12: ffff88022e52f860
R13: ffff88022e150300 R14: ffff88022d99aed0 R15: ffff88022d99b110
FS: 0000000000000000(0000) GS:ffff88022f818280(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000050 CR3: 0000000211829000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process memref (pid: 26830, threadinfo ffff8802279fa000, task
ffff88022e150300 0000000000000000 ffff8802279fbf20 ffffffffa003314b
ffff8802279fbf20 ffff88022d99aed0 ffffffffa00330f2 ffff880210cf1c68
ffff88022f855ee0 ffff88022f855f00 ffff8802279fbf40 ffffffff802468e3
Call Trace:
[<ffffffffa003314b>] <routine in my module>+0x59/0xb2 [<my kernel thread>]
[<ffffffffa00330f2>] ? <my modulee>+0x0/0xb2 [<my kernel thread>]
[<ffffffff802468e3>] kthread+0x49/0x76
[<ffffffff8020cce9>] child_rip+0xa/0x11
[<ffffffff802277a5>] ? dequeue_task+0xbf/0xca
[<ffffffff8024689a>] ? kthread+0x0/0x76
[<ffffffff8020ccdf>] ? child_rip+0x0/0x11
Code: df e8 88 c7 fd ff 48 8b 3d ba 28 6b 00 48 89 de e8 b1 af 05 00 41 59 5b
c9 c3 55 48 89 e5 53 48 89 fb 48 83 ec 08 e8 6a fa 39 00 <f0> ff 4b 50 0f 94
c0 84 c0 0f 84 8b 00 00 00 48 89 df e8 8b 76
RIP [<ffffffff8023262c>] mmput+0x11/0xb0
RSP <ffff8802279fbed0>
CR2: 0000000000000050

Now, to find out from where exactly in my module the panic happened, I simply use gdb, as below:

#gdb <path to my module.o file from the kernel source root dir> [for example, #gdb drivers/misc/ankita.o]
(gdb) list *(the routine)+0x59

The above gives a source listing, pointing to the lines which correspond to 0x59 address. This serves as a good starting point to debug the issue. Hope this helps some folks atleast :-)

1 comment:

Sankar said...

Thanks. Saved my day.