I am trying to debug a device driver which apparently causes other task to

Question


Asked: May 29, 2026 (2026-05-29T05:44:34+00:00)


I am trying to debug a device driver which apparently causes other
tasks to hang. It is not deterministic which task will hang or when it
will hang.

Basically, I get an error message from the kernel saying that “task has
been blocked for more than 120 seconds”, along with a stack trace.
The hung task varies from sendmail to mkfs to pdflush (a kernel thread),
and the topmost function in the stack trace varies from “getnstimeofday”
to “bio_submit” to “mark_locks_held”.

I am having a hard time debugging this because the problem is very hard
to locate. The stack traces provided by the kernel are not very helpful
either. According to those traces, some of the hung processes are not
even trying to grab a lock (for example, in the getnstimeofday
function), and I have no idea why they hang.

So I am wondering if anyone has an idea of how to debug such a
problem. Would kgdb be useful here, maybe by showing me exactly where
the process hangs and what kind of lock it is waiting for?

Any suggestions are appreciated.


1 Answer

Editorial Team · Answer 1 · 2026-05-29T05:44:35+00:00

If frame pointers are not enabled in your kernel, its stack traces are unreliable, and that is probably what is confusing you. Without frame pointers, the kernel resorts to scanning the entire stack for values that might be pointers into kernel code (i.e. potential return addresses). As a result, function calls that have already returned may still be printed.

If you had code that looked like this:

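void A(void) {
    printk("foo\n");
}

/* Called and returned before the crash; its frame is left behind on the stack. */
void B(void) {
    int x;
    A();
}

/* Faults on a NULL write; buf reuses stack space that B() and A() occupied earlier. */
void crash(void) {
    char buf[32];
    *(int*)0 = 0;
}

void trouble(void) {
    int x;
    B();        /* returns normally */
    crash();    /* the only call still active when the dump is taken */
}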

Your stack dump might look something like this:

printk
A
crash
foo
trouble
...
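
To make the scanning heuristic concrete, here is a small user-space sketch of the idea. It is not the kernel's unwinder: `struct sym`, `lookup`, `scan_stack`, and every address used with them are hypothetical names and values chosen for illustration. It only shows why stale words left over from returned calls get reported alongside the live return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one function's address range in "kernel text". */
struct sym {
    const char *name;
    uintptr_t start;   /* first address of the function */
    uintptr_t end;     /* one past the last address of the function */
};

/* Return the symbol whose range contains addr, or NULL if none does. */
static const struct sym *lookup(const struct sym *tab, size_t nsyms,
                                uintptr_t addr)
{
    for (size_t i = 0; i < nsyms; i++)
        if (addr >= tab[i].start && addr < tab[i].end)
            return &tab[i];
    return NULL;
}

/*
 * Walk every word of a saved stack image and record the name of each word
 * that happens to fall inside a known function. No attempt is made to tell
 * live return addresses from stale ones, which is exactly the weakness
 * described above. Returns the number of names stored in out (at most max).
 */
static size_t scan_stack(const uintptr_t *stack, size_t words,
                         const struct sym *tab, size_t nsyms,
                         const char **out, size_t max)
{
    size_t found = 0;

    assert(stack != NULL && tab != NULL && out != NULL);
    for (size_t i = 0; i < words && found < max; i++) {
        const struct sym *s = lookup(tab, nsyms, stack[i]);
        if (s != NULL)
            out[found++] = s->name;
    }
    return found;
}
```

Given a stack image holding two stale words that point into A and B plus one live return address into trouble, `scan_stack` reports all three names, because a plain scan has no way to know that A and B returned long ago.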

As for how to debug your problem, I have two suggestions:

1. Knowing that some of the debug output is bad, use your own knowledge of the code to figure out the real call stack. It might help to look for the common functions across multiple stack dumps.
2. Recompile the kernel to use frame pointers.

With frame pointers enabled, the kernel will still print every value that looks like a return address, but it will flag the unreliable addresses with a “?”. So your stack dump might look like this instead:

? printk
? A
crash
? foo
trouble
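
As a rough pointer for the second suggestion, frame pointers are controlled by a kernel configuration option. The fragment below assumes a kernel where the option is exposed directly as `CONFIG_FRAME_POINTER` (typically under the "Kernel hacking" menu); on newer x86 kernels the choice is made through the unwinder selection instead (`CONFIG_UNWINDER_FRAME_POINTER`), so check the name against your own kernel version and architecture.

```
# .config fragment (assumed option name; verify for your kernel version)
CONFIG_FRAME_POINTER=y
```

After changing the option, the kernel and the driver module both need to be rebuilt so that all code is compiled with frame pointers.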
