Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8970881
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 15, 20262026-06-15T17:49:01+00:00 2026-06-15T17:49:01+00:00

General question: What could cause a script that works fine in itself, to hang

  • 0

General question: What could cause a script that works fine in itself, to hang if the script or shell (bash) command that calls it calls it into a variable?

In other words, how could there be a script that works when called like this… /path/to/script arg arg …fail and hang when called like this… VAR=$(/path/to/script arg arg);?


(major edits after noticing that a software glitch caused much initial testing to give incorrect results)


My specific case: I’ve got a script that works fine (to start, stop or restart the java application Apache Solr, adapted from here). Code is below, its command is sbin/service solr [action], for example sbin/service solr start.

When called from a script or directly from a console (bash in my case), like sbin/service solr start, it works fine and completes quickly. If, however, it’s called into a variable, like VAR=$(sbin/service solr start);, it works, but hangs with a futext / clock_gettime loop (trace below). It also hangs if it’s called not into a variable, but into strace.

Curiously, other scripts called in the same way with the same syntax, e.g. sbin/service httpd start, work just fine when called into a variable. So it’s clearly possible for there to be something about a script that makes it hang when the output is stored as a variable, but operate perfectly well when this isn’t the case.


Here’s the results of testing what calls hang and what don’t:

HANGS————————————————

  • VAR=$(/sbin/service solr start);
  • VAR=$(source /sbin/service solr start);
  • VAR=$(nohup /sbin/service solr start &);

(so it doesn’t matter which process it’s called from) Also, editting the script file to start the service using source causes the service to not work.

DOESN’T HANG————————————-

  • VAR=$(/sbin/service solr start >> /dev/null);

Outputting to /dev/null allows us to request the output without causing it to hang. However, it’s not much use, as no actual output is received.

  • /sbin/service solr start

Contrary to what I first thought. This outputs a simple update message which, ideally, we’d capture in a variable and log – but attempting to do so causes it to hang.

  • VAR=$(/sbin/service httpd restart);

The syntax that hangs works just fine on other service scripts, and the output of the script is passed to the variable without issue.


Here’s the full code of that script: (comments removed, naturally the $SOLR_DIR path is a real one in the real script)

SOLR_DIR="[path/to/application]"
JAVA_OPTIONS="-Xms64m -Xmx64m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
LOG_FILE="/var/log/solr.log"
JAVA="/usr/bin/java"

case $1 in
    start)
        echo "Starting Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
        ;;
    stop)
        echo "Stopping Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac

No errors or anything unusual in var/log/solr.log (log file named in script). Centos Linux server if that’s relevant.


In response to an earlier version of the question, @cdarke suggested I run strace -f -o strace.out /path/to/script on a script calling this script, and look in the (massive!) output file strace.out. It’s nearly 3mbs, here’s some observations:

  1. Starts with lots of activity that looks like the script functioning as intended.

  2. Then, the last 15% or so of the log file is this, repeated with different integers but seemingly the same hex codes:

…

25687 futex(0x688d454, FUTEX_WAIT_PRIVATE, 1, {0, 49980000}) = -1 ETIMEDOUT (Connection timed out)
25687 futex(0x688d428, FUTEX_WAKE_PRIVATE, 1) = 0
25687 clock_gettime(CLOCK_MONOTONIC, {39074112, 932735888}) = 0
25687 clock_gettime(CLOCK_REALTIME, {1355007234, 333458000}) = 0

These PIDs come up with nothing when put through ps -p even if I do so while the script is still running, while the output file is still getting bigger and while these lines of code are still being written. I’m not quite sure how that’s possible.

Here’s the final bit out the output before it gets in the never-ending futex/clock_gettime loop, after the last part that is clearly the script executing correctly (solr/solr.xml is a Solr config file, which needs to be read to start the Solr process):

25874 stat("solr/solr.xml", {st_mode=S_IFREG|0777, st_size=1320, ...}) = 0
25874 write(2, "Dec 8, 2012 5:12:05 PM org.apach"..., 106) = 106
25874 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 89
25874 fcntl(89, F_GETFL)                = 0x2 (flags O_RDWR)
25874 fcntl(89, F_SETFL, O_RDWR|O_NONBLOCK) = 0
25874 setsockopt(89, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
25874 bind(89, {sa_family=AF_INET, sin_port=htons(8983), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
25874 listen(89, 50)                    = 0
25874 setsockopt(89, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
25874 lseek(12, 57747, SEEK_SET)        = 57747
25874 read(12, "PK\3\4\n\0\0\0\10\0\221Vi>F\347\254\364\325\4\0\0002\t\0\0002\0\0\0", 30) = 30
25874 lseek(12, 57827, SEEK_SET)        = 57827
25874 read(12, "\225V\377oSU\24\377\334\273\256\257_\36l\216m\254\262\351\224\241]\273\255\200\314/\5\246c\200"..., 1237) = 1237
25874 futex(0x2aaab0173054, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2aaab0173050, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
25894 <... futex resumed> )             = 0
25894 futex(0x2aaab0173028, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
25874 <... futex resumed> )             = 1
25874 futex(0x2aaab0173028, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25894 <... futex resumed> )             = 0
25894 futex(0x2aaab0173028, FUTEX_WAKE_PRIVATE, 1) = 0
25894 clock_gettime(CLOCK_REALTIME, {1355008325, 376033000}) = 0
25894 futex(0x2aaab0173054, FUTEX_WAIT_PRIVATE, 3, {0, 983000} <unfinished ...>
25874 <... futex resumed> )             = 1
25874 futex(0x2aaab0173054, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2aaab0173050, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
25894 <... futex resumed> )             = 0
25894 futex(0x2aaab0173028, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
25874 <... futex resumed> )             = 1
25874 futex(0x2aaab0173028, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25894 <... futex resumed> )             = 0
25894 futex(0x2aaab0173028, FUTEX_WAKE_PRIVATE, 1) = 0
25894 poll([{fd=89, events=POLLIN|POLLERR}], 1, -1 <unfinished ...>
25874 <... futex resumed> )             = 1
25874 write(2, "2012-12-08 17:12:05.376:INFO::St"..., 66) = 66
25874 write(2, "\n", 1)                 = 1
25874 mmap(0x41348000, 12288, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x41348000
25874 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
25874 sched_getaffinity(25874, 32,  { ffff, 0, 0, 0 }) = 32
25874 sched_getaffinity(25874, 32,  { ffff, 0, 0, 0 }) = 32
25874 gettid()                          = 25874
25874 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
25874 rt_sigprocmask(SIG_UNBLOCK, [HUP ILL BUS FPE SEGV USR2 TERM], NULL, 8) = 0
25874 rt_sigprocmask(SIG_BLOCK, [QUIT], NULL, 8) = 0
25874 mmap(0x41348000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x41348000
25874 mprotect(0x41348000, 12288, PROT_NONE) = 0
25874 futex(0x10632d54, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
25882 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
25882 futex(0x106cc428, FUTEX_WAKE_PRIVATE, 1) = 0
25882 clock_gettime(CLOCK_MONOTONIC, {39075204, 21489888}) = 0
25882 clock_gettime(CLOCK_REALTIME, {1355008325, 422198000}) = 0
25882 futex(0x106cc454, FUTEX_WAIT_PRIVATE, 1, {0, 49984000}) = -1 ETIMEDOUT (Connection timed out)
25882 futex(0x106cc428, FUTEX_WAKE_PRIVATE, 1) = 0
25882 clock_gettime(CLOCK_MONOTONIC, {39075204, 72479888}) = 0
25882 clock_gettime(CLOCK_REALTIME, {1355008325, 473185000}) = 0
25882 futex(0x106cc454, FUTEX_WAIT_PRIVATE, 1, {0, 49987000}) = -1 ETIMEDOUT (Connection timed out)
25882 futex(0x106cc428, FUTEX_WAKE_PRIVATE, 1) = 0

So the last line before the death spiral is a read() on channel 12. Then it just loops futex and clock_gettime until it’s manually killed.


This last point may be irrelevant, but if, similar to in this question, I run the script that calls this script using nohup and divert output to /dev/null, I get the following near the start ( about 100kb into the output file): a huge number of these:

25664 close(67) = -1 EBADF (Bad file descriptor)

They go from 67, increasing +1 each time, to:

25664 close(1023) = -1 EBADF (Bad file descriptor)

Then they’re followed by

25664 open("/dev/null", O_RDWR) = 3

Again, the PIDs are empty, so far as I can see. Not sure if this is relevant – I guess this opens the possibility that using nohup with output to /dev/null really is a general fix for problems like this but I’m doing it wrong somehow, causing these errors.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-15T17:49:02+00:00Added an answer on June 15, 2026 at 5:49 pm

    I’m pretty sure the problem is that the shell is capturing output from the /sbin/service script and the solr service it starts, and hence will wait for the service to exit (or at least close its stdout) before continuing. Here’s a simple demo:

    $ bg_service() { while true; do sleep 10; done; }
    $ start_bg_service() { echo "starting"; bg_service& echo "running"; }
    $ start_bg_service 
    starting
    [1] 8656
    running
    $ var=$(start_bg_service)
    [It hangs at this point... until I open another shell and kill the background process]
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a specific question, that could use a general answer... When building Multi-Tier
General question: Let's say you have a list of vertices that you pass into
general question is i like to build logger class that writes to single log
(This was meant to be a general hypothetical question, me whining that .NET was
This question is about a general technique in SQL, that I can't quite work
This is a general question that comes to answer an issue that interests me
this is a general question regarding XSLT, but since I could not find an
I'm working in AS3, but I guess this could be a general question, so
This is sort of a general question that has come up in several contexts,
Yes, I know that general forms of this question have been asked time and

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.