The Archive Base

Editorial Team
Asked: May 25, 2026

I have a python-based daemon that provides a REST-like interface over HTTP to some

I have a python-based daemon that provides a REST-like interface over HTTP to some command line tools. The general nature of the tool is to take in a request, perform some command-line action, store a pickled data structure to disk, and return some data to the caller. There’s a secondary thread spawned on daemon startup that looks at that pickled data on disk periodically and does some cleanup based on what’s in the data.

This works just fine if the disk where the pickled data resides happens to be local disk on a Linux machine. If you switch to NFS-mounted disk the daemon starts life just fine, but over time the NFS-mounted share “disappears” and the daemon can no longer tell where it is on disk with calls like os.getcwd(). You’ll start to see it log errors like:

2011-07-13 09:19:36,238 INFO Retrieved submit directory '/tech/condor_logs/submit'
2011-07-13 09:19:36,239 DEBUG CondorAgent.post_submit.do_submit(): handler.path: /condor/submit?queue=Q2%40scheduler
2011-07-13 09:19:36,239 DEBUG CondorAgent.post_submit.do_submit(): submitting from temporary submission directory '/tech/condor_logs/submit/tmpoF8YXk'
2011-07-13 09:19:36,240 ERROR Caught un-handled exception: [Errno 2] No such file or directory
2011-07-13 09:19:36,241 INFO submitter - - [13/Jul/2011 09:19:36] "POST /condor/submit?queue=Q2%40scheduler HTTP/1.1" 500 -

The un-handled exception resolves to the daemon being unable to see the disk any more. Any attempt to figure out the daemon’s current working directory with os.getcwd() at this point will fail. Even trying to change to the root of the NFS mount, /tech, will fail. All the while the logger.logging.* methods are happily writing out log and debug messages to a log file located on the NFS-mounted share at /tech/condor_logs/logs/CondorAgentLog.

The disk is most definitely still available. There are other, C++-based daemons reading and writing at a much higher frequency on this share at the same time as the Python-based daemon.

I’ve come to an impasse debugging this problem. Since it works on local disk the general structure of the code must be good, right? There’s something about NFS-mounted shares and my code that are incompatible but I can’t tell what it might be.

Are there special considerations one must implement when dealing with a long-running Python daemon that will be reading and writing frequently to an NFS-mounted file share?


If anyone wants to see the code, the portion that handles the HTTP request and writes the pickled object to disk is on GitHub here. The portion that the sub-thread uses to do periodic cleanup of stuff from disk by reading the pickled objects is here.


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 11:59 am

    I have the answer to my problem, and it had nothing to do with the fact that I was doing file I/O on an NFS share. It turns out the problem just showed up faster if the I/O was over an NFS mount versus local disk.

    A key piece of information is that the code was running threaded via the SocketServer.ThreadingMixIn and HTTPServer classes.

    My handler code was doing something close to the following:

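    import os
    import tempfile

    # getBaseDirFromConfigFile(), doSomething() and cleanUp() stand in for the daemon's own helpers.
    base_dir = getBaseDirFromConfigFile()
    current_dir = os.getcwd()                       # remember where we are
    temporary_dir = tempfile.mkdtemp(dir=base_dir)  # per-request scratch directory
    os.chdir(temporary_dir)
    doSomething()
    os.chdir(current_dir)                           # go back to where we were
    cleanUp(temporary_dir)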
    

    That’s the flow, more or less.

    The problem wasn’t that the I/O was being done on NFS. The problem was that the current working directory isn’t thread-local; it’s global to the process. So when one thread issued a chdir() to move into the temporary space it had just created under base_dir, the next thread calling os.getcwd() would get the other thread’s temporary_dir instead of the static base directory in which the HTTP server was started.
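A minimal sketch of that behaviour, assuming Python 3 and a POSIX-like system: one thread calls os.chdir(), and a different thread (here, the caller) sees the change through os.getcwd(). The function name and the scratch directory are made up for the illustration; they are not from the daemon's code.

```python
import os
import tempfile
import threading


def cwd_seen_after_other_thread_chdir():
    """Return (cwd before, cwd after, scratch dir) as seen by the calling thread."""
    original = os.getcwd()
    scratch = tempfile.mkdtemp()  # hypothetical scratch directory for the demo

    before = os.getcwd()

    # A *different* thread changes directory; the calling thread never does.
    worker = threading.Thread(target=os.chdir, args=(scratch,))
    worker.start()
    worker.join()

    after = os.getcwd()  # reflects the worker's chdir(): the cwd is per-process

    # Put things back so the demo has no lasting side effects.
    os.chdir(original)
    os.rmdir(scratch)
    return before, after, scratch


if __name__ == "__main__":
    before, after, scratch = cwd_seen_after_other_thread_chdir()
    print("before:", before)
    print("after: ", after)
```

The calling thread's view of the working directory changes even though it never called chdir() itself, which is the property the handler code above was relying on not to exist.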

    There are some other people reporting similar issues here and here.

    The solution was to get rid of the chdir() and getcwd() calls: start up and stay in one directory, and access everything else through absolute paths.
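One way this could look, as a sketch rather than the daemon's actual code: keep the temporary directory as an absolute path and, if a command-line tool needs to run "in" that directory, pass it as the cwd argument of subprocess.run() so that only the child process changes directory. This assumes Python 3.7 or later; run_in_scratch_dir is a hypothetical helper name.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_scratch_dir(base_dir, argv):
    """Run argv with a private scratch directory as the *child's* cwd.

    The parent process never calls os.chdir(), so concurrent handler
    threads cannot disturb each other's notion of the working directory.
    """
    scratch = tempfile.mkdtemp(dir=os.path.abspath(base_dir))  # absolute path
    try:
        result = subprocess.run(
            argv,
            cwd=scratch,          # applies to the child process only
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        # Clean up by absolute path; no need to "leave" the directory first.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    child_cwd = run_in_scratch_dir(
        base, [sys.executable, "-c", "import os; print(os.getcwd())"]
    )
    print("child ran in:", child_cwd.strip())
    print("parent still in:", os.getcwd())
    shutil.rmtree(base, ignore_errors=True)
```

Any files the handler itself reads or writes would similarly be addressed with os.path.join(scratch, name) instead of relative names.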

    The NFS vs local file stuff threw me for a loop. It turns out my block:

    os.chdir(temporary_dir)
    doSomething()
    os.chdir(current_dir)
    cleanUp(temporary_dir)
    

    was running much slower when the filesystem was NFS versus local. It made the problem occur much sooner because it increased the chances that one thread was still in doSomething() while another thread was running the current_dir = os.getcwd() part of the code block. On local disk the threads moved through the entire code block so quickly they rarely intersected like that. But, give it enough time (about a week), and the problem would crop up when using local disk.
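The unlucky ordering can be replayed deterministically in a single thread, which makes the failure easier to see than waiting for a real race. The sketch below assumes Python 3 and assumes that cleanUp() removes the temporary directory (the original post does not show it); replay_interleaving and the "handler A / handler B" labels are invented for the illustration.

```python
import os
import shutil
import tempfile


def replay_interleaving(base_dir):
    """Replay two overlapping handlers step by step; return the error hit, if any."""
    start_dir = os.getcwd()
    try:
        # Handler A: save the cwd, create a temp dir, move into it.
        a_saved = os.getcwd()
        a_tmp = tempfile.mkdtemp(dir=base_dir)
        os.chdir(a_tmp)

        # Handler B starts while A is still "inside doSomething()".
        b_saved = os.getcwd()  # captures A's temp dir, not the start directory
        b_tmp = tempfile.mkdtemp(dir=base_dir)
        os.chdir(b_tmp)

        # Handler A finishes: goes back and cleans up its temp dir.
        os.chdir(a_saved)
        shutil.rmtree(a_tmp)  # assumption: this is what cleanUp() does

        # Handler B finishes: tries to go "back" to a directory that is gone.
        try:
            os.chdir(b_saved)
        except OSError as exc:
            return exc
        return None
    finally:
        os.chdir(start_dir)  # leave the process where we found it


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    print(repr(replay_interleaving(base)))
    shutil.rmtree(base, ignore_errors=True)
```

Under these assumptions the final chdir() fails with [Errno 2] No such file or directory, the same error that appears in the log from the question.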

    So a big lesson learned on thread safe operations in Python!
