Hopefully someone can help. I have a challenging situation that I cannot seem to script for. My aim is to automate loading SQL files into PostgreSQL.
I won't know how many folders of SQL files I have, so initially I check that a folder exists and then loop through each file and load it into PostgreSQL using psql.exe.
My current code looks like this:
    if os.path.exists("sql1"):
        for files in os.listdir("sql1"):
            load1 = subprocess.Popen("psql -d data -U postgres -f sql1\\%s" % files)
    if os.path.exists("sql2"):
        for files in os.listdir("sql2"):
            load2 = subprocess.Popen("psql -d data -U postgres -f sql2\\%s" % files)
However, this spawns far too many subprocesses: it creates one subprocess for each SQL file in a folder, and does the same again for every folder.
If I change it to subprocess.call, it will of course serialise the loading, but it also blocks loading the files from the next folder, rather than running a single process for each folder.
Does anyone know how I could create a single process for each folder that exists?
In addition to this, I will then run the indexes, but only once all of the processes have finished.
I could use load.wait() but that would only work for one process.
Thanks in advance for any advice and help.
EDIT ADDED:
Taking Steve’s advice, I introduced some threads, but the indexing still starts before the subprocesses have finished:
    def threads(self):
        processors = multiprocessing.cpu_count()
        n = 1
        name = "sql%i" % n
        for i in range(processors):
            if os.path.exists(name):
                thread = Thread(target=self.loadData, args=(name,))
                thread.start()
            n += 1
            name = "sql%i" % n

    def loadData(self, name):
        for files in os.listdir(name):
            load = subprocess.Popen("psql -d osdata -U postgres -f %s\\%s" % (name, files))
            load.wait()
But the indexing starts before the processes have finished.
Any ideas how to prevent that?
I would suggest creating a thread for each folder. Then use subprocess.call to serialise calls within each thread. If you want to throttle the number of threads executing concurrently, you should look at Python’s concurrent.futures module.
http://docs.python.org/dev/library/concurrent.futures.html
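
As a rough sketch of that suggestion (not a definitive implementation), the following uses one worker per folder, subprocess.call inside each worker, and a ThreadPoolExecutor to cap concurrency and to wait for everything before indexing. It assumes Python 3.2 or later, where concurrent.futures is in the standard library. The names build_command, load_folder, load_all and the run parameter are hypothetical, introduced here only for illustration; the psql arguments are copied from the first snippet in the question.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(path):
    # Hypothetical helper: the same psql invocation as in the question,
    # passed as a list so no shell quoting or backslash escaping is needed.
    return ["psql", "-d", "data", "-U", "postgres", "-f", path]


def load_folder(folder, run=subprocess.call, make_cmd=build_command):
    """Load every file in one folder, one at a time; return the exit codes."""
    codes = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # run() blocks until the command exits, so files within a
        # single folder are loaded serially.
        codes.append(run(make_cmd(path)))
    return codes


def load_all(folders, max_workers=None, run=subprocess.call):
    """Load the existing folders concurrently; return once all are done."""
    existing = [f for f in folders if os.path.isdir(f)]
    if not existing:
        return {}
    # Default to one worker per folder; pass max_workers to throttle.
    with ThreadPoolExecutor(max_workers=max_workers or len(existing)) as pool:
        futures = {f: pool.submit(load_folder, f, run) for f in existing}
        # result() blocks until that folder's worker has finished and
        # re-raises any exception raised inside it.
        return {f: fut.result() for f, fut in futures.items()}
```

Because load_all only returns after every future has produced a result, a call such as load_all(["sql1", "sql2"]) followed by the indexing step should not start indexing early. The threads in the edited question are started but never waited on, which is the likely reason the indexing begins too soon there; keeping the Thread objects in a list and calling join() on each before indexing would be the equivalent fix without concurrent.futures.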