I’ve deployed a Django application on Heroku. The application by itself works fine. I

Question

0

Asked: May 29, 20262026-05-29T05:13:53+00:00 2026-05-29T05:13:53+00:00

I’ve deployed a Django application on Heroku. The application by itself works fine. I

0

I’ve deployed a Django application on Heroku. The application by itself works fine. I can run commands such as heroku run python project/manage.py syncdband heroku run python project/manage.py shell and this works well.

My Django project makes use of the Python web scraping library called Scrapy. Scrapy comes with a command called scrapy crawl abc which helps me scrape websites I have defined in the scrapy application. When I run a scrapy command such as scrapy crawl spidername on my local machine, the application is able to scrape date and copy it to my database. However when I run the same command on Heroku under a sub-directory of my project directory heroku run scrapy crawl spidername, nothing happens.

I don’t see anything in the Heroku logs which can point to where I’m going wrong:

2012-01-26T15:45:38+00:00 heroku[run.1]: State changed from created to starting
2012-01-26T15:45:43+00:00 app[run.1]: Awaiting client
2012-01-26T15:45:43+00:00 app[run.1]: Starting process with command `project/spiderMainDir scrapy crawl spidername`
2012-01-26T15:45:44+00:00 heroku[run.1]: State changed from starting to up
2012-01-26T15:45:46+00:00 heroku[run.1]: State changed from up to complete
2012-01-26T15:45:46+00:00 heroku[run.1]: Process exited

Some additional information:

My scrapy app calls pipelines.py to save the scraped items to the database. In the pipelines.py file, this is what I’ve written to invoke the Django settings so that I can import my models and save data to the database from the scrapy application.

import os,sys
PROJECT_PATH = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(PROJECT_PATH)
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

Any pointers on where exactly am I going wrong? How do I execute the scrapy command on Heroku such that my application can scrape an external website and save that data to the database. Isn’t the way external commands are run in Heroku like – heroku run command?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-29T05:13:54+00:00

I’m answering my own question because I discovered what the problem was. Heroku for some reason was not able to find scrapy when I executed the command from a sub-directory and not the top-level directory.

The command heroku run ... is generally run from the top-level directory. For my project which uses scrapy, I was required to go to a sub-directory and run the scrapy command from the sub-directory (this is how scrapy is designed). This wasn’t working in Heroku. So I went to the Heroku bash by typing heroku run bash to see what was going on. When I ran the scrapy command from the top-level directory, Heroku recognized the command but when I went to a sub-directory, it failed to recognize the scrapy command. I suppose there is some problem related to the path. From the sub-directory, I had to specify the complete path to scrapy (~/bin/scrapy crawl spidername) to be able to execute it.

To run the scrapy command without going to the Heroku bash manually each time, my work around this problem was that I created a shell script containing the following code and put it under the bin directory of my top-level directory and pushed the changes to Heroku.

bin/scrapy.sh :

#!/usr/bin/env bash 
cd ~/project/spiderSubDirectory
~/bin/scrapy $@

After this was done, I could execute $ heroku run scrapy.sh crawl spidername from my local bash. I suppose its not the best solution but this works.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’ve deployed a Django application on Heroku. The application by itself works fine. I

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply