Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9275967
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T16:43:08+00:00 2026-06-18T16:43:08+00:00

I’ve managed to write a Scrapy project that scrapes data from a web page,

  • 0

I’ve managed to write a Scrapy project that scrapes data from a web page, and when I call it with the scrapy crawl dmoz -o items.json -t json at the command line, it successfully outputs the scraped data to a JSON file.

I then wrote another script that takes that JSON file, loads it, changes the way the data is organized (I didn’t like the default way it was being organized), and spits it out as a second JSON file. I then use Django’s manage.py loaddata fixture.json command to load the contents of that second file into a Django database.

Now, I’m sensing that I’m going to get laughed out of the building for doing this in three separate steps, but I’m not quite sure how to put it all together into one script. For starters, it does seem really stupid that I can’t just have my Scrapy project output my data in the exact way that I want. But where do I put the code to modify the ‘default’ way that Feed exports is outputting my data? Would it just go in my pipelines.py file?

And secondly, I want to call the scraper from inside a python script that will then also load the resulting JSON fixture into my database. Is that as simple as putting something like:

from twisted.internet import reactor  
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

at the top of my script, and then following it with something like:

from django.something.manage import loaddata

loaddata('/path/to/fixture.json')

? And finally, is there any specific place this script would have to live relative to both my Django project and the Scrapy project for it to work properly?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T16:43:10+00:00Added an answer on June 18, 2026 at 4:43 pm
    1. Exactly that. Define a custom item pipeline in pipelines.py to output the item data as desired, then add the pipeline class to settings.py. The scrapy documentation has a JSONWriterPipeline example that may be of use.
    2. Well, on the basis that the script in your example was taken from the scrapy documentation, it should work. Have you tried it?
    3. The location shouldn’t matter, so long as all of the necessary imports work. You could test this by firing a Python interpreter in the desired location and then checking all of the imports one by one. If they all run correctly, then the script should be fine.

    If anything goes wrong, then post another question in here with the relevant code and what was tried and I’m sure someone will be happy to help out. 🙂

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

That's pretty much it. I'm using Nokogiri to scrape a web page what has
I'm parsing an RSS feed that has an ’ in it. SimpleXML turns this
I would like my Web page http://www.gmarks.org/math_in_e-mail.txt on my Apache 2.2.14 server to display
I am using jsonparser to parse data and images obtained from json response. When
link Im having trouble converting the html entites into html characters, (&# 8217;) i
For some reason, after submitting a string like this Jack’s Spindle from a text
I've got a string that has curly quotes in it. I'd like to replace
I have a small JavaScript validation script that validates inputs based on Regex. I
I have a French site that I want to parse, but am running into
I'm using v2.0 of ClassTextile.php, with the following call: $testimonial_text = $textile->TextileRestricted($_POST['testimonial']); ... and

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.