The Archive Base

Editorial Team
Asked: June 9, 2026

I use Celery to run web spiders which crawl some data, and after that

I use Celery to run web spiders that crawl some data, and afterwards I need to save this data in a database (SQLite, for example). As I understand it, I can’t share a SQLAlchemy session between Celery workers. How do you solve this problem? What is the common approach?

Currently I am trying to use Redis as intermediate storage for the data.

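import datetime
import logging

# `celery` (the Celery app) and `db` (the SQLAlchemy handle) are assumed to be
# defined elsewhere in the application.

@celery.task
def run_spider(spider, task):
    # set up a per-spider logger and expose it to the spider
    logger = logging.getLogger('Spider: %s' % spider.url)
    spider.meta.update({'logger': logger, 'task_id': int(task.id)})

    # make the current Celery request available inside the spider
    spider.meta.update({'task_request': run_spider.request})

    spider.run()

    # mark the task as finished and persist the change
    task.state = "inactive"
    task.resolved = datetime.datetime.now()
    db.session.add(task)
    db.session.commit()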

EDIT: Actually, I was wrong. I don’t need to share sessions; I need to create a new database connection for each Celery process/task.
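A minimal sketch of the idea in the EDIT: every task opens its own connection, writes, and closes it, so nothing is shared between workers. To keep it self-contained, the standard-library sqlite3 module stands in for SQLAlchemy and a plain function stands in for the Celery task. `DB_PATH`, the `tasks` table, and the function names are hypothetical, chosen only for illustration.

```python
import datetime
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical database location; a real application would read this from config.
DB_PATH = os.path.join(tempfile.gettempdir(), "spider_tasks_example.db")


def init_db(path=DB_PATH):
    """Create the illustrative `tasks` table if it does not exist yet."""
    with closing(sqlite3.connect(path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tasks ("
                "id INTEGER PRIMARY KEY, state TEXT, resolved TEXT)"
            )


def mark_task_resolved(task_id, path=DB_PATH):
    """Body of a would-be Celery task: open a fresh connection, write, close."""
    resolved = datetime.datetime.now().isoformat()
    with closing(sqlite3.connect(path)) as conn:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO tasks (id, state, resolved) "
                "VALUES (?, ?, ?)",
                (task_id, "inactive", resolved),
            )
    return resolved


def get_task_state(task_id, path=DB_PATH):
    """Read the state back through yet another short-lived connection."""
    with closing(sqlite3.connect(path)) as conn:
        row = conn.execute(
            "SELECT state FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
    return row[0] if row else None
```

The same shape should carry over to SQLAlchemy: create the session inside the task and close it before the task returns, rather than reusing one created at import time.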

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 4:49 pm

    I too have used Redis for persistence in a large Celery application.

    It is common for my tasks to look like this:

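    # `task` is Celery's task decorator; `sharded_redis` and `helpers` are the
    # answerer's own modules, and `do_work` stands for the task's real work.
    @task
    def MyTask(sink, *args, **kwargs):
        # build the sharded client from the list of (host, port) tuples
        data_store = sharded_redis.ShardedRedis(sink)
        key_helper = helpers.KeyHelper()
        my_dictionary = do_work()
        # store the result dictionary as a Redis hash
        data_store.hmset(key_helper.key_for_my_hash(), my_dictionary)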
    
    • sharded_redis is just an abstraction of several redis shards handling sharding keys via the client.
    • sink is a list of (host, port) tuples that are used to make the appropriate connection after the shard is determined.
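The answer does not show how sharded_redis picks a shard. One common way to do client-side sharding is to hash the key and take it modulo the number of shards. The sketch below assumes that scheme; `shard_for_key` and the example addresses are hypothetical and not taken from the answerer's code.

```python
import zlib


def shard_for_key(key, sink):
    """Pick the (host, port) tuple responsible for `key`.

    Assumption: the shard is chosen by CRC32 of the key modulo the shard count.
    """
    if not sink:
        raise ValueError("sink must contain at least one (host, port) tuple")
    index = zlib.crc32(key.encode("utf-8")) % len(sink)
    return sink[index]


# Hypothetical shard list, in the same shape as `sink` above.
sink = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]
host, port = shard_for_key("spider:42:results", sink)
```

With a stable hash like this, every worker maps the same key to the same shard without any coordination, so a task only needs the `sink` list to know where to connect.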

    Essentially you are connecting and disconnecting from redis with each task (really cheap) rather than creating a connection pool.

    Using a connection pool would work, but if you are going to really utilize Celery (run a lot of concurrent tasks), then you would be better off (in my opinion) using this method, since you run the risk of exhausting your connection pool, especially if you are doing anything that takes a bit longer in Redis (like reading a large dataset into memory).

    Connections to Redis are pretty cheap, so this should scale well. We were handling several hundred thousand tasks per minute on a couple of instances.
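To make the exhaustion risk concrete, here is a toy bounded pool built on the standard-library queue module. It is not how any Redis client implements pooling; `BoundedPool` and the string "connections" are hypothetical stand-ins that only show what happens when more tasks want a connection than the pool holds.

```python
import queue


class BoundedPool:
    """Toy connection pool: hands out at most `size` connections at a time."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            # Strings stand in for real connection objects.
            self._free.put("conn-%d" % i)

    def acquire(self, timeout=0.1):
        """Return a free connection, or fail if none frees up in time."""
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("connection pool exhausted") from None

    def release(self, conn):
        """Give a connection back so another task can use it."""
        self._free.put(conn)


pool = BoundedPool(size=2)
first = pool.acquire()
second = pool.acquire()  # the pool is now empty
try:
    pool.acquire()       # a third concurrent task has to wait, then fails
    exhausted = False
except RuntimeError:
    exhausted = True
pool.release(first)      # once a slow task finishes, capacity returns
```

A task that holds its connection for a long time (the large-dataset read mentioned above) keeps a slot occupied for that whole time, which is the situation where connecting per task avoids the bottleneck.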

© 2021 The Archive Base. All Rights Reserved