The Archive Base

Editorial Team
Asked: May 31, 2026

I am working on a new Python package that depends on many rather large data files (>20 MB each). Specifically, the library expects the data files to be in a data/ directory at run time.

Currently, I have them in a “data” directory as part of the distribution package, and my setup.py script is configured to install these files on the user’s system via python setup.py install. This works for now, but it seems it would prevent me from uploading the distribution to PyPI, since the tarball would likely exceed a few hundred MB.

As an alternative, I’d like to “host” the files on a remote site, to be kind to PyPI, and have the files retrieved and installed automatically. Is this possible using the existing Python distribution techniques? If so, could you please describe how to do this or provide an example? If it is not possible, what are the best practices for pulling this off?
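One common way to do this is to ship only the code to PyPI and fetch each data file into a local cache the first time it is needed. Below is a minimal sketch of that pattern using only the standard library; the function name fetch_data_file, its base_url, cache_dir and sha256 parameters, and the idea of pinning a checksum per file are illustrative assumptions, not part of any packaging tool.

```python
import hashlib
import os
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_data_file(
    name: str,
    base_url: str,
    cache_dir: Path,
    sha256: Optional[str] = None,
) -> Path:
    """Return a local path to `name`, downloading it from `base_url` if needed.

    All parameters are illustrative (hypothetical): a real package would pin
    the URL and checksum of each data file in its own code.
    """
    cache_dir = Path(cache_dir)
    target = cache_dir / name
    if target.exists():
        return target  # fetched by an earlier call; no network access

    cache_dir.mkdir(parents=True, exist_ok=True)
    url = base_url.rstrip("/") + "/" + name

    # Write to a temporary file in the same directory first, so an interrupted
    # or corrupt download never appears at `target`.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        digest = hashlib.sha256()
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            # Stream in 1 MiB chunks instead of holding the whole file in memory.
            for chunk in iter(lambda: resp.read(1 << 20), b""):
                digest.update(chunk)
                out.write(chunk)
        if sha256 is not None and digest.hexdigest() != sha256:
            raise ValueError(f"checksum mismatch for {name!r}")
        os.replace(tmp_name, target)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Downloading to a temporary file and renaming it into place means a failed download is simply retried on the next call instead of leaving a truncated file behind. A call such as fetch_data_file("model.bin", "https://example.org/mypkg-data", Path.home() / ".cache" / "mypkg") would download once and then return the cached path; the file name and URL there are placeholders.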

Any insight you could offer would be most welcome.

1 Answer

    Editorial Team
    Added an answer on May 31, 2026 at 10:16 am

    NLTK faces a similar situation when distributing its corpus data. On my Linux distribution the data comes in a separate package, so I did some investigation by installing NLTK with setuptools on Windows.

    If you try to use a corpus that you don't have, NLTK asks you to run its downloader function (nltk.download()). Internally, it uses a LazyCorpusLoader as a stand-in for the corpus objects that need the data, and loads the data only once it is actually needed.
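The lazy stand-in idea can be sketched in a few lines. The LazyResource class below is hypothetical and is not NLTK's code; it only illustrates deferring an expensive load (or download) until the first attribute access. The loader callable is an assumption: in a real package it might read a file from the data directory, or ask the user to run a download function if the file is missing.

```python
from typing import Any, Callable


class LazyResource:
    """Hypothetical stand-in that loads the real object on first use."""

    def __init__(self, name: str, loader: Callable[[], Any]) -> None:
        self._name = name
        self._loader = loader  # called at most once, on first attribute access
        self._obj: Any = None
        self._loaded = False

    def _load(self) -> Any:
        if not self._loaded:
            self._obj = self._loader()
            self._loaded = True
        return self._obj

    def __getattr__(self, attr: str) -> Any:
        # __getattr__ only runs when normal lookup fails, so the proxy's own
        # fields and methods never trigger a load. Private names are refused
        # to avoid recursion if an instance is inspected before __init__ runs.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._load(), attr)

    def __repr__(self) -> str:
        state = "loaded" if self._loaded else "not loaded"
        return f"<LazyResource {self._name!r} ({state})>"
```

A package could create such objects at import time, e.g. words = LazyResource("words", load_words) with a hypothetical load_words function, so importing the package stays cheap and only code that actually touches words pays for loading it. Special methods such as len() bypass __getattr__, so a fuller version would need to forward those explicitly.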

    Like sys.path, it searches a list of directories for the data, so the user can put it wherever they want. You can also modify nltk.data.path to add your own location for the data.
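A search path of this kind is just an ordinary list of directories checked in order. The sketch below mimics that behaviour for a hypothetical package: the MYPKG_DATA environment variable, the default directories and the find_data function are all assumptions for illustration, not NLTK's actual defaults.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical environment variable listing extra data directories,
# separated by os.pathsep (":" on POSIX, ";" on Windows).
DATA_ENV_VAR = "MYPKG_DATA"


def default_data_path() -> List[str]:
    """Build the default search path, highest priority first."""
    paths: List[str] = []
    env = os.environ.get(DATA_ENV_VAR, "")
    paths.extend(p for p in env.split(os.pathsep) if p)
    paths.append(str(Path.home() / "mypkg_data"))                  # per-user data
    paths.append(os.path.join(sys.prefix, "share", "mypkg_data"))  # per-environment data
    return paths


# A plain list, like sys.path: callers may append or insert directories.
data_path: List[str] = default_data_path()


def find_data(name: str, path: Optional[Iterable[str]] = None) -> Path:
    """Return the first existing <directory>/<name>, searching in order."""
    directories = list(data_path if path is None else path)
    for directory in directories:
        candidate = Path(directory) / name
        if candidate.exists():
            return candidate
    raise LookupError(
        f"data file {name!r} not found; searched {directories}. "
        "Download it, or add its directory to data_path."
    )
```

Because data_path is a plain module-level list, a user can add their own location with data_path.append(...), in the same spirit as modifying nltk.data.path.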

Related Questions

I am pretty new to python and working with firmata I am trying to
I am new to Python, and I'm working on writing some database code using
In the course of my career I've noticed that developers working on new functionality
I'm new to python and django. Apps | Versions : Python 2.6.2 Django (working
I'm very new to python. two days. trying to get a plot working with
I am new to python and have been working through the examples in Swaroop
New to python and trying to learn the ropes of file i/o. Working with
I'm fairly new to Python, and I've just started working with XML parsing. I
I'm new to Python / GAE / Django. I get that with GAE there
PEP 8 says that Python package and module names should be short, since some

© 2021 The Archive Base. All Rights Reserved