The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 26, 2026 (2026-05-26T15:27:47+00:00)

Here is my problem: I have many known locations (I have no influence to


Here is my problem:
I have many known data sources (I have no control over them), each holding a lot of data. Each source publishes a lot of new data on its own schedule. Some give me differential updates, some only the whole dataset; some deliver XML, for some I have to build a web scraper, some need authentication, etc.
The collected data should be stored in a database. I also have to write an API that sends requested data back as XML.

Many roads lead to Rome, but which should I choose?

Which software would you suggest I use?

I am familiar with C++, C#, Java, PHP, MySQL, and JS, but learning something new is fine too.

My idea is to use cron jobs + PHP (or a shell script) + curl to fetch the data.
Then I need a module to parse the data and insert it into a database (MySQL).
A PHP script could answer the data requests from clients.

I think the input data volume is about 1-5 GB/day.

There is no single correct answer, but can you give me some advice?
It would be great if you could show me smarter ways to do this.

Thank you very much 🙂


1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 3:27 pm (2026-05-26T15:27:47+00:00)

    LAMP: Stick with PHP and MySQL (and make occasional forays into Perl or Python). The availability of PHP libraries, storage solutions, scalability and API solutions, together with the size of its community, more than makes up for what other environments offer.

    API: Make sure the API queries you design (and the storage/database behind them) can meet all end-product needs before you start writing any importers. Think about date ranges, tagging, and special cases.
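
As a rough illustration of the API point above, here is a minimal sketch of a date-range and tag query whose result is serialized to XML. The table layout, column names, and sample rows are assumptions made for the example, and SQLite stands in for MySQL only so the sketch runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_connection():
    """Build an in-memory database with a hypothetical 'records' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, source TEXT, "
        "tag TEXT, fetched_at TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (source, tag, fetched_at, payload) VALUES (?, ?, ?, ?)",
        [
            ("site_a", "prices", "2026-05-01", "a1"),
            ("site_b", "news", "2026-05-02", "b1"),
            ("site_a", "prices", "2026-05-10", "a2"),
        ],
    )
    return conn


def query_records(conn, start, end, tag=None):
    """Return rows with start <= fetched_at < end, optionally filtered by tag."""
    sql = (
        "SELECT id, source, tag, fetched_at, payload FROM records "
        "WHERE fetched_at >= ? AND fetched_at < ?"
    )
    params = [start, end]
    if tag is not None:
        sql += " AND tag = ?"
        params.append(tag)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


def to_xml(rows):
    """Serialize query rows into a small XML document (as a string)."""
    root = ET.Element("records")
    for rec_id, source, tag, fetched_at, payload in rows:
        el = ET.SubElement(
            root, "record",
            id=str(rec_id), source=source, tag=tag, fetched_at=fetched_at,
        )
        el.text = payload
    return ET.tostring(root, encoding="unicode")
```

Writing the queries first, against a handful of hand-made rows like these, shows early whether the schema can actually serve the date-range and tag filters the clients will ask for.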

    PERFORMANCE: If you need very fast queries over very large data sets, sphinx-search can help. It does more than just text search (tags, binary, etc.), but make sure you spec the server with plenty of RAM.

    IMPORTER: Make it modular: for each data source, write a pluggable importer that an admin can enable or disable and that can, of course, be tested individually. Pick the language and library that fit each job best and most easily; a bash script is okay.
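
One way the pluggable-importer idea could look is a small registry keyed by source name. This is only a sketch: the importer names, the record format, and the `enabled` set (which would normally come from admin configuration) are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Registry of available importers, keyed by a source name.
IMPORTERS: Dict[str, Callable[[], Iterable[dict]]] = {}


def importer(name: str):
    """Decorator that registers a function as the importer for one source."""
    def register(func):
        IMPORTERS[name] = func
        return func
    return register


@importer("site_a_xml")
def import_site_a():
    # Hypothetical source: a real importer would download and parse an XML feed.
    return [{"source": "site_a_xml", "payload": "<item>1</item>"}]


@importer("site_b_scraper")
def import_site_b():
    # Hypothetical source: a real importer would scrape an HTML page.
    return [{"source": "site_b_scraper", "payload": "<html>...</html>"}]


def run_enabled(enabled) -> List[dict]:
    """Run only the importers whose names appear in the enabled set."""
    records: List[dict] = []
    for name in sorted(enabled):
        func = IMPORTERS.get(name)
        if func is None:
            continue  # unknown or removed importer: skip rather than fail
        records.extend(func())
    return records
```

Because each importer is an ordinary function, it can be tested on its own, and disabling a misbehaving source is a configuration change rather than a code change.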

    There are many parsing libraries for PHP. One of the recently popular ones is simplehtmldom, and I found it to work quite well.
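
For comparison, the same kind of scraping can be sketched with Python's standard-library `html.parser` instead of simplehtmldom (a swapped-in technique, not the library named above). The table markup in the usage note is made up for illustration.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def extract_rows(html: str):
    """Return a list of rows, each a list of cell strings."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `extract_rows("<table><tr><td>apple</td><td> 1.20 </td></tr></table>")` yields one row with the cells `apple` and `1.20`.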

    TRANSFORMER: Make the data transformation routines modular as well, so they can be written as the need arises. Don’t let the importer alter the original data; just make it the quickest way into an indexed database. Transformation routines (or later plugins) should be combined with the API query to produce whatever end result is needed.
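
The "store raw, transform on read" split described above might be sketched like this. The table, the transform names, and the use of SQLite in place of MySQL are assumptions for the example.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical raw_records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_records (id INTEGER PRIMARY KEY, source TEXT, payload TEXT)"
    )
    conn.execute("CREATE INDEX idx_raw_source ON raw_records (source)")
    return conn


def store_raw(conn, source, payload):
    """Importer side: write the payload exactly as received."""
    conn.execute(
        "INSERT INTO raw_records (source, payload) VALUES (?, ?)", (source, payload)
    )


# Transformation routines live outside the importer and can be added later.
TRANSFORMS = {
    "strip": str.strip,
    "upper": str.upper,
}


def read_transformed(conn, source, transform_names):
    """API side: read raw payloads and apply the requested transforms in order."""
    rows = conn.execute(
        "SELECT payload FROM raw_records WHERE source = ? ORDER BY id", (source,)
    ).fetchall()
    results = []
    for (payload,) in rows:
        for name in transform_names:
            payload = TRANSFORMS[name](payload)
        results.append(payload)
    return results
```

Since the stored payload is never modified, a buggy transform can be fixed and re-applied without re-fetching anything from the source.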

    TIMING: There is nothing wrong with cron executions, as long as they don’t become runaway jobs or cause your input sources to start throttling or blocking you. You need to stay aware of both risks.
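
Two small guards against the risks named above, as a sketch only: a lock file so a cron run refuses to start while the previous one is still going, and a throttle that enforces a minimum gap between requests. The lock path and the interval are placeholders, and a lock file like this can be left behind if the process is killed, so it would need manual or timed cleanup.

```python
import os
import time
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another run still holds the lock file."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of one import run."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(lock_path)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


class Throttle:
    """Enforce a minimum number of seconds between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A cron-started script would wrap its whole run in `with single_instance(...)` and call `throttle.wait()` before each request to the same source.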

    VERSIONING: Design the database, imports, etc. so that an admin can easily roll back errant data.
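
One possible way to make imports reversible is to tag every inserted row with the batch that produced it, so a bad run can be deleted as a unit. The schema below is an assumption for illustration, again using SQLite in place of MySQL.

```python
import sqlite3


def open_store():
    """Create an in-memory store with hypothetical batches/records tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE batches (id INTEGER PRIMARY KEY, source TEXT, "
        "started_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, "
        "batch_id INTEGER REFERENCES batches(id), payload TEXT)"
    )
    return conn


def import_batch(conn, source, payloads):
    """Insert all payloads of one import run under a single batch id."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO batches (source) VALUES (?)", (source,))
        batch_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO records (batch_id, payload) VALUES (?, ?)",
            [(batch_id, p) for p in payloads],
        )
    return batch_id


def rollback_batch(conn, batch_id):
    """Remove every record of one batch; returns the number of rows deleted."""
    with conn:
        deleted = conn.execute(
            "DELETE FROM records WHERE batch_id = ?", (batch_id,)
        ).rowcount
        conn.execute("DELETE FROM batches WHERE id = ?", (batch_id,))
    return deleted
```

An admin tool then only needs to list batches by source and date and call `rollback_batch` on the errant one.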

    VENDOR SOLUTION: Check out scraperwiki; they’ve made a business out of scraping tools and data storage.

    Hope this helps. Out of curiosity, any project details to volunteer? A colleague of mine is interested in exchanging notes.


© 2021 The Archive Base. All Rights Reserved