The Archive Base

Editorial Team
Asked: May 24, 2026 (2026-05-24T23:42:00+00:00)

I have a stack of data in 200,000+ XML files, which are updated or created every week. Each week I have to parse every XML file, check whether any new XML files have been added, and then update my database with the updated data. If there is a new XML file, my database has no record for it yet, so I have to create a new row.

Here’s my workflow plan:

  1. Put all table X row IDs into array A, say 200,000 numerical values.
  2. Parse each XML file, collect its ID (the same as the table X ID), and store it in array B. Say I now have 200,010 numerical values, 10 more than table X currently holds.
  3. Compare array A and array B to find the values that exist in B but not in A.
  4. Put those 10 new values into array C.
  5. Create new records in table X for the 10 new IDs from array C.
  6. Parse each XML file again and store the desired values in the matching table X row and columns.
  7. Table X now has 200,010 records, all of them updated, including the 10 new ones.

The reason I have to do it this way is that the vendor gives me no information about which XML files are new. They just give me a stack of files.

Is there a better way to do this? I'm worried that my system will crash when it compares two arrays with 200,000+ values. Thanks.
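
For scale, the comparison in steps 3 and 4 can be written as a set difference. The following is only a minimal sketch, assuming the IDs are integers that all fit in memory; `find_new_ids` and the sample ranges are hypothetical names and data, not part of the original setup.

```python
def find_new_ids(table_ids, xml_ids):
    """Return the IDs found in the XML files but missing from the table, sorted."""
    # Building sets makes each membership check a hash lookup instead of a
    # scan over the other array, so the work grows with the number of IDs
    # rather than with the product of the two array sizes.
    return sorted(set(xml_ids) - set(table_ids))


# Hypothetical data: 200,000 existing rows and 200,010 XML files.
table_ids = range(1, 200_001)
xml_ids = range(1, 200_011)

new_ids = find_new_ids(table_ids, xml_ids)
print(len(new_ids))
```

Two sets of a few hundred thousand integers are small by the standards of current hardware, so this step on its own is unlikely to be the bottleneck; parsing the files twice probably costs far more.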

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 11:42 pm (2026-05-24T23:42:01+00:00)

    I had to do something similar. In the end I did it like this:

    1. Baseline the whole setup by iterating over each record in all XML files, normalizing the item (removing newlines, cleaning up whitespace, substituting certain characters), and then computing a per-record MD5 sum. Import the record at the same time.

    2. When I get new data, I iterate through the records (SAX would be a good idea). If a record is not yet in the DB (based on UID) or has changed (based on MD5 sum), it gets imported.
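
The two steps above could be sketched roughly as follows, using Python's built-in SAX parser. This is an illustration rather than the setup described here: the `<record id="...">` layout, the `RecordHandler` class, and the `digest_records` helper are all assumptions, and the character substitutions are left out because they are not specified.

```python
import hashlib
import io
import re
import xml.sax


def record_digest(text):
    """MD5 of the text after collapsing whitespace runs (newlines included)."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class RecordHandler(xml.sax.ContentHandler):
    """Collects uid -> digest for every <record id="..."> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.digests = {}   # uid -> MD5 hex digest
        self._uid = None    # id of the record currently being read, if any
        self._parts = []    # child tag names and text seen inside that record

    def startElement(self, name, attrs):
        if name == "record":
            self._uid = attrs.get("id")
            self._parts = []
        elif self._uid is not None:
            # Include child tag names so moving text between fields changes the hash.
            self._parts.append(f" <{name}> ")

    def characters(self, content):
        if self._uid is not None:
            self._parts.append(content)

    def endElement(self, name):
        if name == "record" and self._uid is not None:
            self.digests[self._uid] = record_digest("".join(self._parts))
            self._uid = None


def digest_records(source):
    """Stream one XML file (path or binary file object) and return uid -> digest."""
    handler = RecordHandler()
    xml.sax.parse(source, handler)
    return handler.digests


# Hypothetical sample: the same data delivered twice with different formatting.
week1 = b'<data><record id="1"><name>Ann</name></record></data>'
week2 = b'<data>\n  <record id="1">\n    <name>Ann</name>\n  </record>\n</data>'
print(digest_records(io.BytesIO(week1)) == digest_records(io.BytesIO(week2)))
```

Because SAX delivers the file as a stream of events, memory use depends on the size of a single record rather than on the size of the file.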

    This works pretty well for what we usually need it for (around 350k records spread across around 100 files), and it also worked OK-ish with (much) more data. It's a wild mix of several tools, including Bash, AWK, sed, grep, the wonderful XMLStarlet and Ruby, and is in dire need of a proper rewrite.
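
If the import decision from step 2 were moved into a single tool, it might look something like the sketch below. It uses SQLite only to stay self-contained; the `records` table, its columns, and the `open_store` and `import_if_changed` helpers are hypothetical and do not describe the actual tool chain.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store holding one row per record UID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "uid TEXT PRIMARY KEY, md5 TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def import_if_changed(conn, uid, digest, payload):
    """Insert unknown UIDs, update rows whose digest differs, skip the rest.

    Returns 'new', 'changed', or 'unchanged'.
    """
    row = conn.execute(
        "SELECT md5 FROM records WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO records (uid, md5, payload) VALUES (?, ?, ?)",
            (uid, digest, payload),
        )
        return "new"
    if row[0] != digest:
        conn.execute(
            "UPDATE records SET md5 = ?, payload = ? WHERE uid = ?",
            (digest, payload, uid),
        )
        return "changed"
    return "unchanged"


conn = open_store()
print(import_if_changed(conn, "42", "digest-v1", "first version"))
print(import_if_changed(conn, "42", "digest-v1", "first version"))
print(import_if_changed(conn, "42", "digest-v2", "second version"))
conn.commit()
```

With the UID as the primary key, a new file simply shows up as a UID the lookup does not find, so no separate pass to compare ID arrays is needed.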
