The Archive Base

Editorial Team
Asked: June 5, 2026

Here is the problem that I am trying to solve.

  1. I have two folders which contain XML files.
  2. One folder, let's say the “source” folder, contains around 350,000 XML files.
  3. The other folder, let's say the “compare” folder, contains the same 350,000 XML files and a few more.
  4. The 350,000 files that are present in both folders have exactly the same names.
  5. However, the files in “source” are slightly different from the files in “compare”: the files in “compare” may (or may not) have some extra nodes.
  6. I need to compare each pair of identically named files from “source” and “compare”. If, for every file in “source”, all the nodes present in it are also present in the corresponding file in “compare”, I need to produce an OK report.
  7. If not, i.e. if
     • there is some file in “source” that is not present in “compare”, or
     • some file in “source” contains a node that is not present in the corresponding file in “compare”,
     then I need to create an error report with the details of what is missing.
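Stripped down, points 6 and 7 describe a subset test: every file name in “source” must exist in “compare”, and every node of a source file must appear in its counterpart. Below is a minimal JDK-only sketch of that test (it does not use XMLUnit). The assumptions are mine, not the asker's: the folders are flat, and a “node” is identified by its absolute element path, its attributes and its own trimmed text. This means position and sibling order are ignored, and duplicate identical nodes collapse into one. `XmlSubsetCheck`, `missingFiles` and `missingNodes` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: checks that "source" content is contained in "compare".
final class XmlSubsetCheck {

    // Names of regular files in sourceDir that have no counterpart in compareDir.
    static Set<String> missingFiles(Path sourceDir, Path compareDir) {
        Set<String> missing = fileNames(sourceDir);
        missing.removeAll(fileNames(compareDir));
        return missing;
    }

    private static Set<String> fileNames(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Signatures of source nodes that occur nowhere in compare.
    // An empty result means "OK"; anything else belongs in the error report.
    static Set<String> missingNodes(String sourceXml, String compareXml) {
        Set<String> missing = signatures(sourceXml);
        missing.removeAll(signatures(compareXml));
        return missing;
    }

    // One signature per element: absolute path, sorted attributes, trimmed own text.
    static Set<String> signatures(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        Set<String> out = new TreeSet<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Element el, String parentPath, Set<String> out) {
        String path = parentPath + "/" + el.getTagName();
        TreeMap<String, String> attrs = new TreeMap<>();
        NamedNodeMap map = el.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            attrs.put(map.item(i).getNodeName(), map.item(i).getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        List<Element> children = new ArrayList<>();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(n.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                children.add((Element) n);
            }
        }
        out.add(path + attrs + " = \"" + text.toString().trim() + "\"");
        for (Element child : children) {
            collect(child, path, out);
        }
    }
}
```

For the real folders, one could call `missingFiles` once. Then, for each common file name, read both files (for example with `Files.readString`), pass them to `missingNodes`, and write any non-empty result to the error report. With 350,000 files, a parallel stream over the file names is one option.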

I am currently pursuing Java + XMLUnit for this problem and am not sure whether it can solve it. Even if it can, I am not at all sure it is the best tool for the job.

Any help / suggestion will be much appreciated.

1 Answer

    Editorial Team
    Added an answer on June 5, 2026 at 1:09 pm

    First things first: let me go on record and say that XMLUnit is a gem. I loved it. If you are looking at unit testing of XML values, attributes, structure, etc., chances are that you will find a ready-made solution in XMLUnit. It is a good place to start.

    It is quite extensible. It already comes with an identity check (the XMLs have the same elements and attributes in the same order) and a similarity check (the XMLs have the same elements and attributes regardless of order).
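To make the identity/similarity distinction concrete without pulling in XMLUnit, here is a simplified JDK-only illustration. One comparison is order-sensitive (akin to the identity check as described above), and the other ignores the order of child elements (akin to the similarity check). This is not how XMLUnit implements either check, and `OrderDemo` is a hypothetical name. Attributes are always compared as a sorted map, because attribute order carries no meaning in XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of the two notions; not XMLUnit's algorithm.
final class OrderDemo {

    // Same elements, attributes and text, with children in the same order.
    static boolean sameInOrder(String a, String b) {
        return canonical(parse(a), false).equals(canonical(parse(b), false));
    }

    // Same elements, attributes and text, with children in any order.
    static boolean sameIgnoringOrder(String a, String b) {
        return canonical(parse(a), true).equals(canonical(parse(b), true));
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // String form of the subtree: name, sorted attributes, trimmed text, children.
    // Children keep document order unless sortChildren is set.
    private static String canonical(Element el, boolean sortChildren) {
        TreeMap<String, String> attrs = new TreeMap<>();
        NamedNodeMap map = el.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            attrs.put(map.item(i).getNodeName(), map.item(i).getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        List<String> children = new ArrayList<>();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                text.append(n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                children.add(canonical((Element) n, sortChildren));
            }
        }
        if (sortChildren) {
            Collections.sort(children);
        }
        return el.getTagName() + attrs + "(" + text.toString().trim() + ")" + children;
    }
}
```

With `<r><x>1</x><y>2</y></r>` and `<r><y>2</y><x>1</x></r>`, `sameInOrder` reports a difference while `sameIgnoringOrder` does not.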

    However, in my case I was looking for a slightly different usage. I had a big-ish XML (a few hundred nodes) and a bunch of XML files (around 350,000 of them). I needed to exclude certain nodes from the comparison, nodes that I could identify with XPath. They were not always in the same position in the XML, but there was a generic way of identifying them with XPath. Sometimes a node had to be ignored based on the value of some other node. To give some idea:

    1. The condition is on the node that I want to ignore itself, i.e. price:
      /bookstore/book[price>35]/price

    2. The condition is on a different node at a relative position: I want to ignore author based on the value of price, and the two are related by position (siblings under the same book):
      /bookstore/book[price=30]/./author
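The two expressions can be tried out with the JDK's built-in `javax.xml.xpath` API. The sample bookstore document below is hypothetical, and only the XPath expressions come from the answer. Note that `/./` is a self step, so the second expression selects the same nodes as `/bookstore/book[price=30]/author`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IgnoreRules {

    // Hypothetical sample document; only the two XPath expressions come from the answer.
    static final String BOOKSTORE =
            "<bookstore>"
            + "<book><author>Ann</author><price>30</price></book>"
            + "<book><author>Bob</author><price>40</price></book>"
            + "<book><author>Cy</author><price>20</price></book>"
            + "</bookstore>";

    // Rule 1: ignore price when the price itself is above 35.
    static final String RULE_ON_SELF = "/bookstore/book[price>35]/price";
    // Rule 2: ignore author when the sibling price equals 30.
    static final String RULE_ON_SIBLING = "/bookstore/book[price=30]/./author";

    // Returns the text content of every node the expression selects.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }
}
```

On this sample, `select(BOOKSTORE, RULE_ON_SELF)` returns `[40]` and `select(BOOKSTORE, RULE_ON_SIBLING)` returns `[Ann]`.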

    After much tinkering, I settled on a low-tech solution: before using XMLUnit to compare the files, I used XPath to mask the values of the nodes that were to be ignored.

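        import java.io.File;
        import java.io.IOException;
        import java.io.OutputStream;
        import java.nio.file.Files;
        import java.util.List;
        import java.util.Set;
        import org.jdom2.Document;
        import org.jdom2.Element;
        import org.jdom2.JDOMException;
        import org.jdom2.filter.Filters;
        import org.jdom2.input.SAXBuilder;
        import org.jdom2.output.Format;
        import org.jdom2.output.XMLOutputter;
        import org.jdom2.xpath.XPathExpression;
        import org.jdom2.xpath.XPathFactory;
        import org.slf4j.Logger; // assumption: any logger with debug/warn works here
        import org.slf4j.LoggerFactory;

        public class XmlMasker { // illustrative class name
            private static final Logger logger = LoggerFactory.getLogger(XmlMasker.class);

            // Replaces the text of every element matched by any XPath with `mask`,
            // writes the masked document back to xmlFile and returns the match count.
            public static int massageData(File xmlFile, Set<String> xpaths, String mask)
                    throws JDOMException, IOException {
                logger.debug("Data massaging started for {}", xmlFile.getAbsolutePath());
                int counter = 0;
                Document doc = new SAXBuilder().build(xmlFile);
                for (String xpath : xpaths) {
                    logger.debug(xpath);
                    XPathExpression<Element> xpathInstance =
                            XPathFactory.instance().compile(xpath, Filters.element());
                    // evaluate() returns an empty list (never null) when nothing matches
                    List<Element> elements = xpathInstance.evaluate(doc);
                    if (elements.size() > 1) {
                        logger.warn("Multiple matches were found for {} in {}. This could be a *potential* error.",
                                xpath, xmlFile.getAbsolutePath());
                    }
                    for (Element element : elements) {
                        logger.debug(element.getText());
                        element.setText(mask);
                        counter++;
                    }
                }
                // Persist the masked document so XMLUnit compares the masked version
                // (the original snippet was cut off before this step; this overwrites the file).
                if (counter > 0) {
                    try (OutputStream out = Files.newOutputStream(xmlFile.toPath())) {
                        new XMLOutputter(Format.getRawFormat()).output(doc, out);
                    }
                }
                return counter;
            }
        }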
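The masking code above depends on JDOM2. The following is a rough JDK-only sketch of the same mask-then-compare idea. It assumes both files fit in memory and are formatted identically, because whitespace text nodes take part in the comparison. `MaskThenCompare` and its methods are hypothetical names, and `Node.isEqualNode` stands in for XMLUnit here, so child order matters. One design choice differs from the code above: all expressions are evaluated before any node is masked. As a result, a rule like `book[price=30]/author` still sees the real price even when another rule masks price.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only variant of "mask, then compare".
final class MaskThenCompare {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Replaces the text of every node selected by any expression with mask and
    // returns the number of replacements. All expressions are evaluated first,
    // so no rule ever sees a value that another rule has already masked.
    static int mask(Document doc, List<String> expressions, String mask) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<Node> targets = new ArrayList<>();
        for (String expression : expressions) {
            try {
                NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    targets.add(nodes.item(i));
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + expression, e);
            }
        }
        for (Node node : targets) {
            node.setTextContent(mask);
        }
        return targets.size();
    }

    // True if the documents are equal once the ignored nodes are masked.
    // isEqualNode is order-sensitive for children and counts whitespace text.
    static boolean equalAfterMasking(String a, String b, List<String> expressions, String mask) {
        Document docA = parse(a);
        Document docB = parse(b);
        mask(docA, expressions, mask);
        mask(docB, expressions, mask);
        docA.getDocumentElement().normalize();
        docB.getDocumentElement().normalize();
        return docA.getDocumentElement().isEqualNode(docB.getDocumentElement());
    }
}
```

For example, two bookstores that differ only in the author of a book priced at 30 compare as equal when `/bookstore/book[price=30]/./author` is passed as an expression, and as different when the expression list is empty.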
    

    Hope this helps.

Related Questions

Here's the problem I'm trying to solve: I have a dynamic php-driven website that
I am trying to solve this exponentiation problem . Here is the code that
Here's the coding problem I am trying to solve... I have a base class,
EDIT Here is the problem I am trying to solve: I have a string
I'm trying to solve a problem similar to the one described here Initializing strongly
I'm trying to follow the Django tutorial (for v1.1) here . the problem that
So Here is the Problem, I am trying to get that circle to align
I have a problem here that requires to design a data structure that takes
I have an interesting SQL problem that I need help with. Here is the
I currently have a dataset which has two columns that I'd like to compare.

© 2021 The Archive Base. All Rights Reserved