The Archive Base

Editorial Team
Asked: June 5, 2026 (2026-06-05T15:51:35+00:00)

I have a huge XML document, 200MB in size, containing textual information. The data was previously stored in a PageMaker file with two columns. After tagging, I found that certain words contain a hyphen, because words that did not fit the column width were broken into two parts separated by a hyphen. The XML document also uses hyphens for another purpose: to separate short sentences (for Notes).

I want to find the hyphens that sit inside words. I have noticed that the hyphens I want to find and remove follow a standard pattern. For example:

The first use of the hyphen (which I want to find and replace):

question is ques-tion
answer would be ans-wer

The other use of the hyphen (not to be matched):

Pattern matchin - Regex Expressions - ...

So the standard format of each is:

space-space (separator between short sentences, to be kept)

letter-letter (broken word, to be fixed)

How can I use XQuery to find all of these, i.e. the second pattern?
Or is there any other way to find them? Finding and replacing these in a huge XML file is a daunting task.

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 3:51 pm (2026-06-05T15:51:37+00:00)

    200 MB is not huge. 🙂

    If you are absolutely sure that no hyphens occur in tag or attribute names, you can use sed (discouraged!):

    sed -E 's/([[:alpha:]]+)-([[:alpha:]]+)/\1\2/g' doc.xml > out.xml
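
To check the pattern before running it on the whole file, it can help to try it on the sample strings from the question. The following is only a minimal Python sketch, not part of the original answer; the name `join_broken_words` is made up for illustration. It uses lookarounds so that the letters themselves are not consumed, which lets chained cases such as `a-b-c` be joined in one pass.

```python
import re

# A hyphen with a letter directly before and directly after it.
# [^\W\d_] is "word character minus digits and underscore", i.e. a letter.
BROKEN_WORD = re.compile(r"(?<=[^\W\d_])-(?=[^\W\d_])")


def join_broken_words(text: str) -> str:
    """Remove hyphens that sit directly between two letters."""
    return BROKEN_WORD.sub("", text)


# Samples taken from the question.
print(join_broken_words("ques-tion"))
print(join_broken_words("Pattern matchin - Regex Expressions - ..."))
```

Note that any letter-letter rule, including the sed one, also joins legitimate compounds such as `well-known`, so the matches are worth reviewing before they are replaced.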
    

    It is better to use XQuery for this, so that you do not have to deal with parsing XML syntax yourself:

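    (: Recursively copy an element, removing hyphens between letters in its text nodes. :)
    declare function local:copy-replace($element as element()) as element() {
      element {node-name($element)} {
        $element/@*,  (: attributes are copied unchanged :)
        for $child in $element/node()
        return
          if ($child instance of element())
          then local:copy-replace($child)
          else if ($child instance of text())
          then text { replace($child, "(\p{L}+)-(\p{L}+)", "$1$2") }
          else $child  (: keep comments and processing instructions as they are :)
      }
    };

    local:copy-replace(/*)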
    

    This does not handle attributes yet. If hyphenated text occurs in attribute values, you will have to process and re-create them separately.
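
If an XQuery processor is not at hand, the same idea, including attribute values, can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under assumptions, not the method from the answer: the names `dehyphenate` and `clean_tree` and the sample document are hypothetical, the whole tree is held in memory, and ElementTree's default parser drops comments and processing instructions.

```python
import re
import xml.etree.ElementTree as ET

# A hyphen with a letter directly before and after it.
BROKEN_WORD = re.compile(r"(?<=[^\W\d_])-(?=[^\W\d_])")


def dehyphenate(text):
    """Remove letter-letter hyphens; pass None and empty strings through."""
    return BROKEN_WORD.sub("", text) if text else text


def clean_tree(root):
    """Fix text, tail text and attribute values; element names stay untouched."""
    for elem in root.iter():
        elem.text = dehyphenate(elem.text)
        elem.tail = dehyphenate(elem.tail)
        # Copy the items first so the mapping is not mutated while iterating.
        for name, value in list(elem.attrib.items()):
            elem.set(name, dehyphenate(value))
    return root


# Hypothetical sample: a hyphen in the tag name, in attributes and in text.
sample = (
    '<doc><note-item id="n-1" title="ques-tion">'
    "ans-wer - Notes</note-item> fol-lowing</doc>"
)
root = clean_tree(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

For the real file, `ET.parse("doc.xml")` followed by `clean_tree(tree.getroot())` and `tree.write("out.xml", encoding="utf-8", xml_declaration=True)` would be the equivalent steps. Attribute values that act as identifiers (for example `id="sec-intro"`) would be changed as well, so it may be necessary to restrict the attribute loop to the names that actually hold prose.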

    Some credit goes to the unknown author of another answer, whose pattern I gladly remembered and reused here.

© 2021 The Archive Base. All Rights Reserved