The Archive Base

Editorial Team
Asked: May 23, 2026

so I found out it was possible to use the buffered reader/writer to copy

So I found out it is possible to use a BufferedReader/BufferedWriter to copy an XML file word for word to a new XML file. However, I was wondering whether it would be possible to scrape out only a portion of the document.

For example, looking at this example:

<?xml version="1.0" encoding="UTF-8"?>
<BookCatalogue xmlns="http://www.publishing.org">
  <w:pStyle w:val="TOAHeading" />
  <Book>
    <Title>Yogasana Vijnana: the Science of Yoga</Title>
    <Author>Dhirendra Brahmachari</Author>
    <Date>1966</Date>
    <ISBN>81-40-34319-4</ISBN>
    <Publisher>Dhirendra Yoga Publications</Publisher>
    <Cost currency="INR">11.50</Cost>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <v:imagedata r:id="rId7" o:title="" croptop="10523f" cropbottom="11721f" />
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
    <Cost currency="USD">2.95</Cost>
  </Book>
  <w:pStyle w:val="TOAHeading2" />
</BookCatalogue>

Sorry if this is not proper XML code; I just added tidbits from the document I was looking at to a sample I found. Basically, I want to look for an instance of “heading” (in this case, 3rd line -> TOAHeading), then scrape everything from that heading down until another instance of “heading” is found, and copy it to another XML file. Is that possible? Furthermore, could I write to a temporary file and only keep that file if an instance of “image” (in this case, 14th line) is found? I’m trying to do this in the simplest way possible, so does anyone have any ideas or experience with this? Thanks in advance.

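import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class IPDriver
{
    public static void main(String[] args) throws IOException
    {
        // FileInputStream/FileOutputStream supply the bytes; the Reader/Writer wrappers decode and encode UTF-8.
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Extracted Items/ProposalOne/word/document.xml"), "UTF-8"));
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Extracted Items/ProposalOne/word/tempdocument.xml"), "UTF-8"));

        String line = null;

        while ((line = reader.readLine()) != null)
        {
            writer.write(line);
            // readLine() strips the line terminator, so write one back.
            writer.newLine();
        }

        // Close to unlock.
        reader.close();
        // Close to unlock and flush to disk.
        writer.close();
    }
}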

Example From My Actual XML Document

  <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="address">
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="Street">
      <w:r w:rsidRPr="00822244">
        <w:t>6841 Benjamin Franklin Drive</w:t>
      </w:r>
    </w:smartTag>
  </w:smartTag>
</w:p>
<w:p w:rsidR="00B41602" w:rsidRPr="00822244" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
  <w:pPr>
    <w:pStyle w:val="Address" />
  </w:pPr>
  <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="City">
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="place">

Just your basic document.xml file from a .docx

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 12:32 pm

    I’ve seen a lot of technically-correct suggestions, but your request (when taken as-written) suggests to me that you have the following requirements:

    • Start parsing at a case-insensitive (and potentially partial) match on an attribute value; in your case you wanted to match “heading” against the second half of “TOAHeading”.
    • Parse from that odd starting condition down to a matching (and equally odd) ending condition.

    If I understood your requirements, you basically want to do a totally unstructured parse of a very structured piece of data (XML markup). In that case, an XML parser, XSLT, a DOM parser, or anything else written against the XML spec is going to be a pain in the ass to bend to your needs.

    You’ll need to do a case-insensitive scan of your document contents until you get your match, then pull all the characters between that match and an ending match.
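
As a rough illustration of that scan, here is a minimal sketch. The class and method names (`MarkerScan`, `indexOfIgnoreCase`, `sliceBetween`) are made up for this example, and it assumes the same marker text is used for both the start and the end match, as in the question.

```java
final class MarkerScan {

    /** Case-insensitive indexOf: first position at or after 'from' where needle occurs, or -1. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        int last = text.length() - needle.length();
        for (int i = Math.max(from, 0); i <= last; i++) {
            // regionMatches(true, ...) compares the two regions ignoring case.
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the characters from the first occurrence of marker up to (not including)
     * the next occurrence, or up to the end of the text when there is no second one.
     * Returns null when the marker never occurs.
     */
    static String sliceBetween(String text, String marker) {
        int start = indexOfIgnoreCase(text, marker, 0);
        if (start < 0) {
            return null;
        }
        int end = indexOfIgnoreCase(text, marker, start + marker.length());
        return text.substring(start, end < 0 ? text.length() : end);
    }

    public static void main(String[] args) {
        // Hypothetical one-line input, loosely modelled on the sample in the question.
        String xml = "<a/><w:pStyle w:val=\"TOAHeading\"/><Book/><w:pStyle w:val=\"TOAHeading2\"/>";
        System.out.println(sliceBetween(xml, "heading"));
    }
}
```

Note that the slice starts at the matched word itself, in the middle of a tag, so the result is raw text rather than well-formed XML. If you need whole elements you would have to widen the slice to the enclosing `<` and `>` yourself.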

    If the documents aren’t huge (say 1 MB or smaller), just read the whole thing into memory as a String and either do a quick and dirty “indexOf” for the differently cased versions of what you want, or read the whole thing into a char[] and write some more efficient scanning code that does a case-insensitive match on the value you want to start parsing at.
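
One possible shape for the read-it-all-into-memory approach, including the "only keep it if an image is found" condition from the question, is sketched below. It swaps the repeated `indexOf` calls for a case-insensitive regex over a quoted literal, which is a substitution on my part. `SectionExtractor`, `extract`, and the parameter names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SectionExtractor {

    /**
     * Reads source fully into memory, takes the text from the first occurrence of
     * boundary up to the next occurrence (or end of file), and writes it to target
     * only when that text contains required. Both matches ignore ASCII case.
     * Returns true when target was written.
     */
    static boolean extract(Path source, Path target, String boundary, String required)
            throws IOException {
        String text = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);

        // Pattern.quote treats the marker as a literal, not as a regex.
        Matcher m = Pattern.compile(Pattern.quote(boundary), Pattern.CASE_INSENSITIVE).matcher(text);
        if (!m.find()) {
            return false; // no starting marker at all
        }
        int start = m.start();
        int end = m.find() ? m.start() : text.length(); // second find() resumes after the first match
        String section = text.substring(start, end);

        boolean keep = Pattern.compile(Pattern.quote(required), Pattern.CASE_INSENSITIVE)
                .matcher(section).find();
        if (keep) {
            Files.write(target, section.getBytes(StandardCharsets.UTF_8));
        }
        return keep;
    }
}
```

Because the whole section is already in memory, the check for “image” can happen before anything is written, so no temporary file is needed in this variant. A call might look like `SectionExtractor.extract(Paths.get("document.xml"), Paths.get("section.xml"), "heading", "image")`, with the file names chosen only for illustration.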

    If I misunderstood your requirement and it is actually much more structured than it sounded in your description above, then please use one of the other suggestions that is more focused on true XML parsing. I am just putting this solution out there in the off chance that it was as random as you made it out to seem.

    (NOTE: I’m not saying it’s BAD, I’ve just never seen that request before. You have your own reasons for needing to do that, and we’ll just try to help 😉)

© 2021 The Archive Base. All Rights Reserved