The Archive Base

Editorial Team
Asked: May 12, 2026

I’m looking for a regular expression to match every new line character ( \n

I’m looking for a regular expression to match every newline character (\n) inside an XML <content> tag, or inside any tag nested within that <content> tag, for example:

<blog>
<text>
(Do NOT match new lines here)
</text>
<content>
(DO match new lines here)
<p>
(Do match new lines here)
</p>
</content>
(Do NOT match new lines here)
<content>
(DO match new lines here)
</content>
1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 5:28 am

    Actually… you can’t do this with a simple regex, at least not with a single one. You also need to worry about comments! Someone may write:

    <!-- <content> blah </content> -->
    

    You can take two approaches here:

    1. Strip all comments out first. Then use the regex approach.
    2. Do not use regular expressions; use a context-sensitive parsing approach that keeps track of whether or not you are nested in a comment.

    Be careful.
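
    A rough sketch of the second approach, for illustration only. The function name newline_offsets_in_content is made up, and it assumes the tag is always written exactly as <content> (no attributes, no CDATA sections), which real XML does not guarantee:

```python
def newline_offsets_in_content(xml_text):
    # Returns the offsets of newline characters that sit inside
    # <content>...</content>, ignoring anything inside <!-- ... -->.
    # Assumption: the tag has no attributes and is spelled exactly "<content>".
    offsets = []
    depth = 0  # how many <content> elements we are currently inside
    i = 0
    n = len(xml_text)
    while i < n:
        if xml_text.startswith("<!--", i):
            # Jump past the whole comment; an unterminated one swallows the rest.
            end = xml_text.find("-->", i + 4)
            i = n if end == -1 else end + 3
        elif xml_text.startswith("<content>", i):
            depth += 1
            i += len("<content>")
        elif xml_text.startswith("</content>", i):
            depth = max(depth - 1, 0)
            i += len("</content>")
        else:
            if xml_text[i] == "\n" and depth > 0:
                offsets.append(i)
            i += 1
    return offsets


sample = "<!-- <content>\n</content> -->\n<content>\n<p>\nhi\n</p>\n</content>\n"
print(newline_offsets_in_content(sample))
```

    Because it walks the text once and tracks state, the commented-out <content> block never counts, and the offsets refer to the original, unmodified text.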

    I am also not so sure you can match all new lines at once. @Quartz suggested this one:

    <content>([^\n]*\n+)+</content>
    

    This will match any content tags that have a newline character RIGHT BEFORE the closing tag… but I’m not sure what you mean by matching all newlines. Do you want to be able to access all the matched newline characters? If so, your best bet is to grab all content tags, and then search for all the newline chars that are nested in between. Something more like this:

    <content>.*</content>
    

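    BUT THERE IS ONE CAVEAT: regex quantifiers are greedy by default, so this regex will match from the first opening tag to the last closing one. Instead, you HAVE to make the .* non-greedy. In languages like Python, you do this by putting the “?” symbol after the quantifier (.*?). Also note that “.” does not match a newline unless you turn on a flag such as Python’s re.DOTALL.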
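
    To see the greedy vs. non-greedy difference for yourself, here is a small sketch. The sample text is made up for illustration:

```python
import re

text = "<content>\na\n</content>\nskip\n<content>\nb\n</content>"

# re.DOTALL lets "." match newlines, which multi-line tag bodies need.
greedy = re.findall(r"<content>.*</content>", text, re.DOTALL)
lazy = re.findall(r"<content>.*?</content>", text, re.DOTALL)

print(len(greedy))  # a single match running from the first tag to the last
print(len(lazy))    # one match per <content> element

# Count the newlines inside each separately matched element.
print([block.count("\n") for block in lazy])
```

    The greedy version also swallows the "skip" line between the two elements, which is exactly the text you did not want.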

    I hope with this you can see some of the pitfalls and figure out how you want to proceed. You are probably better off using an XML parsing library, then iterating over all the content tags.
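
    If you go the parsing-library route, something along these lines might do, using Python’s standard xml.etree.ElementTree. The sample document is made up, and this assumes your input is well-formed XML with a single root element:

```python
import xml.etree.ElementTree as ET

xml_text = """<blog>
<text>
not counted
</text>
<content>
counted
<p>
also counted
</p>
</content>
</blog>"""

root = ET.fromstring(xml_text)

# iter("content") visits every <content> element at any depth;
# itertext() yields the text of the element and of all its descendants.
counts = ["".join(el.itertext()).count("\n") for el in root.iter("content")]
print(counts)
```

    Note that a <content> nested inside another <content> would be visited twice, so its newlines would be counted in both entries.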

    I know I may not be offering the best solution, but at least I hope you will see the difficulty in this and why other answers may not be right…

    UPDATE 1:

    Let me summarize a bit more and add some more detail to my response. I am going to use Python’s regex syntax because it is what I am most used to (forgive me ahead of time… you may need to escape some characters… comment on my post and I will correct it):

    To strip out comments, use this regex:

    <!--.*?-->
    

    Notice the “?” suppresses the .* to make it non-greedy.

    Similarly, to search for content tags, use:

    <content>.*?</content>
    

    Also, you may be able to try this out and access each newline character through the match object’s groups():

    <content>(.*?(\n))+.*?</content>
    

    I know my escaping is off, but it captures the idea. This last example probably won’t work, but I think it’s your best bet at expressing what you want. My suggestion remains: either grab all the content tags and do it yourself, or use a parsing library.
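
    A quick way to check what groups() actually gives you with that pattern (sample text made up for illustration):

```python
import re

text = "<content>\na\nb\n</content>"
pattern = re.compile(r"<content>(.*?(\n))+.*?</content>")

m = pattern.search(text)

# A group inside a repetition keeps only its final iteration,
# so groups() exposes a single newline even though three were matched.
print(m.groups())

# Counting within the whole match recovers all of them.
print(m.group(0).count("\n"))
```

    In Python’s re module a repeated group only remembers its last repetition, so this pattern can confirm that newlines exist but cannot hand each one back as a separate group.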

    UPDATE 2:

    So here is Python code that ought to work. I am still unsure what you mean by “find” all newlines. Do you want the entire lines, or just a count of the newlines? To get the actual lines, try:

    #!/usr/bin/env python3
    
    import re
    
    def FindContentNewlines(xml_text):
        # May want to compile these regexes elsewhere, but I do it here for brevity
        comments = re.compile(r"<!--.*?-->", re.DOTALL)
        content = re.compile(r"<content>(.*?)</content>", re.DOTALL)
        # No DOTALL here: "." must stop at each newline so every match is one line
        newlines = re.compile(r"^(.*)$", re.MULTILINE)
    
        # Strip comments first. XML comments cannot nest ("--" is not allowed
        # inside a comment), so stopping at the first "-->" is the right
        # behaviour for well-formed input.
        xml_text = comments.sub("", xml_text)
    
        # Collect the lines found inside every <content>...</content> block
        result = []
        all_contents = content.findall(xml_text)
        for c in all_contents:
            result.extend(newlines.findall(c))
    
        return result
    
    if __name__ == "__main__":
        example = """
    
    <!-- This stuff
    ought to be omitted
    <content>
      omitted
    </content>
    -->
    
    This stuff is good
    <content>
    <p>
      haha!
    </p>
    </content>
    
    This is not found
    """
        print(FindContentNewlines(example))
    

    This program prints the result:

     ['', '<p>', '  haha!', '</p>', '']
    

    The first and last empty strings come from the newline character immediately preceding the first <p> and the one coming right after the </p>. All in all this (for the most part) does the trick. Experiment with this code and refine it for your needs. Print out intermediate values so you can see what the regexes are matching and not matching.

    Hope this helps :-).

    PS – I didn’t have much luck trying out my regex from my first update to capture all the newlines… let me know if you do.
