The Archive Base

Editorial Team
Asked: May 25, 2026

Given a string representing the entire text body of an email, I would like

Given a string representing the entire text body of an email, I would like to extract only the part that the sender composed if it is only one contiguous block of text. For example:

Dear Sir:
That is a good point.

On Wednesday, June 1, John wrote:
> Hello world.

Would extract:

Dear Sir:
That is a good point.

By contiguous, I mean that the block may contain single newlines but not consecutive newlines. So this would not match:

Dear Sir:

That is a good point.

On Wednesday, June 1, John wrote:
> Hello world.

By ‘the part the sender composed’, I mean that the email body may contain replied or forwarded text, or a signature, all of which I want to exclude (let’s call it “non-original content”). While there may be lots of variation in the wild, it would be sufficient (for now) to handle just the following cases:

1) a line starting with two dashes (e.g. ----- Forwarded message -----), since signatures also often have two dashes at the beginning of a line

2) a line starting with “On ” followed by a line starting with a “>” to catch this kind of format:

On Wednesday, June 1, John wrote:
> Hello world.

If there is nothing (no non-white-space) above a non-original block, then there should be no match.

Finally, keep in mind that there may be any amount of white space at the beginning of the message, between the targeted text block and the end of the message, or between the targeted text block and the beginning of the non-original content. Also, keep in mind that line endings in email may be either a bare linefeed (LF) or CRLF.

This is my first attempt, which gets closer than I thought when I started writing this; it uses the s flag:

^\s*(\S[^(?:\n\n|\r\n\r\n)]*\S)\s*(?:$|(?:$|\-\-.*|On [^\n]*\n\>.*))

From my testing so far, it appears to work if the targeted text is just one line, but not if it’s more than one line. So the main flaw appears to be in this part:

_______[^(?:\n\n|\r\n\r\n)]*________________________________________

UPDATE: this is the solution I’m using:

'/\A\s*((?:[^\r\n]+\r?(?:\n|\z))+)\s*(?:\z|(--.*|On .+:\n\>.*))/s'

Note that the “On” line may wrap to multiple lines (e.g. if the date and email address are long), but in general there will be a “:\n>” in there.

1 Answer

    Editorial Team
    Added an answer on May 25, 2026 at 5:30 pm

    In the part you flagged:

    [^(?:\n\n|\r\n\r\n)]*
    

    Square brackets mean a character class, and the caret negates the set of characters to match. So the regular expression engine is building a character class that doesn’t match a (, doesn’t match a ?, doesn’t match a :, and so on; the group and alternation syntax is not interpreted inside the brackets.
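To see this concretely, here is a minimal sketch using Python's re module. The sample strings are made up for illustration. It suggests the flagged fragment stops at the first single newline (and also at a colon or a parenthesis), which would explain why it only seemed to work on one-line text.

```python
import re

# The fragment from the question. Inside [...], "(?:" and "|" are not group
# or alternation syntax; each character is just a member of the negated set.
fragment = re.compile(r"[^(?:\n\n|\r\n\r\n)]*")

# Hypothetical sample inputs, chosen to hit different excluded characters.
stops_at_newline = fragment.match("line one\nline two").group(0)
stops_at_colon = fragment.match("Dear Sir:").group(0)
stops_at_paren = fragment.match("a(b").group(0)

print(repr(stops_at_newline))
print(repr(stops_at_colon))
print(repr(stops_at_paren))
```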

    Here’s a regular expression that I believe does what you want for this part:

    ((?:[^\r\n]+\r?\n)*)
    

    This means “match anything but a CR or LF, one or more times, followed optionally by a CR and then definitely by an LF.” When the * repeats it (zero or more times), it won’t match two line endings in a row, because each repetition must begin with something other than a line ending. The whole thing is wrapped in parentheses to make a capture group.
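As a quick check of that behaviour, here is a small sketch in Python. The message text is a made-up sample with mixed CRLF and LF endings. The group should take the two contiguous lines and stop at the blank line.

```python
import re

# The line-block pattern from above: one or more non-newline characters,
# an optional CR, then an LF, repeated zero or more times.
block = re.compile(r"((?:[^\r\n]+\r?\n)*)")

# Hypothetical sample: first line ends in CRLF, the rest in LF.
text = (
    "Dear Sir:\r\n"
    "That is a good point.\n"
    "\n"
    "On Wednesday, June 1, John wrote:\n"
    "> Hello world.\n"
)

captured = block.match(text).group(1)

# A message that begins with a blank line yields an empty capture,
# because the first repetition cannot start on a line ending.
captured_after_blank = block.match("\nHello\n").group(1)

print(repr(captured))
print(repr(captured_after_blank))
```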

    Now, we need to anchor this so that it comes right where you want it. It looks like you are expecting three anchor cases: end of string, the “On ... wrote:” line, or a signature line (“--\n”). Your regular expression is more complicated than it needs to be to anchor these three cases; this would do:

    (?:$|--\r?\n|On \d\d/\d\d/\d\d\d\d \d\d:\d\d [AP]M, .*wrote:\r?\n)
    

    It’s longer than yours because I wanted to make sure it wouldn’t anchor on actual email message text that happens to start with the word “On” at the beginning of a line.
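Here is a rough sketch of how that anchor behaves on its own, using Python's re module. The date format and the sample lines are assumptions for illustration; a real client may format the header differently.

```python
import re

# The three anchor cases: end of string, a "--" signature line,
# or a strictly formatted "On <date> <time> AM/PM, ... wrote:" line.
anchor = re.compile(
    r"(?:$|--\r?\n|On \d\d/\d\d/\d\d\d\d \d\d:\d\d [AP]M, .*wrote:\r?\n)"
)

# match() only tries at the start of the string, so each sample
# begins exactly where the anchor would be tested.
is_reply_header = anchor.match("On 06/01/2011 10:15 AM, John wrote:\n> Hello world.\n") is not None
is_signature = anchor.match("--\nJohn\n") is not None
is_plain_prose = anchor.match("On Monday we met for lunch.\n") is not None

print(is_reply_header, is_signature, is_plain_prose)
```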

    And you allow any number of blank lines between the match group and the anchor:

    (?:\r?\n)*
    

    Put these together:

    ((?:[^\r\n]+\r?\n)*)(?:\r?\n)*(?:$|--\r?\n|On \d\d/\d\d/\d\d\d\d \d\d:\d\d [AP]M, .*wrote:\r?\n)
    

    I tested these with an actual email message from my inbox, using Python’s re module to test the regexp.

    NOTE: Actually, now that I think about it, I don’t recommend using such a rigorous regexp to match the “On” line. The “On” line is inserted by the email client that the sender was using, and you have no control over it. What if the user’s email client inserts 24-hour time instead of AM/PM? (I have even seen French people’s email clients insert French text instead of “On”, so the whole line wouldn’t even match!) So you might want a looser match pattern for the “On” line, but beware that if it’s too loose and an email contains a line that happens to start with “On” you might chop early.

    Here’s a simple pattern that should work:

    On \d[^\n]+\n>
    

    On, followed by a digit and then whatever until end of line, but the next line must start with >. That ought to work, except for the pathological case where an email body has a line starting with “On” and a number and then the very next line starts with the word “From” so the email client inserts a > before “From”.
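A short sketch of what this looser pattern accepts and rejects, again in Python with made-up sample lines. Note that it requires a digit right after “On ”, so a header that begins with a weekday name, like the one in the question, would not match this particular pattern.

```python
import re

# Loose "On" header: "On ", a digit, the rest of the line,
# and a ">" at the start of the next line.
on_line = re.compile(r"On \d[^\n]+\n>")

# Hypothetical samples.
numeric_header = on_line.search("On 06/01/2011 10:15 AM, John wrote:\n> Hello world.") is not None
weekday_header = on_line.search("On Wednesday, June 1, John wrote:\n> Hello world.") is not None
ordinary_text = on_line.search("On 3 occasions we met.\nThanks") is not None

print(numeric_header, weekday_header, ordinary_text)
```

If weekday-style headers matter, replacing \d with [^\n] is one possible loosening, at the cost of more false positives.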

    Anyway, putting it all together:

    ((?:[^\r\n]+\r?\n)*)(?:\r?\n)*(?:$|--\r?\n|On \d[^\n]+\n>)
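For what it's worth, this is roughly how the combined pattern could be exercised in Python. The two messages are invented samples, and match() is used so the pattern is only tried from the start of the string.

```python
import re

# The combined pattern: text block, optional blank lines, then an anchor.
combined = re.compile(
    r"((?:[^\r\n]+\r?\n)*)(?:\r?\n)*(?:$|--\r?\n|On \d[^\n]+\n>)"
)

# Hypothetical samples: a reply with quoted text, and a message with a signature.
quoted = (
    "Dear Sir:\n"
    "That is a good point.\n"
    "\n"
    "On 06/01/2011 10:15 AM, John wrote:\n"
    "> Hello world.\n"
)
signed = "Thanks.\n\n--\nJohn\n"

quoted_block = combined.match(quoted).group(1)
signed_block = combined.match(signed).group(1)

print(repr(quoted_block))
print(repr(signed_block))
```

Both samples have a blank line between the original text and the non-original part; I have not tried to cover messages where the quoted block or signature follows with no blank line in between.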
    

    EDIT: You asked me to do a quick edit and update it with your final pattern, so here you go:

    /\A\s*((?:[^\r\n]+\r?(?:\n|\z))+)\s*(?:\z|(--.*|On [^\n]+\n\>.*))/s
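If you want to try the final pattern from Python, a sketch might look like the following. This is a port under a couple of assumptions: the /s flag becomes re.DOTALL, and \Z stands in for \z, since \Z is Python's end-of-string-only anchor. The helper name original_text and the sample messages are hypothetical.

```python
import re

# Python port of the final pattern (assumption: \Z replaces \z, re.DOTALL replaces /s).
PATTERN = re.compile(
    r"\A\s*((?:[^\r\n]+\r?(?:\n|\Z))+)\s*(?:\Z|(--.*|On [^\n]+\n>.*))",
    re.DOTALL,
)


def original_text(body):
    """Return the sender's block without trailing whitespace, or None if no match."""
    m = PATTERN.match(body)
    return m.group(1).rstrip() if m else None


# The two examples from the question, plus a CRLF variant of the first.
reply = (
    "Dear Sir:\n"
    "That is a good point.\n"
    "\n"
    "On Wednesday, June 1, John wrote:\n"
    "> Hello world.\n"
)
split_reply = (
    "Dear Sir:\n"
    "\n"
    "That is a good point.\n"
    "\n"
    "On Wednesday, June 1, John wrote:\n"
    "> Hello world.\n"
)
crlf_reply = reply.replace("\n", "\r\n")

print(repr(original_text(reply)))
print(repr(original_text(split_reply)))
print(repr(original_text(crlf_reply)))
```

In these samples a blank line separates the reply from the quoted text; messages where the quoted block follows immediately are not covered by this sketch.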
    
© 2021 The Archive Base. All Rights Reserved