The Archive Base

Editorial Team
Asked: May 24, 2026

I tried using this function on a paragraph consisting of three sentences and some abbreviations.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def splitParagraphIntoSentences(paragraph):
    ''' break a paragraph into sentences
        and return a list '''
    import re
    # to split on multiple characters,
    # regular expressions are easiest (and fastest)
    sentenceEnders = re.compile(r'[.!?][\s]{1,2}[A-Z]')
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList

if __name__ == '__main__':
    p = "While other species (e.g. horse mango, M. foetida) are also grown ,Mangifera indica – the common mango or Indian mango – is the only mango tree. Commonly cultivated in many tropical and subtropical regions, and its fruit is distributed essentially worldwide.In several cultures, its fruit and leaves are ritually used as floral decorations at weddings, public celebrations and religious "

    sentences = splitParagraphIntoSentences(p)
    for s in sentences:
        print(s.strip())

The first character of the next sentence is removed.

Output received:
 While other Mangifera species (e.g. horse mango, M. foetida) are also grown on a
 more localized basis, Mangifera indica ΓÇô the common mango or Indian mango ΓÇô
 is the only mango tree
ommonly cultivated in many tropical and subtropical regions, and its fruit is di
stributed essentially worldwide.In several cultures, its fruit and leaves are ri
tually used as floral decorations at weddings, public celebrations and religious.

Thus the paragraph was split into only 2 strings, and the first character of the second sentence was removed. Some strange characters also appear; I guess Python wasn't able to convert the hyphen.

If I alter the regex to [.!?][\s]{1,2}, the output is:

While other species (e.g
horse mango, M
foetida) are also grown ,Mangifera indica ΓÇô the common mango or Indian mango Γ
Çô is the only mango tree
Commonly cultivated in many tropical and subtropical regions, and its fruit is d
istributed essentially worldwide.In several cultures, its fruit and leaves are r
itually used as floral decorations at weddings, public celebrations and religiou
s

Thus even the abbreviations get split.

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 1:41 pm

    The regex you want is:

    [.!?][\s]{1,2}(?=[A-Z])
    

    You want a positive lookahead assertion, which means you want to match the pattern if it’s followed by a capital letter, but not match the capital letter.
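To make the difference concrete, here is a minimal sketch comparing the original pattern with the lookahead version. The sample text and the variable names are made up for illustration; only the two patterns come from the question and this answer.

```python
import re

# Hypothetical sample text, chosen so every boundary has a space after it.
text = "It rained. Commonly it does! Nobody minds."

# Original pattern: the capital letter is part of the match,
# so split() throws it away together with the punctuation.
consuming = re.compile(r'[.!?]\s{1,2}[A-Z]')

# Lookahead version: the capital letter is only checked, not consumed.
lookahead = re.compile(r'[.!?]\s{1,2}(?=[A-Z])')

print(consuming.split(text))
# ['It rained', 'ommonly it does', 'obody minds.']
print(lookahead.split(text))
# ['It rained', 'Commonly it does', 'Nobody minds.']
```

Note that in both cases the punctuation at each split point is still removed, because it is part of the match; only the capital letter is preserved by the lookahead.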

    Only the first sentence boundary was matched because there is no space after the second period ("worldwide.In"), and the pattern requires one or two whitespace characters.
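If you cannot fix the input text, one possible workaround is to make the whitespace optional. The sketch below is an assumption on my part, not something taken from the question: it uses `\s{0,2}` instead of `\s{1,2}`, a hypothetical function name `split_sentences`, and a shortened version of the paragraph. It keeps "e.g." and "M." intact only because the next letter is lowercase, and it would wrongly split abbreviations followed by a capital letter (for example "U.S.A").

```python
import re

# Assumed variant: zero to two whitespace characters before the capital.
SENTENCE_END = re.compile(r'[.!?]\s{0,2}(?=[A-Z])')

def split_sentences(paragraph):
    """Split on sentence punctuation that is followed by a capital letter."""
    return [s.strip() for s in SENTENCE_END.split(paragraph)]

# Shortened version of the paragraph from the question.
paragraph = (
    "While other species (e.g. horse mango, M. foetida) are also grown, "
    "Mangifera indica is the only mango tree. Commonly cultivated in many "
    "regions, and its fruit is distributed essentially worldwide.In several "
    "cultures, its fruit and leaves are ritually used as decorations."
)

sentences = split_sentences(paragraph)
for s in sentences:
    print(s)
```

With this input the function returns three strings, and the second and third start with "Commonly" and "In several" respectively.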

© 2021 The Archive Base. All Rights Reserved