I am trying to tokenize a string using the pattern as below. >>> splitter

Question

Asked: May 16, 2026

I am trying to tokenize a string using the pattern as below.

>>> splitter = re.compile(r'((\w*)(\d*)\-\s?(\w*)(\d*)|(?x)\$?\d+(\.\d+)?(\,\d+)?|([A-Z]\.)+|(Mr)\.|(Sen)\.|(Miss)\.|.$|\w+|[^\w\s])')
>>> splitter.split("Hello! Hi, I am debating this predicament called life. Can you help me?")

I get the following output. Could someone point out what I'd need to correct, please? I'm confused by all the None values. Also, if there is a better way to tokenize a string, I'd really appreciate the additional help.

['', 'Hello', None, None, None, None, None, None, None, None, None, None, '', '!', None, None, None, None, None, None, None, None, None, None, ' ', 'Hi', None,None, None, None, None, None, None, None, None, None, '', ',', None, None, None, None, None, None, None, None, None, None, ' ', 'I', None, None, None, None, None, None, None, None, None, None, ' ', 'am', None, None, None, None, None, None,None, None, None, None, ' ', 'debating', None, None, None, None, None, None, None, None, None, None, ' ', 'this', None, None, None, None, None, None, None, None, None, None, ' ', 'predicament', None, None, None, None, None, None, None, None, None, None, ' ', 'called', None, None, None, None, None, None, None, None, None, None, ' ', 'life', None, None, None, None, None, None, None, None, None, None, '', '.', None, None, None, None, None, None, None, None, None, None, ' ', 'Can', None, None, None, None, None, None, None, None, None, None, ' ', 'you', None, None, None, None, None, None, None, None, None, None, ' ', 'help', None, None,None, None, None, None, None, None, None, None, ' ', 'me', None, None, None, None, None, None, None, None, None, None, '', '?', None, None, None, None, None, None, None, None, None, None, '']

The output I'd like is:

['Hello', '!', 'Hi', ',', 'I', 'am', 'debating', 'this', 'predicament', 'called', 'life', '.', 'Can', 'you', 'help', 'me', '?']

Thank you.


1 Answer

Editorial Team · Answer 1 · Added May 16, 2026 at 1:30 am

I recommend NLTK's tokenizers. They save you from writing and maintaining tedious regular expressions yourself:

>>> import nltk
>>> nltk.word_tokenize("Hello! Hi, I am debating this predicament called life. Can you help me?")
['Hello', '!', 'Hi', ',', 'I', 'am', 'debating', 'this', 'predicament', 'called', 'life.', 'Can', 'you', 'help', 'me', '?']
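
As for the None values: when a pattern contains capturing groups, re.split also returns the text of every group for each match, and any group that did not take part in that match comes back as None. For example, re.split(r"(a)|(b)", "xay") gives ['x', 'a', None, 'y']. If you would rather stay with plain re, a rough sketch is to use non-capturing groups (?:...) and call findall instead of split. The names TOKEN_RE and tokenize below are only illustrative, and the pattern is a simplified version of the one in the question rather than an exact equivalent:

```python
import re

# Non-capturing groups (?:...) make findall return whole matches
# instead of one tuple of groups per match.
TOKEN_RE = re.compile(
    r"""
    \$?\d+(?:[.,]\d+)*      # numbers, optionally with a leading $ and . or , parts
  | (?:[A-Z]\.)+            # dotted abbreviations such as U.S.A.
  | (?:Mr|Sen|Miss)\.       # the titles listed in the question's pattern
  | \w+(?:-\w+)*            # words, optionally hyphenated
  | [^\w\s]                 # any single punctuation character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text; whitespace is skipped."""
    return TOKEN_RE.findall(text)


text = "Hello! Hi, I am debating this predicament called life. Can you help me?"
print(tokenize(text))
```

Because findall returns only the matched tokens, there are no empty strings, spaces, or None entries to filter out, and the period after "life" comes out as its own token.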
