I’m designing architecture of a text parser. Example sentence: Content here, content here. Whole

Question


Asked: May 13, 2026 (2026-05-13T13:19:30+00:00)


I’m designing the architecture of a text parser. Example sentence: Content here, content here.

The whole sentence is a… sentence, that’s obvious. Content, here, etc. are words; , and . are punctuation marks. But what are words and punctuation marks called collectively? Are they just symbols? I simply don’t know how to name what a single sentence consists of in the most reasonable abstract way (after all, one could also say it consists of letters, vowels, etc.).

Thanks for any help 🙂


1 Answer

Editorial Team · Answer 1 · 2026-05-13T13:19:30+00:00

What you’re doing is technically lexical analysis (“lexing”), which takes a sequence of input symbols and generates a series of tokens or lexemes. So words, punctuation marks and white-space are all tokens.
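To make the idea concrete, here is a minimal sketch of such a lexer in Python. The `Token` type, the category names `WORD`, `PUNCT` and `SPACE`, and the `tokenize` function are all made up for illustration; a real parser would probably need finer categories (numbers, apostrophes, hyphenated words and so on).

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "WORD", "PUNCT" or "SPACE" (hypothetical category names)
    text: str  # the exact characters taken from the input


# One named group per token category. Every character is either a word
# character, white-space, or "anything else", so nothing is skipped.
TOKEN_RE = re.compile(r"(?P<WORD>\w+)|(?P<SPACE>\s+)|(?P<PUNCT>[^\w\s])")


def tokenize(text: str) -> List[Token]:
    """Split the input into a flat list of tokens."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token in tokenize("Content here, content here."):
        print(token.kind, repr(token.text))
```

Because white-space is kept as a token of its own, joining the token texts gives back the original input, which is handy if the parser later needs to reproduce the text exactly.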

In (E)BNF terms, lexemes or tokens are synonymous with “terminal symbols”. If you think of the set of parsing rules as a tree, the terminal symbols are the leaves of that tree.

So what’s the atom of your input? Is it a word or a sentence? If it’s words (and white-space), then a sentence is more akin to a parsing rule. In fact, the term “sentence” can itself be misleading: it’s not uncommon to refer to the entire input sequence as a sentence.
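A rough sketch of “sentence as a parsing rule” in Python: the tokens are the atoms, and a sentence is whatever the rule builds from them. The names `lex`, `parse_sentences` and `TERMINATORS` are hypothetical, and the rule used here (a sentence ends at `.`, `!` or `?`) is a deliberate simplification that would mis-handle abbreviations, for example.

```python
import re
from typing import List

# Assumption: these punctuation tokens end a sentence.
TERMINATORS = {".", "!", "?"}


def lex(text: str) -> List[str]:
    """Produce the atoms: words and single punctuation marks (white-space dropped)."""
    return re.findall(r"\w+|[^\w\s]", text)


def parse_sentences(tokens: List[str]) -> List[List[str]]:
    """Apply the rule: sentence = one or more tokens up to and including a terminator."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        current.append(token)
        if token in TERMINATORS:
            sentences.append(current)
            current = []
    if current:
        # Trailing tokens with no terminator still form a (partial) sentence.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    print(parse_sentences(lex("Content here, content here. Whole sentence.")))
```

Here a sentence never appears in the lexer at all; it only exists as a grouping of tokens produced by the parser.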

A semi-common term for a sequence of non-white-space characters is a “textrun”.
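For comparison, splitting into text runs is about the simplest tokenization there is. A small Python sketch, with `text_runs` being a made-up helper name:

```python
from typing import List


def text_runs(text: str) -> List[str]:
    """Return the maximal runs of non-white-space characters."""
    # str.split() with no arguments splits on any run of white-space.
    return text.split()


if __name__ == "__main__":
    print(text_runs("Content here, content here."))
```

Note that punctuation stays attached to the neighbouring word in a text run (`here,` rather than `here` and `,`), so text runs are coarser than word and punctuation tokens.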
