Problem* Given some data (text) which has style applied to it with a loosely


Editorial Team

Asked: May 26, 2026 (2026-05-26T15:56:22+00:00)


Problem*

Given some data (text) which has style applied to it with a loosely defined markup, such as:

The [blower]cat[elower] [weight 15]sat[normal] on the mat.[newline]

Which would ideally be represented as something like:

The <text class="lower">cat</text> <strong>sat</strong> on the mat.<br />

The markup has the following properties:

1. A tag represents an instruction to format text in a given way from that point onward.
2. An end tag may exist, but only for a small set of tags. Other tags are linear (see point 1).
3. Each tag has its own behaviour, and may affect previously applied tags in different ways.
4. Some nesting is implied by the linear tags adding to or overwriting existing styles.
5. Metadata may be outside of tags (e.g. [beg][xyz]cmd[end1] is all tag related, no content).

Requirements

Define rules around tag interaction (e.g. a style tag such as [bold] is closed by another style tag such as [normal] or [light]).
Nest some content (tags which do not overwrite each other, as above, will nest and break accordingly).
Define maps from the well-defined in-memory representation to some output format.
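As a rough illustration of the last requirement, here is a minimal Python sketch of an output map. It assumes the in-memory representation is a flat list of (text, styles) runs, where styles is a dict from set name to value. That representation, the set names "case" and "weight", the HTML_MAP table and render_html are all hypothetical, chosen only to reproduce the sample output above.

```python
from html import escape

# Hypothetical output map: (set name, value) -> (opening markup, closing markup).
HTML_MAP = {
    ("case", "lower"): ('<text class="lower">', "</text>"),
    ("weight", "bold"): ("<strong>", "</strong>"),
}


def render_html(runs):
    """Render (text, styles) runs to HTML, wrapping each run independently."""
    out = []
    for text, styles in runs:
        if text == "\n":
            # A newline run stands for the standalone [newline] tag.
            out.append("<br />")
            continue
        # Sort so that wrappers always open in the same order.
        wrappers = [HTML_MAP[item] for item in sorted(styles.items()) if item in HTML_MAP]
        opening = "".join(o for o, _ in wrappers)
        closing = "".join(c for _, c in reversed(wrappers))
        out.append(opening + escape(text) + closing)
    return "".join(out)


sample = [
    ("The ", {}),
    ("cat", {"case": "lower"}),
    (" ", {}),
    ("sat", {"weight": "bold"}),
    (" on the mat.", {}),
    ("\n", {}),
]
print(render_html(sample))
```

Because every run is wrapped on its own, the output cannot contain overlapping tags, at the cost of repeating wrappers when adjacent runs share a style. A second map for another output format would only need a different table.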

Thoughts

Parse into a DOM-like structure: attempt to group tags into 'sets'. When a tag is encountered, close the active tag for that set and open the new one. This produces <tag>content</tag>. Problems around proper nesting and closing/reopening tags, so that you don't end up with overlap situations like <b>text<i>text</b>text</i>, are annoying but straightforward enough.
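One way to make the 'sets' idea concrete is sketched below in Python. Note that it swaps the nested DOM for a flat list of styled runs: each tag belongs to one set, a new tag replaces whatever is active in its set, and every piece of text records a snapshot of the active styles. The TAG_RULES table, the set names and to_runs are assumptions made for illustration, and the numeric argument of [weight 15] is ignored here.

```python
import re

# Hypothetical rule table: tag name -> (set name, value). A value of None clears the set.
TAG_RULES = {
    "blower": ("case", "lower"),
    "elower": ("case", None),
    "weight": ("weight", "bold"),  # assumption: the numeric argument is ignored
    "normal": ("weight", None),
}

# Either a [tag ...] or a stretch of plain text.
TOKEN = re.compile(r"\[([^\]]+)\]|([^\[]+)")


def to_runs(source):
    """Split source into (text, styles) runs; styles maps set name -> value."""
    active = {}
    runs = []
    for match in TOKEN.finditer(source):
        tag, text = match.group(1), match.group(2)
        if text is not None:
            runs.append((text, dict(active)))  # snapshot of the current styles
            continue
        parts = tag.split()
        name = parts[0] if parts else ""
        if name == "newline":
            runs.append(("\n", dict(active)))
            continue
        rule = TAG_RULES.get(name)
        if rule is None:
            continue  # unknown tags are skipped in this sketch
        set_name, value = rule
        if value is None:
            active.pop(set_name, None)
        else:
            active[set_name] = value  # overwrites the active tag of the same set
    return runs


runs = to_runs("[weight 15]a[blower]b[normal]c[elower]")
# runs == [("a", {"weight": "bold"}),
#          ("b", {"weight": "bold", "case": "lower"}),
#          ("c", {"case": "lower"})]
```

The overlapping input in the usage line never has to be repaired, because no tree is built until output time. Nesting can then be derived from the runs when a nested format is required.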

How would you set about designing a data structure or method of parsing the content such that a set of rules can aid transformation to a well defined structure?

Alternatively, any suggestions for fields/areas that you would look at when solving this sort of problem?

*Real-world problem


1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:56:23+00:00

This problem is isomorphic (at least as you've described it so far) to parsing XML. You have syntax that introduces and ends markup, and it comes mostly in pairs, such as [blower]…[elower] and [weight 15]…[normal], with the occasional standalone tag such as [newline].

So if you know how to build an XML parser with tags, you know how to do this, too.

If you don’t, you just need a grammar (in EBNF) and a parser generator:

document = fragment* ;

fragment = TEXT ;
fragment = '[blower]' fragment* '[elower]' ;
fragment = '[weight' NATURAL ']' fragment* '[normal]' ;
fragment = other_start_tag fragment* other_end_tag ;
fragment = '[newline]' ;

This requires a pretty simple lexer and a pretty simple parser (see Flex and Yacc as examples).
You can build your DOM as a set of tree nodes as the parser runs by attaching actions to the grammar rules (see the Yacc documentation). Many other parser generators will let you build the tree as you parse, too.
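If a parser generator is more machinery than you want, the same grammar is small enough for a hand-written recursive-descent parser. The Python sketch below is that substitute, not Flex/Yacc output. The PAIRS and STANDALONE tables and the dict-shaped tree nodes are assumptions made for illustration, and paired tags are allowed to contain any number of fragments.

```python
import re

# A tag with an optional natural-number argument, or a stretch of plain text.
TOKEN = re.compile(r"\[([a-z]+)(?:\s+(\d+))?\]|([^\[]+)")

# Hypothetical pairing table: start tag -> the end tag that closes it.
PAIRS = {"blower": "elower", "weight": "normal"}
STANDALONE = {"newline"}


def tokenize(source):
    """Lexer: turn the source into ("TEXT", text, None) and ("TAG", name, arg) tuples."""
    tokens = []
    for match in TOKEN.finditer(source):
        name, arg, text = match.groups()
        if text is not None:
            tokens.append(("TEXT", text, None))
        else:
            tokens.append(("TAG", name, arg))
    return tokens


def parse(tokens):
    """Parser: build a list of tree nodes (plain strings and tag dicts)."""
    pos = 0

    def fragments(closer):
        nonlocal pos
        nodes = []
        while pos < len(tokens):
            kind, value, arg = tokens[pos]
            if kind == "TAG" and value == closer:
                return nodes  # the caller consumes the end tag
            pos += 1
            if kind == "TEXT":
                nodes.append(value)
            elif value in STANDALONE:
                nodes.append({"tag": value, "arg": None, "children": []})
            elif value in PAIRS:
                children = fragments(PAIRS[value])
                pos += 1  # consume the matching end tag
                nodes.append({"tag": value, "arg": arg, "children": children})
            else:
                raise SyntaxError(f"unexpected tag [{value}]")
        if closer is not None:
            raise SyntaxError(f"missing [{closer}]")
        return nodes

    return fragments(None)


tree = parse(tokenize("The [blower]cat[elower] [weight 15]sat[normal] on the mat.[newline]"))
```

A strictly nested grammar like this rejects overlapping input such as [blower]a[weight 15]b[elower]c[normal] with a SyntaxError. If the real data contains such overlaps, the tag-interaction rules have to be applied before or instead of this step.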
