Consider the following mark-up input:
* Line 1
* Line 2
:* Line 2.1
:* Line 2.2
* Line 3
This is typically coded as:
<ul>
<li>Line 1</li>
<li>Line 2</li>
<ul>
<li>Line 2.1</li>
<li>Line 2.2</li>
</ul>
<li>Line 3</li>
</ul>
My questions:
- What would be a good representation for the same input using a single line?
- What is the regular expression to generate the corresponding XHTML?
For example, the single line input format could be:
> Line 1 > Line 2 >> Line 2.1 >> Line 2.2 > Line 3
Here > is the unordered list item delimiter, and repeating it (>>) marks a deeper nesting level. I chose > because the text might include typical punctuation marks. Using » (or another character not found on a standard 104-key keyboard) would be fun, but not as easy to type.
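To make the > format concrete, here is a minimal Java sketch of how it could be converted. It assumes that the number of > characters gives the nesting depth and that item text never contains > or characters that need XHTML escaping. The class and method names (DepthPrefixedList, toXhtml) are made up for illustration. The output mirrors the fragment shown earlier, where a nested <ul> follows its parent <li> as a sibling; strictly valid XHTML would place the nested list inside the parent <li>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts "> a > b >> b.1 > c" into nested <ul> markup.
final class DepthPrefixedList {
    // One or more '>' (the depth), then the item text up to the next '>'.
    private static final Pattern ITEM = Pattern.compile("(>+)\\s*([^>]*)");

    static String toXhtml(String input) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of <ul> elements currently open
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            int level = m.group(1).length();
            // Open or close lists until the open depth matches the item's level.
            for (; depth < level; depth++) out.append("<ul>");
            for (; depth > level; depth--) out.append("</ul>");
            out.append("<li>").append(m.group(2).trim()).append("</li>");
        }
        // Close whatever is still open at the end of the input.
        for (; depth > 0; depth--) out.append("</ul>");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXhtml("> Line 1 > Line 2 >> Line 2.1 >> Line 2.2 > Line 3"));
    }
}
```

Because the depth is just a counter, this is not limited to three levels.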
The line input format could also be:
[Line 1][Line 2 [Line 2.1][Line 2.2]][Line 3]
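Arbitrarily nested brackets are awkward to handle with a single regular expression, so for this format a small character scanner may be simpler. The following is only a sketch, assuming balanced brackets, no text outside brackets, and item text that needs no escaping. BracketList and its methods are hypothetical names. It produces the same fragment shape as the XHTML shown at the top.

```java
// Hypothetical helper: converts "[a][b [b.1][b.2]][c]" into nested <ul> markup.
final class BracketList {
    static String toXhtml(String input) {
        StringBuilder out = new StringBuilder();
        StringBuilder text = new StringBuilder();
        int depth = 0;     // current bracket nesting depth
        int openLists = 0; // number of <ul> elements not yet closed
        for (char c : input.toCharArray()) {
            if (c == '[') {
                flush(text, out); // emit the parent item's text, if any
                depth++;
                if (openLists < depth) { // first item at this depth
                    out.append("<ul>");
                    openLists++;
                }
            } else if (c == ']') {
                flush(text, out);
                // Close any child list that was opened inside this item.
                while (openLists > depth) {
                    out.append("</ul>");
                    openLists--;
                }
                depth--;
            } else {
                text.append(c);
            }
        }
        for (; openLists > 0; openLists--) out.append("</ul>");
        return out.toString();
    }

    // Writes the pending text as a list item and clears the buffer.
    private static void flush(StringBuilder text, StringBuilder out) {
        String item = text.toString().trim();
        if (!item.isEmpty()) out.append("<li>").append(item).append("</li>");
        text.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(toXhtml("[Line 1][Line 2 [Line 2.1][Line 2.2]][Line 3]"));
    }
}
```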
Update #1 – The problem is a little simpler: nesting can be limited to three levels. A general solution for n levels deep would still be cool.
Update #2 – XHTML, not HTML.
Update #3 – Another possible input format.
Update #4 – Java solutions (or pure regex) are most welcome.
Update #5
Revised code:
String in = " * Line 1 * Line 2 > * Line 2.1 * Line 2.2 < * Line 3";
String sub = "<ul>" + in.replace( " > ", "<ul>" ) + "</ul>";
sub = sub.replace( " < ", "</ul>" );
sub = sub.replaceAll( "( | >)\\* ([^*<>]*)", "<li>$2</li>" );
System.out.println( "Result: " + sub );
Prints the following:
Result: <ul><li>Line 1 </li>* Line 2<ul>* Line 2.1<li>Line 2.2</li></ul>* Line 3</ul>
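Items are missed because each match consumes the trailing space that the next item's pattern needs, and items that directly follow an inserted tag have no leading space at all. One way around this, sketched here as an assumption and not as the solution referenced below, is to tokenize the original input in a single pass so that no replacement ever sees already-generated tags. StarList and toXhtml are hypothetical names; the input is assumed to use * for items, > to open a nested list, and < to close one, with no such characters inside item text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the "* item > * nested < * item" format.
final class StarList {
    // A marker ('*', '>' or '<') followed by the text up to the next marker.
    private static final Pattern TOKEN = Pattern.compile("([*<>])\\s*([^*<>]*)");

    static String toXhtml(String input) {
        StringBuilder out = new StringBuilder("<ul>");
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            switch (m.group(1).charAt(0)) {
                case '*': // list item: wrap the trimmed text
                    out.append("<li>").append(m.group(2).trim()).append("</li>");
                    break;
                case '>': // open a nested list
                    out.append("<ul>");
                    break;
                default:  // '<' closes the nested list
                    out.append("</ul>");
                    break;
            }
        }
        return out.append("</ul>").toString();
    }

    public static void main(String[] args) {
        String in = " * Line 1 * Line 2 > * Line 2.1 * Line 2.2 < * Line 3";
        System.out.println("Result: " + toXhtml(in));
    }
}
```

Since the markers are read from the untouched input, the order of the replacements no longer matters.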
Solution
A working solution follows:
This creates the desired XHTML fragment: