UPDATE AT THE BOTTOM
Maybe somebody could help with this… I've been struggling with it for days and I'm blocked :/
For a content-cleaner solution I'm working on, I'm trying to convert some plain-text numbered lists, like:
1 Foo
1.1 Foo 1
1.2 Foo 2
2 Bar
2.1 Bar 1
2.2 Bar 2
2.2.1 Bar 2.1
2.2.2 Bar 2.2
2.3 Bar 3
3 Z Another root item
… into correct nested html lists …
<ul>
<li>Foo
<ul>
<li>Foo 1</li>
<li>Foo 2</li>
</ul>
</li>
<li>Bar
<ul>
<li>Bar 1</li>
<li>Bar 2
<ul>
<li>Bar 2.1</li>
<li>Bar 2.2</li>
</ul>
</li>
<li>Bar 3</li>
</ul>
</li>
<li>Another root item</li>
</ul>
Some things that may help:
- No need for the result to be correctly indented, just surrounded by the correct HTML tags
- No need to locate the list inside another text; you can assume I already have only the list
- No need for great performance: regexp, iteration… whatever works is fine
- No need for a specific language: PHP, Python, JavaScript, pseudocode… anything is fine
- You can assume “ ” (space) is the only separator after the “1.2.3 ” list numbering
- You can assume the lines are already in the correct order, no need to sort them at all
UPDATE TL;DR (Not homework, but real-world usage)
Sorry for looking so “homework not done”, my fault. English is not my native language and I tried to be maybe too concise.
What I'm trying to do is make it easier for my workmates to format text from unknown sources into correct HTML.
So far I have managed the following (you can see the full screenshot here http://twitpic.com/907aw5/ as I can't attach images, this being my first question and having no reputation):
- I take the original text and run strip_tags on it to delete any incorrect HTML it may contain
- I insert it into a textarea
- I integrated a JavaScript editor ( CodeMirror http://codemirror.net ) with the HTML mode
- I added an editing toolbar with the most common tags we use, as my workmates don't know a word of HTML
- As part of the cleaning options, I set up two hotkeys that turn the selected text into a ul / ol (splitting on the \n characters)
- When the user saves, I run HTMLTidy on it to make it as clean as possible (indent, delete proprietary tags, etc…)
Just to finish, as you can see in the above screenshot, I have a lot of texts with the 1.2.3 “organization”, and it would be a great help to be able to get a nested list out of this kind of text.
UPDATE (The specific needs)
Now the explanation of “why” I used so many bullets for assumptions:
- No need for the result to be correctly indented, just surrounded by the correct HTML tags (Because after this, when the user hits the Save button, I run HTMLTidy on it, so it gets indented)
- No need to locate the list inside another text; you can assume I already have only the list (Because I run the code over the user-selected text in the editor, so I can assume they selected the correct list)
- No need for great performance: regexp, iteration… whatever works is fine (As it is for human use, point-click, point-click, I don't mind whether it takes 0.0001 seconds per use or 0.1)
- No need for a specific language: PHP, Python, JavaScript, pseudocode… anything is fine (I intend to use it in JavaScript/jQuery, but what I need is just the logic, as I'm blocked… I can translate it if the solution is in another language)
- You can assume “ ” (space) is the only separator after the “1.2.3 ” list numbering (As that covers 99% of my text cases)
- You can assume the lines are already in the correct order, no need to sort them at all (As you can see in the screenshot, that text is human-entered, and I assume they entered it in the correct order)
Sorry again for not being clear enough. This is just my first question on Stack Overflow, and I didn't realize it would look like homework, my fault.
Just for funsies, I went ahead and wrote a solution to your problem using PHP:
However, in deference to this being homework, I included a deliberate error: for it to come out correctly, you need to make a single small change to $input before passing it in… Have fun 🙂
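For anyone who just wants the logic, here is a minimal Python sketch of one stack-based way to do it (this is not the PHP code referred to above). It relies on the assumptions from the question: one item per line, a dotted number prefix followed by a single space, and lines already in the right order. The nesting depth is simply the number of dot-separated parts in the prefix. The function name `numbered_to_html` and the decision to HTML-escape the item text are my own choices, not something from the question.

```python
import re
from html import escape

# "1", "1.2", "1.2.3" ... followed by exactly one space, then the item text.
PREFIX = re.compile(r"^(\d+(?:\.\d+)*) (.*)$")


def numbered_to_html(text):
    """Convert lines like '1.2 Foo' into nested <ul>/<li> markup (hypothetical helper)."""
    out = []
    stack = []  # nesting levels of the <ul> elements that are currently open

    for line in text.splitlines():
        match = PREFIX.match(line.strip())
        if not match:
            continue  # blank or unnumbered lines are skipped in this sketch

        level = match.group(1).count(".") + 1  # "2.2.1" -> level 3
        label = escape(match.group(2), quote=False)

        # Going back up: close every deeper list together with its last item.
        while stack and stack[-1] > level:
            out.append("</li>")
            out.append("</ul>")
            stack.pop()

        if stack and stack[-1] == level:
            out.append("</li>")  # same level: close the previous sibling
        else:
            out.append("<ul>")   # deeper level: open a list inside the open <li>
            stack.append(level)

        out.append("<li>" + label)

    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</li>")
        out.append("</ul>")
        stack.pop()

    return "\n".join(out)


if __name__ == "__main__":
    print(numbered_to_html("1 Foo\n1.1 Foo 1\n1.2 Foo 2\n2 Bar"))
```

The output is not indented, which matches the "HTMLTidy will fix it on save" requirement. Since every `<li>` is left open until the next line shows whether a child list follows, the same loop should translate to JavaScript more or less line by line.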