I’m a beginner programmer so this question might sound trivial: I have some text files containing tab-delimited text like:
A
	B
	C
		D
		E
Now I want to generate unordered .html lists out of this, with the structure:
<ul>
<li>A
<ul><li>B</li>
<li>C
<ul><li>D</li>
<li>E</li></ul></li></ul></li>
</ul>
My idea was to write a Python script, but if there is an easier (automatic) way, that is fine too. For identifying the indentation level and item name I would try to use this code:
import sys

indent = 0
last = []
for line in sys.stdin:
    count = 0
    while line.startswith("\t"):
        count += 1
        line = line[1:]
    if count > indent:
        indent += 1
        last.append(last[-1])
    elif count < indent:
        indent -= 1
        last = last[:-1]
The tokenize module understands your input format: each line contains a valid Python identifier, and the indentation level of the statements is significant. The ElementTree module lets you manipulate tree structures in memory, so it might be more flexible to separate creating the tree from rendering it as HTML.
Any class that provides .start(), .end(), .data() and .close() methods can be used as a TreeBuilder; e.g., you could just write HTML on the fly instead of building a tree.
To parse stdin and write HTML to stdout you could use ElementTree.write().
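The approach described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code the answer had in mind: the function name parse, the builder_factory parameter, and the open_li bookkeeping are assumptions made for this example. It assumes every item is a valid Python identifier and that indentation is consistent, and it nests each sub-list inside its parent <li>, as in the structure shown in the question.

```python
import io
from tokenize import DEDENT, ENDMARKER, INDENT, NAME, NEWLINE, NL, generate_tokens
from xml.etree import ElementTree as etree


def parse(file, builder_factory=etree.TreeBuilder):
    """Build a nested <ul> tree from indented identifiers (hypothetical helper)."""
    tb = builder_factory()
    tb.start("ul", {})
    open_li = [False]  # one flag per nesting level: is an <li> still open there?
    for tok in generate_tokens(file.readline):
        if tok.type == NAME:
            if open_li[-1]:
                tb.end("li")  # close the previous sibling item
            tb.start("li", {})
            tb.data(tok.string)
            open_li[-1] = True  # keep it open in case a sub-list follows
        elif tok.type == INDENT:
            tb.start("ul", {})  # sub-list goes inside the still-open <li>
            open_li.append(False)
        elif tok.type in (DEDENT, ENDMARKER):
            if open_li.pop():
                tb.end("li")  # close the last item of this level
            tb.end("ul")
            if tok.type == ENDMARKER:
                break
        elif tok.type not in (NEWLINE, NL):
            raise ValueError(f"unexpected token: {tok!r}")
    return tb.close()  # root <ul> element


# Sample input matching the question's example (tabs mark the levels).
sample = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"
root = parse(io.StringIO(sample))
html = etree.tostring(root, encoding="unicode", method="html")
print(html)
```

Any object with a readline method can be passed instead of the io.StringIO wrapper, for example sys.stdin or an open text file. Items that are not valid identifiers (for instance, names containing spaces or punctuation) would need a different tokenizer than the one assumed here.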
You can use any file, not just sys.stdin/sys.stdout.
Note: to write to stdout on Python 3, use sys.stdout.buffer or encoding="unicode", because of the bytes/Unicode distinction.
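To illustrate the bytes/Unicode note, here is a small sketch of ElementTree.write() on Python 3. The tiny hand-built tree and the variable names are placeholders for this example only; in practice the tree would come from whatever parsing step you use.

```python
import io
import sys
from xml.etree import ElementTree as etree

# A tiny hand-built tree standing in for the parsed list (placeholder data).
root = etree.Element("ul")
etree.SubElement(root, "li").text = "A"
tree = etree.ElementTree(root)

# Text stream: encoding="unicode" makes write() emit str.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode", method="html")

# Binary stream: with the default encoding, write() emits bytes.
# sys.stdout.buffer is a binary stream of this kind.
bytes_out = io.BytesIO()
tree.write(bytes_out, method="html")

# Writing to the text-mode sys.stdout therefore needs encoding="unicode".
tree.write(sys.stdout, encoding="unicode", method="html")
sys.stdout.write("\n")
```

The same calls work with a regular file opened in text mode (with encoding="unicode") or in binary mode (with a byte encoding).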