I’m a beginner programmer so this question might sound trivial: I have some text

Question

Asked: June 10, 2026 (2026-06-10T21:56:37+00:00)

I’m a beginner programmer so this question might sound trivial: I have some text files containing tab-delimited text like:

Now I want to generate unordered .html lists out of this, with the structure:

<ul>
<li>A
<ul><li>B</li>
<li>C
<ul><li>D</li>
<li>E</li></ul></li></ul></li>
</ul>

My idea was to write a Python script, but if there is an easier (automatic) way, that is fine too. For identifying the indentation level and item name I would try to use this code:

import sys
indent = 0
last = []
for line in sys.stdin:
    count = 0
    while line.startswith("\t"):
        count += 1
        line = line[1:]
    if count > indent:
        indent += 1
        last.append(last[-1])
    elif count < indent:
        indent -= 1
        last = last[:-1]

1 Answer

Editorial Team · Answer 1 · 2026-06-10T21:56:39+00:00

The tokenize module already understands your input format: each line contains a valid Python identifier, and the indentation level of the lines is significant. The ElementTree module lets you manipulate tree structures in memory, so it may be more flexible to separate building the tree from rendering it as HTML:

from tokenize import NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, generate_tokens
from xml.etree import ElementTree as etree

def parse(file, TreeBuilder=etree.TreeBuilder):
    tb = TreeBuilder()
    tb.start('ul', {})
    for type_, text, start, end, line in generate_tokens(file.readline):
        if type_ == NAME: # convert name to <li> item
            tb.start('li', {})
            tb.data(text)
            tb.end('li')
        elif type_ == NEWLINE:
            continue
        elif type_ == INDENT: # start <ul>
            tb.start('ul', {})
        elif type_ == DEDENT: # end </ul>
            tb.end('ul')
        elif type_ == ENDMARKER: # done
            tb.end('ul') # end parent list
            break
        else: # unexpected token
            assert 0, (type_, text, start, end, line)
    return tb.close() # return root element
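
To see why tokenize fits here, it can help to look at the token stream it produces. The following is a small sketch, not part of the original answer: SAMPLE is an assumed input (a guess at the kind of tab-indented file described in the question), and token_names is a hypothetical helper.

```python
import io
from tokenize import generate_tokens, tok_name

# Assumed sample input: one identifier per line, nesting expressed with tabs.
SAMPLE = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"

def token_names(text):
    """Return the symbolic name of every token tokenize finds in text."""
    readline = io.StringIO(text).readline
    return [tok_name[tok.type] for tok in generate_tokens(readline)]

names = token_names(SAMPLE)
print(names)
```

Each deeper level shows up as an INDENT token and each return to a shallower level as a DEDENT token, which is what parse() maps to opening and closing a ul.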

Any class that provides .start(), .end(), .data(), and .close() methods can be used as the TreeBuilder; for example, you could write HTML on the fly instead of building a tree.
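
As a rough illustration of that idea, here is a minimal sketch of a builder that writes markup as the calls arrive. HtmlWriter and its out parameter are hypothetical names, not part of ElementTree; the class only mirrors the four methods that parse() calls.

```python
import io
import sys
from html import escape

class HtmlWriter:
    """Hypothetical TreeBuilder stand-in that writes HTML text immediately."""

    def __init__(self, out=None):
        # Default to stdout so the class can be created without arguments.
        self.out = sys.stdout if out is None else out

    def start(self, tag, attrs):
        attr_text = "".join(
            ' %s="%s"' % (name, escape(value, quote=True))
            for name, value in attrs.items()
        )
        self.out.write("<%s%s>" % (tag, attr_text))

    def data(self, text):
        self.out.write(escape(text))  # escape &, < and > in item text

    def end(self, tag):
        self.out.write("</%s>" % tag)

    def close(self):
        return None  # nothing is kept in memory, so there is no root element

# Drive the writer by hand, collecting the output in a buffer.
buf = io.StringIO()
writer = HtmlWriter(buf)
writer.start("ul", {})
writer.start("li", {})
writer.data("A & B")
writer.end("li")
writer.end("ul")
writer.close()
print(buf.getvalue())  # <ul><li>A &amp; B</li></ul>
```

Because the constructor takes no required arguments, something like parse(sys.stdin, TreeBuilder=HtmlWriter) should print the list directly; in that case parse() returns None rather than an element.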

To parse stdin and write HTML to stdout, you could use ElementTree.write():

import sys

etree.ElementTree(parse(sys.stdin)).write(sys.stdout, method='html')

Output:

<ul><li>A</li><ul><li>B</li><li>C</li><ul><li>D</li><li>E</li></ul></ul></ul>

You can use any file, not just sys.stdin/sys.stdout.
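
Note that the output above places each nested ul next to its parent li, whereas the structure in the question puts it inside the li. One possible variant, sketched here under the assumption that every indented block follows an item line, keeps a stack of open lists and attaches each new ul to the last li of the current list. parse_nested and SAMPLE are hypothetical names, and SAMPLE is an assumed input.

```python
import io
from tokenize import (NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, NL,
                      generate_tokens)
from xml.etree import ElementTree as etree

def parse_nested(file):
    """Build a <ul> tree in which sub-lists live inside their parent <li>."""
    root = etree.Element("ul")
    lists = [root]  # stack of currently open <ul> elements
    for tok in generate_tokens(file.readline):
        if tok.type == NAME:  # one <li> per identifier
            etree.SubElement(lists[-1], "li").text = tok.string
        elif tok.type == INDENT:  # open a <ul> inside the previous <li>
            if len(lists[-1]) == 0:
                raise ValueError("indented block without a parent item")
            lists.append(etree.SubElement(lists[-1][-1], "ul"))
        elif tok.type == DEDENT:  # close the innermost <ul>
            lists.pop()
        elif tok.type in (NEWLINE, NL):  # line ends and blank lines
            continue
        elif tok.type == ENDMARKER:
            break
        else:
            raise ValueError("unexpected token: %r" % (tok,))
    return root

# Assumed sample input, indented with tabs.
SAMPLE = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"
nested = etree.tostring(parse_nested(io.StringIO(SAMPLE)),
                        encoding="unicode", method="html")
print(nested)
```

This works on the element tree directly instead of going through TreeBuilder, because the parent li has to stay open until its sub-list is known to be finished.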

Note: To write to stdout on Python 3, either write to sys.stdout.buffer or pass encoding="unicode", because of the bytes/Unicode distinction.
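
For Python 3, the second option might look like the sketch below. The tiny tree and the write_html helper are made up for illustration; encoding="unicode" makes ElementTree.write() emit str, which is what a text stream such as sys.stdout expects.

```python
import io
import sys
from xml.etree import ElementTree as etree

def write_html(root, out):
    """Serialize root as HTML text to a text-mode file object."""
    etree.ElementTree(root).write(out, encoding="unicode", method="html")

# A one-item list built by hand, purely for demonstration.
root = etree.Element("ul")
etree.SubElement(root, "li").text = "A"

if __name__ == "__main__":
    write_html(root, sys.stdout)  # prints <ul><li>A</li></ul>
```

Writing to sys.stdout.buffer instead keeps the default byte output, which may be preferable when the result is piped to another program.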
