The Archive Base


Editorial Team
Asked: May 23, 2026 (2026-05-23T09:06:24+00:00)

I will try and keep this short and to the point.

Given the following

#!/usr/bin/python
from lxml import etree

root = etree.Element('root')
sect = etree.SubElement(root,'sect')
para = etree.SubElement(sect,'para')
para.text = 'this is a [b]long[/b] block of text. Much longer than this example makes it out to be.'

How would I best go about converting the output to what I have below? Notice that the [b] tags became <b> elements.

<root> 
  <sect>
    <para>
       this is a <b>long</b> block of text. 
      Much longer than this example makes it out to be.
    </para>
  </sect>
</root>

My real input and XML are considerably more complex, but this is the gist of it. I have taken a text document with a standard format and I am converting it to XML. The structure of the document is fairly static, so this is not as crazy as it sounds. I currently have it broken into lines. This is relevant because, as I go through each line, I have no trouble identifying a <sect> or a <title>, but oftentimes a <para> will have some extra formatting in its line: in this example, a [b] that needs to be converted in turn. What would be the best way of accomplishing this?

Items to keep in mind

  1. The authors of my input texts are not always consistent. Therefore, it would be best to develop a loose regexp that finds [b] WORD [/b] as well as author errors such as [b[WORD[/b]. My current idea is to match something like [b or b].

  2. I am currently processing my input file line by line, and I have removed any blank lines. Should I consider handling the inline formatting in a separate pass afterwards? I have no strong preference, but I feel that this can be contained in a single loop through the text.

  3. This will need to play well with lxml when I output my document. For example, see the edit below with my comment on the BBCode parser.
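
As a rough illustration of the loose matching described in item 1, here is a minimal sketch using only Python's re module. The names LOOSE_B and normalize_bold_tags are made up for this example, and the pattern only covers the one kind of error shown above (a closing bracket typed as an opening one); other author mistakes would need their own cases.

```python
import re

# Loose pattern: "[", an optional "/", the letter b, then EITHER bracket.
# It therefore matches "[b]", "[/b]", and the typos "[b[" and "[/b[".
LOOSE_B = re.compile(r"\[(/?)b[\]\[]", re.IGNORECASE)


def normalize_bold_tags(line):
    """Rewrite sloppy bold markers in one line to well-formed [b] / [/b]."""
    return LOOSE_B.sub(lambda m: "[/b]" if m.group(1) else "[b]", line)


cleaned = normalize_bold_tags("this is a [b[long[/b] block of text.")
print(cleaned)  # this is a [b]long[/b] block of text.
```

Normalizing first keeps the later conversion step simple, because it only has to deal with the well-formed [b]...[/b] form. This fits inside the same line-by-line loop.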

I have worked on this most of the afternoon, and can discuss more of the routes I have taken. I will be working on this throughout the evening so if I come across other items to keep in mind I will update this question accordingly.

EDIT: my problem with the BBCode parser

Paul thoughtfully suggested postmarkup-1.1.4. However, as you can see, it does not play well with lxml: the elements are converted to entities. This is the same problem I ran into this afternoon when I did this through a search and replace. Ultimately, as was pointed out, this is a perfect job for sed. However, I was hoping not to be the end user of this script, and would rather have everything contained within one command.

>>> p.text = render_bbcode(p.text)
>>> p.text
'this is a <strong>long</strong> text string'
>>> etree.tostring(root)
'<root><p>this is a &lt;strong&gt;long&lt;/strong&gt; text string</p></root>'
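
The escaping happens because anything assigned to .text is treated as character data, so the angle brackets are serialized as entities. One possible workaround, sketched below, is to parse the rendered string into elements and graft them onto the target. The helper name set_inline_markup and the <wrapper> element are made up for this example, and it assumes the rendered markup is well-formed XML (a stray & or < in the text would make the parse fail). The stdlib fallback import is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # assumption: the stdlib API is close enough for this sketch
    import xml.etree.ElementTree as etree


def set_inline_markup(elem, markup):
    """Fill an empty element with inline markup parsed from a string."""
    # Wrap the fragment so mixed text and tags parse as a single element.
    wrapper = etree.fromstring("<wrapper>" + markup + "</wrapper>")
    elem.text = wrapper.text  # text before the first child element
    for child in list(wrapper):
        elem.append(child)  # each child carries its own .tail text


root = etree.Element("root")
p = etree.SubElement(root, "p")
set_inline_markup(p, "this is a <strong>long</strong> text string")
result = etree.tostring(root, encoding="unicode")
print(result)  # <root><p>this is a <strong>long</strong> text string</p></root>
```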

Doing this in reverse returns equally poor results:

>>> p.text
'this is a [b]long[/b] text string'
>>> render_bbcode(etree.tostring(root))
u'&lt;root&gt;&lt;p&gt;this is a <strong>long</strong> string&lt;/p&gt;&lt;/root&gt;'
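
Another route that avoids the string round trip entirely is to build the <b> children directly while filling the <para>, using .text for the leading text and .tail for the text that follows each child. The sketch below is a minimal version under two assumptions: the tags are already well-formed [b]...[/b] pairs, and they are not nested. BOLD and fill_with_bold are names made up for this example, and the stdlib fallback import is only there so it runs without lxml installed.

```python
import re

try:
    from lxml import etree  # the library used in the question
except ImportError:  # assumption: the stdlib API is close enough for this sketch
    import xml.etree.ElementTree as etree

# Non-greedy so that each [b]...[/b] pair is matched separately.
BOLD = re.compile(r"\[b\](.*?)\[/b\]", re.DOTALL)


def fill_with_bold(elem, text):
    """Set elem's content from text, turning [b]...[/b] into <b> children."""
    # With one capture group, split() alternates plain, bold, plain, ...
    parts = BOLD.split(text)
    elem.text = parts[0]
    for i in range(1, len(parts), 2):
        b = etree.SubElement(elem, "b")
        b.text = parts[i]      # the bolded words
        b.tail = parts[i + 1]  # plain text up to the next tag


root = etree.Element("root")
sect = etree.SubElement(root, "sect")
para = etree.SubElement(sect, "para")
fill_with_bold(para, "this is a [b]long[/b] block of text.")
result = etree.tostring(root, encoding="unicode")
print(result)
# <root><sect><para>this is a <b>long</b> block of text.</para></sect></root>
```

Because the markup is created as real elements, nothing is escaped on output, and the call sits inside the existing per-line loop.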

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 9:06 am

    The postmarkup library seems to come closest to what you want to do.

    http://pypi.python.org/pypi/postmarkup/1.1.4

    Unfortunately it hasn’t seen a lot of development recently, but I don’t see any other libraries that look tons better.

    Starting from there and modifying the existing elements to fit your syntax is probably faster than reinventing the parsing wheel from scratch.

    If that isn’t a good direction, you might look at lower-level lexing and parsing, but that will rapidly get complex, to the point that you might be better off with simple repetitive regexes and hand correction. How big is your corpus?

    The final item of note is that tasks like this are precisely what sed was written to do. It can be amazingly powerful if you’re willing to learn how to use it. If you’re not already comfortable with it though, the Python might be easier.
