The Archive Base

Editorial Team
Asked: June 3, 2026 (2026-06-03T11:19:34+00:00)

I’m trying to write a program that will take an HTML file and make it more email friendly. Right now all the conversion is done manually because none of the online converters do exactly what we need.

This sounded like a great opportunity to push the limits of my programming knowledge and actually code something useful so I offered to try to write a program in my spare time to help make the process more automated.

I don’t know much about HTML or CSS so I’m mostly relying on my brother (who does know HTML and CSS) to describe what changes this program needs to make, so please bear with me if I ask a stupid question. This is totally new territory for me.

Most of the changes are pretty basic — if you see tag/attribute X then convert it to tag/attribute Y. But I’ve run into trouble when dealing with an HTML tag containing a style attribute. For example:

<img src="http://example.com/file.jpg" style="width:150px;height:50px;float:right" />

Whenever possible I want to convert the style attributes into HTML attributes (or convert the style attribute to something more email friendly). So after the conversion it should look like this:

<img src="http://example.com/file.jpg" width="150" height="50" align="right"/>

Now I realize that not all CSS style attributes have an HTML equivalent, so right now I only want to focus on the ones that do. I whipped up a Python script that would do this conversion:

from bs4 import BeautifulSoup
import re

class Styler(object):

    # Maps CSS property names to the HTML attribute names that replace them.
    img_attributes = {'float': 'align'}

    def __init__(self, soup):
        self.soup = soup

    def format_factory(self):
        self.handle_image()

    def handle_image(self):
        # Find every <img> tag that has a non-empty style attribute.
        tags = self.soup.find_all("img", style=re.compile('.'))
        print(tags)
        for tag in tags:
            old_attributes = tag['style']
            # Split "width:150px;height:50px" into ['width', '150', 'height', '50'].
            tokens = [s for s in re.split(r'[:;]+|px', str(old_attributes)) if s]
            del tag['style']
            print(tokens)

            # Tokens alternate between a property name and its value.
            for j in range(0, len(tokens), 2):
                if tokens[j] in Styler.img_attributes:
                    tokens[j] = Styler.img_attributes[tokens[j]]

                tag[tokens[j]] = tokens[j+1]

if __name__ == '__main__':
    html = """
    <body>hello</body>
    <img src="http://example.com/file.jpg" style="width:150px;height:50px;float:right" />
    <blockquote>my blockquote text</blockquote>
    <div style="padding-left:25px; padding-right:25px;">text here</div>
    <body>goodbye</body>
    """
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, 'html.parser')
    s = Styler(soup)
    s.format_factory()

Now this script will handle my particular example just fine, but it’s not very robust and I realize it will easily break on real-world input. My question is: how can I make this more robust? As far as I can tell, Beautiful Soup doesn’t have a way to change or extract individual pieces of a style attribute, and I guess that’s what I’m looking to do.

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 11:19 am (2026-06-03T11:19:35+00:00)

    For this type of thing, I’d recommend an HTML parser (like BeautifulSoup or lxml) in conjunction with a specialized CSS parser. I’ve had success with the cssutils package. You’ll have a much easier time than trying to come up with regular expressions to match any possible CSS you might find in the wild.

    For example:

    >>> import cssutils
    >>> css = 'width:150px;height:50px;float:right;'
    >>> s = cssutils.parseStyle(css)
    >>> s.width
    '150px'
    >>> s.height
    '50px'
    >>> s.keys()
    ['width', 'height', 'float']
    >>> s.cssText
    'width: 150px;\nheight: 50px;\nfloat: right'
    >>> del s['width']
    >>> s.cssText
    'height: 50px;\nfloat: right'
    

    So, using this you can pretty easily extract and manipulate the CSS properties you want and plug them into the HTML directly with BeautifulSoup. Be a little careful of the newline characters that pop up in the cssText attribute, though: if you write cssText back into a style attribute, those newlines end up in your HTML. I think cssutils is designed more for formatting standalone CSS files, but it’s flexible enough to mostly work for what you’re doing here.
