Requirements:
I have a Python project which parses data feeds from multiple sources in varying formats (Atom, valid XML, invalid XML, CSV, almost-garbage, etc…) and inserts the resulting data into a database. The catch is that the information required to parse each of the feeds must also be stored in the database.
Current solution:
My previous solution was to store small Python scripts which are eval'd on the raw data and return a data object for the parsed data. I'd really like to get away from this method, as it obviously opens up a nasty security hole.
Ideal solution:
What I’m looking for is what I would describe as a template-driven feed parser for Python, so that I can write a template file for each of the feed formats, and this template file would be used to make sense of the various data formats.
I’ve had limited success finding something like this in the past, and was hoping someone may have a good suggestion.
Instead of evaling scripts, maybe you should consider making a package of them?

Parsing CSV is one thing: the format is simple and regular. Parsing XML requires a completely different approach. Considering you don't want to write every single parser from scratch, why not write a bunch of small modules, each exposing an identical API, and use those? I believe using Python itself (not some templating DSL) is ideal for this sort of thing.
For example, this is the approach I've seen in a small torrent-fetching script I use:
Main program:
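(The original listing was not preserved; here is a minimal sketch of what such a main program might look like. The `parsers` package name and the per-module `parse()` function are my assumptions, not the actual API of that script.)

```python
import importlib


def parse_feed(parser_name, raw_data):
    """Run the parser module named in the database row on raw feed data.

    Assumes a parsers/ package in which every module exposes the same
    API: a parse(raw_data) function returning the parsed records.
    """
    # Dynamically import parsers.<parser_name>, e.g. parsers.csv
    module = importlib.import_module("parsers." + parser_name)
    return module.parse(raw_data)
```

The database then only needs to store the parser's module name per feed, which is harmless data, instead of executable code.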
parsers/csv.py:

If you don't particularly like dynamically loaded modules, you may consider writing, for example, a single module with several parser classes (probably derived from some "abstract parser" base class).
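For illustration, a sketch of that class-based layout (the `FeedParser` and `CSVFeedParser` names are mine, chosen for the example):

```python
import csv
import io
from abc import ABC, abstractmethod


class FeedParser(ABC):
    """Abstract base: every concrete parser shares this API."""

    @abstractmethod
    def parse(self, raw):
        """Parse raw feed text into a list of record dicts."""


class CSVFeedParser(FeedParser):
    """Parses a CSV feed whose first row is the header."""

    def parse(self, raw):
        return list(csv.DictReader(io.StringIO(raw)))
```

Dispatch then becomes a simple dict mapping the parser name stored in the database to the appropriate class.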