I am working on a server application which receives data over a TCP socket in an XMPP-like XML format, i.e. every child of the <root> element essentially represents one separate request (stanza). The connection is closed as soon as </root> is received.
I know that I need to use a stream parser such as SAX. For convenience, though, I'd prefer to have a tree-like interface for accessing each stanza's child elements. (The data sent with each request is small, so I think it makes sense to read each stanza as a whole.)
What’s the best way to realize that in Python (preferably v3)?
This is the code I'd like to build it into. Feel free to point me in a completely different direction if that solves the problem better.
```python
import socketserver

import settings


class MyServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass


class MyRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        pass


if __name__ == '__main__':
    server = MyServer((settings.host, settings.port), MyRequestHandler)
    server.serve_forever()
```
You'll want to use a push-based parser that emits SAX events. Basically, you want a parser that you can hand partial chunks of data as they arrive (something like pushChunk(data)), and that calls an event handler when it sees the end tag of a first-level child of <root>. That handler produces your stanzas, which can then be passed on to the application's processing logic.
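As a standard-library sketch of this idea: Python's `xml.parsers.expat` parser accepts partial input through `Parse(data, False)`, which plays the role of `pushChunk(data)` here. The sketch tracks nesting depth in the start/end handlers and feeds the events belonging to each first-level child into an `xml.etree.ElementTree.TreeBuilder`, so every completed stanza arrives as a small element tree. `StanzaParser` and the `on_stanza` callback are hypothetical names chosen for illustration, not part of any library.

```python
import xml.parsers.expat
import xml.etree.ElementTree as ET


class StanzaParser:
    """Hypothetical push parser: feed raw bytes, receive one Element per child of <root>."""

    def __init__(self, on_stanza):
        self.on_stanza = on_stanza  # callable invoked with each completed stanza
        self.depth = 0              # 0 = outside <root>, 1 = inside <root>, >1 = inside a stanza
        self.builder = None         # TreeBuilder for the stanza currently being parsed
        self.closed = False         # becomes True once </root> has been seen
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self._start
        self.parser.EndElementHandler = self._end
        self.parser.CharacterDataHandler = self._data

    def feed(self, data):
        # isfinal=False tells expat that more input may follow.
        self.parser.Parse(data, False)

    def _start(self, name, attrs):
        self.depth += 1
        if self.depth == 2:         # a new first-level child: start a fresh tree
            self.builder = ET.TreeBuilder()
        if self.depth >= 2:
            self.builder.start(name, attrs)

    def _end(self, name):
        if self.depth >= 2:
            self.builder.end(name)
        if self.depth == 2:         # first-level child closed: the stanza is complete
            self.on_stanza(self.builder.close())
            self.builder = None
        elif self.depth == 1:       # </root> closed: the stream is over
            self.closed = True
        self.depth -= 1

    def _data(self, text):
        # Text directly inside <root> (e.g. whitespace between stanzas) is ignored.
        if self.builder is not None:
            self.builder.data(text)
```

In `handle()`, you would create one `StanzaParser` per connection and call `feed()` with each chunk read from the socket; `closed` tells you when `</root>` has arrived and the connection can be closed.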
If you want to see an example of this, here is the expat-based parser from libstrophe, an XMPP client library I wrote:
http://github.com/metajack/libstrophe/blob/master/src/parser_expat.c
Building a whole new document for each stanza is quite expensive. Instead, you can implement this with a single parser instance per connection rather than continually creating a new document parser for every stanza.
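On Python 3.4 and later, `xml.etree.ElementTree.XMLPullParser` gives a single-parser variant with less code: it is fed chunks incrementally, builds elements as it goes, and reports start and end events. The sketch below assumes the stream format described in the question. It keeps one parser per connection, yields each first-level child of `<root>` once its end tag arrives, and then detaches it from the root so the in-memory tree does not grow over a long session. `iter_stanzas` and the `process` hook are hypothetical names, and the 4096-byte read size is arbitrary.

```python
import socketserver
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield each first-level child of <root> as an Element, using one parser.

    `chunks` is any iterable of bytes, e.g. successive socket reads.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem          # the <root> element itself
            else:
                depth -= 1
                if depth == 1:           # a first-level child just closed
                    yield elem
                    root.remove(elem)    # detach it so the tree stays small
                elif depth == 0:         # </root>: the client is done
                    return


class MyRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # recv() returns b"" once the peer closes, which ends the iterator.
        chunks = iter(lambda: self.request.recv(4096), b"")
        for stanza in iter_stanzas(chunks):
            self.process(stanza)

    def process(self, stanza):
        # Hypothetical hook: replace with your application logic.
        self.wfile.write(stanza.tag.encode() + b"\n")
```

`MyServer` from the question can be used unchanged with this handler. Because `iter_stanzas` only sees an iterable of byte strings, it can be tested without a socket. If the stream uses XML namespaces, as XMPP does, ElementTree reports tags in `{namespace}local` form.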
If you need a working Python version, you can probably use or extract the code from Twisted Words (I believe it lives in twisted.words.xish).