If you are parsing HTML or XML with Python and looking for certain tags, lowercasing or uppercasing the entire document so that your comparisons match can hurt performance. Roughly what percentage of XML and HTML documents use any uppercase characters in their tags?
I think you’re overly concerned about performance. If you’re talking about arbitrary web pages, 90% of them will be HTML, not XHTML. HTML tag names are case-insensitive, unlike XML and XHTML tag names, which are case-sensitive, so you should do case-insensitive comparisons. Lowercasing a string is extremely fast and should take less than 1% of your parser’s total running time. If you’re not sure, carefully time your parser on a document that’s already all lowercase, with and without the lowercase conversions.
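Here is a minimal sketch of that timing experiment, assuming the standard library’s `html.parser` stands in for "your parser". `TagCounter`, `count_tags` and the `sample` document are hypothetical names made up for illustration. Note that `html.parser` already passes tag names to `handle_starttag` in lowercase, so matching against a lowercase target is case-insensitive even without lowercasing the whole document first.

```python
import timeit
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags named `target`.

    html.parser hands handle_starttag the tag name already lowercased,
    so comparing against a lowercased target is case-insensitive.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            self.count += 1


def count_tags(doc, target, lowercase_first=False):
    if lowercase_first:
        doc = doc.lower()  # the whole-document pre-pass being questioned
    parser = TagCounter(target)
    parser.feed(doc)
    parser.close()
    return parser.count


# Hypothetical sample document that mixes upper- and lowercase tag names.
sample = "<HTML><body>" + "<P>Hello <b>world</b></P><p>again</p>" * 2000 + "</body></HTML>"

if __name__ == "__main__":
    for flag in (False, True):
        secs = timeit.timeit(lambda: count_tags(sample, "p", flag), number=5)
        print(f"lowercase_first={flag}: {secs:.3f}s for 5 runs")
```

Both variants find the same tags. The difference between the two timings is the cost of the `lower()` pass, which will vary by machine and document.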
Even a pure-Python implementation of lower() would be negligible compared to the rest of the parsing, but it’s better than that: CPython implements lower() in C, so it is about as fast as it can be.
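To check the "negligible" claim directly, you can time `str.lower()` against a parse of the same text. This is a rough sketch, assuming a bare `html.parser.HTMLParser` (whose handler methods do nothing by default) as a stand-in for a real parser. `lowercase_share` is a hypothetical helper, and the number it returns depends on your machine and document, so no expected value is given here.

```python
import timeit
from html.parser import HTMLParser


def lowercase_share(doc, number=5):
    """Hypothetical helper: fraction of (lower + parse) time spent in str.lower().

    Times doc.lower() and a bare HTMLParser pass separately, then compares
    them. A rough measurement, not a profiler.
    """
    def parse():
        parser = HTMLParser()  # base class: handle_* methods are no-ops
        parser.feed(doc)
        parser.close()

    lower_secs = timeit.timeit(doc.lower, number=number)
    parse_secs = timeit.timeit(parse, number=number)
    return lower_secs / (lower_secs + parse_secs)


if __name__ == "__main__":
    doc = "<DIV><P>Some <B>text</B></P></DIV>" * 5000
    print(f"lower() share of total time: {lowercase_share(doc):.2%}")
```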
Remember, premature optimization is the root of all evil. Make your program correct first, then make it fast.