Are there any security risks in allowing (via a whitelist only) pure markup tags such as a, b, i, etc. in post submissions?
BBCode seems like a heavy solution to the problem of code injection, and whitelisting “safe” HTML tags seems easier than going through all the parsing and conversion that BBCode requires.
I have found that many BBCode libraries have issues with nested elements (is this because they use an FSA or regex instead of a proper parser?), whereas blockquote and fieldset are parsed properly by the web browser.
Any and all opinions are greatly appreciated.
This is something everyone seems to get wrong, even though it is so simple.
Use a parser
It doesn’t matter whether you use Markdown, HTML, BBCode, or anything else.
Use a parser. A real parser. Not a bunch of regexes.
The parser gives you a syntax tree. From the syntax tree you derive the HTML (still as a tree of objects). Clean the tree (using a whitelist), then print the HTML.
Using HTML as the input syntax is perfectly fine. Just don’t try to clean it with regexes.
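To make the parse, clean, print steps concrete, here is a minimal sketch in Python using only the standard library's html.parser. The names (Node, TreeBuilder, clean, render, sanitize), the particular whitelist, and the rule that only http/https links survive are all my own assumptions for illustration, not something from the question. It is a sketch of the idea, not a hardened sanitizer; for production, a maintained library is probably the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for this sketch: adjust to your own needs.
ALLOWED_TAGS = {"a", "b", "i", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_ENTIRELY = {"script", "style"}   # removed together with their content
VOID_TAGS = {"br", "hr", "img", "input"}  # never have children


class Node:
    """One element in the tree; children are Node objects or plain strings."""
    def __init__(self, tag=None, attrs=None):
        self.tag = tag
        self.attrs = attrs or []
        self.children = []


class TreeBuilder(HTMLParser):
    """Step 1: turn the input into a tree of objects."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node()
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def is_safe_attr(name, value):
    if value is None:
        return False
    if name == "href":
        return value.strip().lower().startswith(("http://", "https://"))
    return True


def clean(node):
    """Step 2: apply the whitelist to the tree, in place."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
        elif child.tag in DROP_ENTIRELY:
            continue
        else:
            clean(child)
            if child.tag in ALLOWED_TAGS:
                allowed = ALLOWED_ATTRS.get(child.tag, set())
                child.attrs = [(k, v) for k, v in child.attrs
                               if k in allowed and is_safe_attr(k, v)]
                kept.append(child)
            else:
                # Unknown tag: drop the tag itself but keep its cleaned content.
                kept.extend(child.children)
    node.children = kept


def render(node):
    """Step 3: print the tree, escaping all text and attribute values."""
    out = []
    for child in node.children:
        if isinstance(child, str):
            out.append(escape(child, quote=False))
        else:
            attrs = "".join(f' {k}="{escape(v)}"' for k, v in child.attrs)
            out.append(f"<{child.tag}{attrs}>{render(child)}</{child.tag}>")
    return "".join(out)


def sanitize(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    clean(builder.root)
    return render(builder.root)
```

The point of the design is that the output is never a modified copy of the input string. It is printed fresh from the cleaned tree, so unclosed tags get closed, attributes such as onclick simply never get written, and nesting is handled by the tree rather than by pattern matching.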