I have a site that wraps some user-generated content, and I want to separate the markup for the layout from the markup of the user-generated content, so that the user-generated content can’t break the site layout.
The user-generated content is trusted, as it comes from a known group of users on my network, but only a small subset of HTML tags is allowed (p, ul/ol/li, em, strong, and a couple more). However, the user-generated content is not guaranteed to be well-formed, and we have had some instances of malformed user-generated content breaking the layout of the site.
We are working with our users to keep the content well-formed, but in the meantime I am trying to find a good way to separate the content from the layout. I have been looking into namespaces, but have been unable to find good documentation about CSS support for embedded namespaces.
Anyone have any good ideas?
EDIT
I have seen some really good suggestions here, but I should probably clarify that I have absolutely no control over the input mechanism that the users use. They are entering content into one system, and my page uses that system’s API to pull content out of it. That system is using TinyMCE, but like I said, we are still getting some malformed content.
Maybe overkill, but HTML Tidy could help if you can use it.
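If installing HTML Tidy isn't an option, the same idea (re-serializing each fragment so every tag is balanced before it goes into the layout) can be sketched with Python's standard-library `html.parser`. This is a swapped-in technique, not Tidy itself. Because the content arrives through the other system's API, something like this would run after fetching a fragment and before inserting it into the page template. `ALLOWED_TAGS`, `BalancingSanitizer`, and `sanitize` are hypothetical names, and the whitelist is only a guess based on the tags listed in the question.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the question mentions p, ul/ol/li, em, strong "and a couple more".
ALLOWED_TAGS = {"p", "ul", "ol", "li", "em", "strong"}


class BalancingSanitizer(HTMLParser):
    """Rebuilds a fragment keeping only whitelisted tags, with every open tag closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized output pieces
        self.stack = []  # whitelisted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # disallowed tags are dropped; their text content is kept
        # <p> and <li> cannot nest in themselves, so a repeat implicitly closes the previous one.
        if tag in {"p", "li"} and self.stack and self.stack[-1] == tag:
            self.handle_endtag(tag)
        self.out.append(f"<{tag}>")  # attributes are discarded in this sketch
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag with no matching open tag: drop it
        # Close anything opened after this tag (left unclosed by the author), then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Entities were already decoded (convert_charrefs=True), so re-escape the text.
        self.out.append(escape(data, quote=False))

    def finish(self):
        self.close()  # flush any buffered text
        while self.stack:  # close tags still open at the end of the fragment
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def sanitize(fragment: str) -> str:
    parser = BalancingSanitizer()
    parser.feed(fragment)
    return parser.finish()


print(sanitize("<p>Hello <strong>world</p><div>x</div></em>"))
```

Running the usage line prints `<p>Hello <strong>world</strong></p>x`: the unclosed `<strong>` is closed, and the `<div>` and stray `</em>` are dropped. Comments are discarded, and text inside disallowed tags is kept as escaped plain text.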
Use a WYSIWYG editor like TinyMCE or CKEditor that has built-in cleanup methods.
Robert Koritnik’s suggestion to use markdown seems brilliant, especially considering that you only allow a few harmless formatting tags.
I don’t think there’s anything you can do with CSS to stop layouts from breaking due to open HTML tags, so I would probably forget that idea.