At the risk of sounding foolish, in a scenario where large data fields need to be persisted (such as with blog posts), is database storage always the best solution?
I’m guessing that bloating the database is probably not too high a risk, as that’s kind of what databases are meant to be good at, right? Databases can also be good for text indexing and fast access. Is that assumption correct?
It occurs to me that this kind of data could be stored outside the database, in some kind of XML flat file, but I’m not sure that’s a good idea…
Storing text inside a database, including things like blog posts, is done all the time. Databases are built to handle this.
It’s also common to store large content (e.g. images, large text files, etc.) outside the database (i.e. in the filesystem) and reference it from the database. Doing this may limit your database size but presents other problems, such as handling concurrency yourself (for example, two people editing the same file at the same time).
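To make the two options concrete, here is a minimal sketch in Python using SQLite from the standard library. The table names, column names, helper functions and the `post-<id>.txt` file naming are all made up for illustration, and SQLite is only standing in for whatever database you actually use.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(path=":memory:"):
    """Create the two illustrative tables (all names here are hypothetical)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts_inline "
        "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts_external "
        "(id INTEGER PRIMARY KEY, title TEXT, body_path TEXT)"
    )
    return conn


def save_inline(conn, title, body):
    # Option 1: the whole post body lives in a TEXT column.
    with conn:
        cur = conn.execute(
            "INSERT INTO posts_inline (title, body) VALUES (?, ?)", (title, body)
        )
    return cur.lastrowid


def save_external(conn, content_dir, title, body):
    # Option 2: the body goes to a file; the database keeps only its path.
    with conn:
        cur = conn.execute(
            "INSERT INTO posts_external (title, body_path) VALUES (?, '')", (title,)
        )
        post_id = cur.lastrowid
        path = Path(content_dir) / f"post-{post_id}.txt"
        path.write_text(body, encoding="utf-8")
        conn.execute(
            "UPDATE posts_external SET body_path = ? WHERE id = ?",
            (str(path), post_id),
        )
    return post_id


def load_inline(conn, post_id):
    row = conn.execute(
        "SELECT body FROM posts_inline WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0] if row else None


def load_external(conn, post_id):
    row = conn.execute(
        "SELECT body_path FROM posts_external WHERE id = ?", (post_id,)
    ).fetchone()
    return Path(row[0]).read_text(encoding="utf-8") if row else None


if __name__ == "__main__":
    conn = open_db()
    content_dir = tempfile.mkdtemp()
    first = save_inline(conn, "Inline post", "Body stored in the database.")
    second = save_external(conn, content_dir, "External post", "Body stored on disk.")
    print(load_inline(conn, first))
    print(load_external(conn, second))
```

Note that with the second option the file write is not covered by the database transaction, so you have to deal with orphaned files and concurrent edits yourself. That is the kind of extra problem mentioned above.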
Lots of factors come into play to determine which is the most appropriate solution, including how often things are edited, how large the files are, how many files you have and so on.
As for database handling of text indexing, support varies. MySQL (using MyISAM storage) has full-text searching, for example. SQL Server with the right add-on has it too. Same with Oracle. It can be useful but is more limited than a general-purpose search engine (e.g. Lucene). Your requirements and constraints will determine whether database indexing is sufficient or you need a search-engine-type solution.
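If it helps to see what a full-text index is doing conceptually, here is a toy inverted index in Python. This is only an illustration of the general idea, not how MySQL, SQL Server or Lucene actually implement it; the function names are made up, and it has no ranking, stemming or stop-word handling, which is the sort of thing a dedicated search engine adds.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so intersecting does not modify the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    posts = {
        1: "Storing blog posts in a database",
        2: "Storing images on the filesystem",
        3: "Full-text search in a database",
    }
    index = build_index(posts)
    print(search(index, "storing database"))
```

A lookup touches only the entries for the query words instead of scanning every post, which is why an index like this beats a plain substring scan once you have a lot of text.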
To give you a real and specific example, the StackOverflow search is implemented using SQL Server full text searching and many have criticized it for being ineffective compared to using Google’s “site:stackoverflow.com ….” (which I use by default pretty much).