I am new to MySQL. I am trying to make text documents “machine readable”. I have a bunch of text documents that each contain some metadata (author, document number, etc.). In addition, different parts of the text are marked up (heading, introduction, citations, links, etc.), and some of that markup carries its own metadata (such as link references).
I need to be able to search the database by both metadata and text. I also need to be able to restrict a search to specific parts of a document (the introduction, for example). Later on I will need to mark new parts of the text, i.e. add additional “markup”.
I can easily imagine how to represent those documents in XML. However, since I need to perform complicated queries over these texts, storing them as XML is not a viable option.
I would like some basic pointers on how to design the schema/tables in a way that doesn’t make adding further information (especially “markup”) difficult.
Hopefully the description about what I am trying to achieve isn’t too ambiguous.
Help much appreciated.
The requirements you have described suggest that what you need is not really MySQL (or any other relational database) but rather a Lucene index. At least, Lucene is what I have used to accomplish similar goals.
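To make the idea more concrete, here is a rough sketch in plain Python of what a fielded index does. It is not Lucene code; the class `FieldedIndex` and the field names (`author`, `introduction`, `citations`) are made up for illustration. Each part of a document and each piece of metadata becomes a named field, and a search can be restricted to any combination of fields.

```python
import re
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keyed by (field, term); illustrative only, not Lucene."""

    def __init__(self):
        # (field name, lowercased term) -> set of document ids
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index every word of every field of one document."""
        for field, text in fields.items():
            for term in re.findall(r"\w+", text.lower()):
                self.postings[(field, term)].add(doc_id)

    def search(self, **criteria):
        """Return ids of documents that match every field=word pair (AND)."""
        result = None
        for field, term in criteria.items():
            ids = self.postings.get((field, term.lower()), set())
            result = set(ids) if result is None else result & ids
        return result or set()


# Hypothetical documents: metadata and marked-up parts are all just fields.
index = FieldedIndex()
index.add(1, {"author": "Smith",
              "introduction": "Indexing marked up text",
              "citations": "Lucene in Action"})
index.add(2, {"author": "Jones",
              "introduction": "Relational schema design",
              "citations": "Indexing survey"})

in_intro = index.search(introduction="indexing")                 # only doc 1
by_both = index.search(author="jones", citations="indexing")     # only doc 2
```

Adding a new kind of markup later means adding another field name when indexing; nothing like a table definition has to change, which is the property you asked about. A real Lucene index adds analysis, ranking and phrase queries on top of this basic structure.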
Since the question was not really specific (see https://stackoverflow.com/faq#questions) I will give you a general answer.
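So try Solr, which is a standalone search server built on top of Lucene (it is not tied to MySQL, although it can index data that you keep in a database). Start by going through this tutorial: http://lucene.apache.org/solr/api-3_6_1/doc-files/tutorial.html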
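As a hedged illustration of how one of your documents might be handed to Solr, the sketch below builds an XML update message of the `<add><doc><field name="...">` form that the tutorial's example files use. The helper `solr_add_message` and the field names (`id`, `author`, `introduction`, `link_ref`) are assumptions of mine; real field names have to be declared in your Solr schema, and the message still has to be posted to the server's update handler as the tutorial describes.

```python
import xml.etree.ElementTree as ET


def solr_add_message(doc):
    """Build a Solr XML <add> message from a dict of field name -> value(s).

    A list value becomes several <field> elements with the same name,
    which is how multi-valued fields are sent.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: metadata plus one marked-up part and its link references.
message = solr_add_message({
    "id": "doc-1",
    "author": "Smith",
    "introduction": "Indexing marked up text",
    "link_ref": ["ref-a", "ref-b"],
})
print(message)
```

Each part of the text you mark up can go into its own field, so a query can then target only that field (for example only the introduction) or combine it with metadata fields.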