I want to store some semantic information about the data in a table. How can I store this information in MySQL so that I can both retrieve it and search for articles by their semantic data?
For example, I have an article about Apple and Microsoft. The semantic data would look like this:
Person : Steve Jobs
Person : Steve Ballmer
Company : Apple
Company : Microsoft
I want to store this without losing the fact that Steve Jobs and Steve Ballmer are persons and that Apple and Microsoft are companies. I also want to be able to search for articles about Steve Jobs or Apple.
Person and Company are not the only possible types, so adding a new column for each type is not viable. And because the type has to be stored along with the value, I cannot simply use a FULLTEXT column directly.
Update: These are the two options that I am considering.
- Save the data in a full-text column as a serialized PHP array.
- Create another table with 3 columns:
--------------------------------
| id | subject | object        |
--------------------------------
| 1  | Person  | Steve Ballmer |
| 1  | Person  | Steve Jobs    |
| 1  | Company | Microsoft     |
| 1  | Company | Apple         |
| 2  | Person  | Obama         |
| 2  | Country | US            |
--------------------------------
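As a rough sketch of the second option, the three-column table can be queried by type and value together. The example below uses Python's built-in sqlite3 module only so that it runs without a server; the table name `article_semantics`, the index, and the helper `find_articles` are assumptions made for illustration, and the same schema and query should carry over to MySQL with minor changes.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_semantics ("
    " id INTEGER NOT NULL,"      # article id, as in the table above
    " subject TEXT NOT NULL,"    # the type: Person, Company, Country, ...
    " object TEXT NOT NULL)"     # the value: Steve Jobs, Apple, ...
)
# Index on (subject, object) so typed lookups do not scan the whole table.
conn.execute(
    "CREATE INDEX idx_subject_object ON article_semantics (subject, object)"
)

rows = [
    (1, "Person", "Steve Ballmer"),
    (1, "Person", "Steve Jobs"),
    (1, "Company", "Microsoft"),
    (1, "Company", "Apple"),
    (2, "Person", "Obama"),
    (2, "Country", "US"),
]
conn.executemany("INSERT INTO article_semantics VALUES (?, ?, ?)", rows)


def find_articles(subject, obj):
    """Return ids of articles tagged with the given type and value."""
    cur = conn.execute(
        "SELECT DISTINCT id FROM article_semantics"
        " WHERE subject = ? AND object = ?"
        " ORDER BY id",
        (subject, obj),
    )
    return [row[0] for row in cur]


print(find_articles("Person", "Steve Jobs"))
print(find_articles("Country", "US"))
```

Because the type lives in its own column, a new type such as Country needs no schema change, and a search for the person "Apple" does not match the company "Apple".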
You’re working on a hard and interesting problem! You may get some interesting ideas from looking at the Dublin Core Metadata Initiative.
http://dublincore.org/metadata-basics/
To make it simple, think of your metadata items as all fitting in one table.
e.g.
The trick here is that some, but not all, of the values in your first and third columns need BOTH to be arbitrary text AND to serve as indexes into the first and third columns. Then, if you’re trying to figure out what your database has on Spolsky, you can full-text search the first and third columns for his name. You’ll get back a bunch of triplets. The values you find will tell you a lot. If you want to know more, you can search again.
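A minimal sketch of that search-then-search-again idea follows. Everything specific in it is an assumption: the table name `triple`, the middle column name `predicate`, and the three made-up rows (including the placeholder "Example Corp") are for illustration only, and a plain `LIKE` match in SQLite stands in for a real MySQL full-text search.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical triplet table; only the first and third columns are searched.
db.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
db.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        # Made-up illustrative rows, not data from the original post.
        ("Article 1", "mentions person", "Joel Spolsky"),
        ("Joel Spolsky", "works at", "Example Corp"),
        ("Article 2", "mentions company", "Example Corp"),
    ],
)


def search_triples(term):
    """Return every triplet whose subject or object contains the term."""
    pattern = f"%{term}%"
    return db.execute(
        "SELECT subject, predicate, object FROM triple"
        " WHERE subject LIKE ? OR object LIKE ?"
        " ORDER BY rowid",
        (pattern, pattern),
    ).fetchall()


# First pass: everything the table holds on Spolsky.
first_hits = search_triples("Spolsky")
# Second pass: follow a value found in the first pass to learn more.
second_hits = search_triples("Example Corp")
```

Here the value "Joel Spolsky" appears as an object in one row and as a subject in another, which is what lets the second search follow the link.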
To pull this off you’ll probably need to have five columns, as follows:
The point of the canonical forms of your subject and object is to allow queries like this to work even if your users enter “Joel Spolsky” in one place and “Spolsky, Joel” in another while meaning the same person.
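One possible way to derive such a canonical form is sketched below. The function `canonical_name` and its rule (flip "Last, First", collapse whitespace, lowercase) are assumptions for illustration, not the scheme the answer had in mind; real names would need more careful handling.

```python
import re


def canonical_name(raw):
    """Hypothetical rule: map 'Last, First' and 'First Last' to 'first last'."""
    text = raw.strip()
    if "," in text:
        # Treat the part before the first comma as the last name.
        last, first = text.split(",", 1)
        text = f"{first} {last}"
    # Collapse runs of whitespace, trim the ends, and lowercase.
    return re.sub(r"\s+", " ", text).strip().lower()


print(canonical_name("Joel Spolsky"))
print(canonical_name("Spolsky, Joel"))
```

Storing the canonical value next to the text as entered lets you search on the canonical column while still displaying what the user typed.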