I asked this question on meta, but I now realize that it may be more appropriate for the main site, as it is a general question that would apply to any tag-based system. (I am happy to close or delete one, depending on where people think this question should go.)
I have a similar system of tagged data and I am running into the same problem Stack Overflow did: I have lots of tags that are really the same thing. I am trying to create a tag synonym page similar to Stack Overflow's to help organize this information.
A few questions around the relationships and “data model” of tag synonyms:
I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?
Also, can a master tag also be a synonym tag? For example, let's say you have a tag called javascript and you had:
Master: js
Synonyms: java-script, js-web
Can you also have:
Master: javascript
Synonyms: js
So in the example above, you would keep resolving until js-web ultimately resolves to javascript, because the master tag js is itself a synonym tag.
Also, that makes me think you could run into a circular reference where you have:
Master: js
Synonyms: java-script
and
Master: javascript
Synonyms: js
How does the system deal with circular references?
It is tempting to give you a more theoretical answer on meta concerning folksonomies, polysemy and such! Since I am answering on the StackOverflow side, I will try to give a marginally more technical answer. Running queries in the StackOverflow Data Explorer lets me attempt to answer your questions (I am not affiliated with StackOverflow, so I can’t know for sure).
On StackOverflow the master/synonym tag relationship is carefully stewarded and cultivated. At the time of writing from the Data Explorer:
It is interesting to contrast this with other folksonomies. One article, “Technorati tags: Good idea, terrible implementation”, states:
“Technorati advertises that they’re now tracking 466,951 different tags, which is pretty darn impressive when you consider that a typical dictionary has around 75,000 entries”
A quick caveat: I usually write Oracle SQL and I assume that the Data Explorer uses SQL Server, so my queries may be a little amateurish. First, my presumptions about the data:
Now to your specific queries:
“I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?”
Result: Yes. A master tag can have multiple synonym tags.
Result: Yes. A synonym tag can only be a synonym for one master tag.
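For your own system, one way to get both properties for free is to store the relationship as a map keyed by the synonym. This is only a minimal sketch in Python, not how StackOverflow does it: the name `synonym_to_master` and the sample tags are made up for illustration. Because a dictionary key can appear only once, each synonym points at exactly one master, while a master can be the value for many synonyms.

```python
from collections import defaultdict

# Hypothetical sample data: each synonym (key) maps to exactly one master (value).
synonym_to_master = {
    "java-script": "js",
    "js-web": "js",
    "py": "python",
}


def synonyms_by_master(mapping):
    """Invert synonym -> master into master -> sorted list of synonyms."""
    grouped = defaultdict(list)
    for synonym, master in mapping.items():
        grouped[master].append(synonym)
    return {master: sorted(synonyms) for master, synonyms in grouped.items()}


masters = synonyms_by_master(synonym_to_master)
print(masters)
```

In a relational table the same constraint would be a unique key on the synonym column.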
“Also, can a master tag also be a synonym tag?”
Result: Yes. A master tag can also be a synonym tag. When I ran this query there were 465 tags that were both synonym and master.
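If masters can themselves be synonyms, a lookup has to keep following links until it reaches a tag that is not a synonym of anything. A rough sketch of that resolution, using the tags from your question (the `resolve` function and the dictionary layout are my own assumptions, not anything taken from StackOverflow):

```python
def resolve(tag, synonym_to_master):
    """Follow synonym -> master links until reaching a tag that is not a synonym."""
    seen = {tag}
    while tag in synonym_to_master:
        tag = synonym_to_master[tag]
        if tag in seen:
            # We came back to a tag already visited, so the chain is circular.
            raise ValueError(f"circular synonym chain at {tag!r}")
        seen.add(tag)
    return tag


# Hypothetical data matching the example in the question.
chain = {"java-script": "js", "js-web": "js", "js": "javascript"}

print(resolve("js-web", chain))      # follows js-web -> js -> javascript
print(resolve("javascript", chain))  # already a top-level master
```

The `seen` set is only a safety net: it stops the loop from running forever if bad data ever gets in.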
“How does the system deal with circular references?”
This is where my logic/SQL may let me down. The question is can I find any circular references? To do this I think I need to work out:
Anything in set c would be a circular reference.
We have already calculated set a above (it has 465 rows).
Set b – synonyms for the synonyms of set a
Result: 0 rows
We can stop here, there is no point working out set c as we already know set b is empty.
Unless I got my logic or SQL wrong (which is very possible), it seems there are no circular references in StackOverflow. I would imagine there are technical processes in place to prevent circular references from happening (otherwise StackOverflow could suffer StackOverflow!).
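I don't know what StackOverflow actually does, but in your own system the simplest guard is to check at insert time: before accepting a new synonym, walk up from the proposed master and reject the link if you arrive back at the synonym. A minimal sketch, assuming the same hypothetical synonym-to-master dictionary (the `add_synonym` name and the error messages are mine):

```python
def add_synonym(synonym_to_master, synonym, master):
    """Add synonym -> master, rejecting any link that would create a cycle."""
    if synonym == master:
        raise ValueError("a tag cannot be a synonym of itself")
    if synonym in synonym_to_master:
        raise ValueError(f"{synonym!r} already has a master tag")
    # Walk upward from the proposed master. The existing mapping is assumed
    # to be cycle-free (every link went through this check), so this ends.
    current = master
    while current in synonym_to_master:
        current = synonym_to_master[current]
        if current == synonym:
            raise ValueError(f"{synonym!r} -> {master!r} would be circular")
    synonym_to_master[synonym] = master


tags = {}
add_synonym(tags, "java-script", "js")
add_synonym(tags, "js", "javascript")
try:
    # javascript -> java-script -> js -> javascript would loop forever.
    add_synonym(tags, "javascript", "java-script")
    rejected = False
except ValueError:
    rejected = True
print(rejected, tags)
```

Rejecting the bad link up front means lookups never need to worry about looping.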