I wrote a program in C# to calculate TF-IDF to rank documents.
I used the following XML to store the word frequencies within each document, and I was heavily criticised for this structure. I use the word itself as the tag name, which in my opinion is efficient and takes less space. I can also search it easily with XDocument, since it is a nice tree structure. Can you help me understand why I was criticised?
Criticism: How can you put information in the metadata? (To me, it is innovative.)
<word>
<siddhartha>
<doc1> 4 </doc1>
<doc2> 5 </doc2>
</siddhartha>
<inspiration>
<doc1> 4 </doc1>
<doc6> 5 </doc6>
</inspiration>
....
</word>
Something like this was suggested to me instead:
<word>
<text> siddhartha </text>
<doc1> 4 </doc1>
<text> inspiration </text>
<doc1> 4 </doc1>
...
</word>
Your structure, with the word as the node name, will be hard to parse with generic parsers. There is no defined structure: you need to read the whole document to know which elements it contains.
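To make this concrete, here is a minimal C# sketch of what working with the word-as-tag layout looks like in XDocument. It assumes a well-formed version of the XML from the question, and the names `WordAsTagDemo`, `ListWords`, `Count` and `IsValidTag` are made up for illustration; they are not from your program. It shows two things: the words can only be discovered by reading element names as data, and a term such as `3d` or `don't` cannot be stored at all, because it is not a valid XML name.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class WordAsTagDemo
{
    // Assumed, well-formed version of the word-as-tag layout from the question.
    const string Xml = @"
<word>
  <siddhartha><doc1> 4 </doc1><doc2> 5 </doc2></siddhartha>
  <inspiration><doc1> 4 </doc1><doc6> 5 </doc6></inspiration>
</word>";

    // The set of words is not known in advance: it has to be
    // recovered from the element names themselves.
    public static string[] ListWords(XDocument doc) =>
        doc.Root.Elements().Select(e => e.Name.LocalName).ToArray();

    // Looks up one count. Both the word and the document id are
    // encoded in tag names, so both must be turned into element names.
    public static int? Count(XDocument doc, string word, int docId)
    {
        XElement hit = doc.Root.Element(word)?.Element("doc" + docId);
        return hit == null ? (int?)null : int.Parse(hit.Value.Trim());
    }

    // Returns false for terms that can never be used as a tag name.
    public static bool IsValidTag(string term)
    {
        try { XmlConvert.VerifyNCName(term); return true; }
        catch (XmlException) { return false; }
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(Xml);
        Console.WriteLine(string.Join(", ", ListWords(doc))); // siddhartha, inspiration
        Console.WriteLine(Count(doc, "siddhartha", 2));       // 5
        Console.WriteLine(IsValidTag("3d"));                  // False
        Console.WriteLine(IsValidTag("don't"));               // False
    }
}
```

The lookup itself is short, which is probably why the layout feels convenient. The cost shows up elsewhere: no schema can list the allowed elements, and any tokenizer output that is not a valid XML name would have to be escaped or dropped before it can be written.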
I might have done something like this (I tried to stay close to your idea):