I searched Google and various forums for large XML files, but apart from DBLP, which is 1.0 GB and too big for my needs, I haven’t found anything. I need datasets of 30-50 MB, 100-300 MB, and around 500 MB. Does anyone know of any?
P.S. Please don’t suggest data generators; I need real data so that I can test with meaningful queries.
I finally found some good datasets. They are available at:
http://dumps.wikimedia.org/mirrors.html
These are dumps from various wikis, including Wikipedia. They come in a range of sizes, from about 10 MB to 500-600 MB.
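In case it helps anyone picking a dump for query testing, here is a minimal sketch for checking what a downloaded file contains without loading it all into memory. It assumes the dump is an XML file, optionally bz2-compressed; the function names and the file name in the usage note are my own placeholders, not anything defined by the dumps.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_elements(path):
    """Stream an XML file (plain or .bz2) and count elements by tag name."""
    # Assumption: a '.bz2' suffix means the file is bz2-compressed.
    opener = bz2.open if path.endswith(".bz2") else open
    counts = Counter()
    with opener(path, "rb") as stream:
        # 'end' events fire once an element has been fully parsed.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            counts[local_name(elem.tag)] += 1
            elem.clear()  # drop the element's text and children to limit memory use
    return counts
```

Calling `count_elements("some-wiki-dump.xml.bz2")` (a hypothetical file name) returns a `Counter` of tag names, which gives a rough idea of how many records a dump holds before you write queries against it.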