I have a requirement to scrape 3 million documents. They are all text and varchar fields. As a sample I scraped just 250 documents, and when I ran EXEC sp_spaceused it reported 26.6 MB as the database size. Can we calculate the size required to store 3 million documents from this? Of course that would be an approximate value, but the question is whether we can really extrapolate from this figure.
250 docs is quite a small sample for 3 million docs. Depending on what else you have in the DB, it’s hard to say how much of those 26.6 MB is made up of documents.
I’d say that 26.6 MB / 250 * 3,000,000 ~= 319 GB is a high estimate of the size of the DB with all the documents, assuming that the 250 you’ve scraped are a representative sample.
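The extrapolation above can be sketched in a few lines of Python. This is only a rough illustration, not a sizing tool: the function name and the `baseline_mb` parameter are hypothetical, and `baseline_mb` stands for whatever part of the 26.6 MB is not document data (for example, the size of the empty database), which you would have to measure yourself.

```python
def estimate_db_size_gb(sample_size_mb, sample_docs, total_docs, baseline_mb=0.0):
    """Linearly extrapolate database size from a small sample.

    baseline_mb is a hypothetical fixed overhead that does not grow with
    the number of documents; leaving it at 0 gives the high estimate.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    # Average space used per document in the sample, in MB.
    per_doc_mb = (sample_size_mb - baseline_mb) / sample_docs
    # Scale up to the full document count and add the fixed part back.
    total_mb = per_doc_mb * total_docs + baseline_mb
    # Convert using decimal units (1 GB = 1000 MB).
    return total_mb / 1000


if __name__ == "__main__":
    # Figures from the question: 26.6 MB for 250 documents, 3 million in total.
    print(round(estimate_db_size_gb(26.6, 250, 3_000_000), 1))
```

With the baseline left at zero this reproduces the roughly 319 GB figure. Passing a non-zero baseline lowers the result, which is why the plain calculation is better read as a high estimate than as an exact prediction.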