I am building a data warehouse. I need to pull data from different sources and combine it so that I can generate reports, which will involve a lot of table joins. I expect about 20 tables in total, each anywhere from 100 MB to 5 GB.
I would like to know whether I should create a separate database for each table, since each table might hold an entirely different type of data.
For example, I might have one table with 1 GB of data about the design of cars, and another table with 3 GB of sales data on those cars.
Would it be appropriate to separate these into different databases?
Please let me know what additional information is needed to advise me on this situation.
If there’s a logical or business separation, by all means put them in different databases. That’s just clean database application design. However, if you’re going to be joining or merging the different data sets, you can save some overhead and admin costs by keeping everything in a single database. 20 tables total isn’t a lot (I’m working on a system that has about 3700 tables, though ~1600 of those are audit tables). Keep in mind SQL Server is meant to scale up to terabytes of data, provided you have a decent model, indexes, etc.
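To make the single-database point concrete, here is a rough sketch of how the join looks in each layout. Every name in it (CarWarehouse, DesignDb, SalesDb, dbo.CarDesign, dbo.CarSales, ModelId, ModelName, SaleAmount) is made up for illustration and is not taken from the question.

```sql
-- Hypothetical database, table, and column names throughout.

-- Single database: both tables sit side by side, so the join uses
-- ordinary two-part names and one set of permissions and backups.
USE CarWarehouse;

SELECT d.ModelId,
       d.ModelName,
       SUM(s.SaleAmount) AS TotalSales
FROM dbo.CarDesign AS d
JOIN dbo.CarSales  AS s
  ON s.ModelId = d.ModelId
GROUP BY d.ModelId, d.ModelName;

-- Separate databases: the same join still works on one instance,
-- but every reference needs a three-part name (database.schema.table).
SELECT d.ModelId,
       d.ModelName,
       SUM(s.SaleAmount) AS TotalSales
FROM DesignDb.dbo.CarDesign AS d
JOIN SalesDb.dbo.CarSales   AS s
  ON s.ModelId = d.ModelId
GROUP BY d.ModelId, d.ModelName;
```

Both queries run, so splitting is not a blocker for joins. The cost shows up elsewhere: SQL Server does not allow foreign key constraints across databases, and each extra database needs its own backups, security, and maintenance.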
If you’re concerned with performance of the warehouse, you can jam that server full of RAM and hard drives. To make proper use of the drives, look at splitting the database across multiple files / filegroups and doling the tables out appropriately, so that I/O is spread over the disks.
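As a minimal sketch of the filegroup idea, assuming a database called CarWarehouse and a second drive mounted as E: (the database name, filegroup name, file path, sizes, and table definition are all placeholders to adapt):

```sql
-- Hypothetical names, path, and sizes; adjust to your own server.

-- 1. Add a filegroup to hold the large sales table.
ALTER DATABASE CarWarehouse
    ADD FILEGROUP SalesFG;

-- 2. Add a data file on a separate physical drive to that filegroup.
ALTER DATABASE CarWarehouse
    ADD FILE (
        NAME = N'CarWarehouse_Sales1',
        FILENAME = N'E:\SQLData\CarWarehouse_Sales1.ndf',
        SIZE = 4GB,
        FILEGROWTH = 512MB
    )
    TO FILEGROUP SalesFG;

-- 3. Create the table on the new filegroup instead of PRIMARY.
CREATE TABLE dbo.CarSales (
    SaleId     INT            NOT NULL PRIMARY KEY,
    ModelId    INT            NOT NULL,
    SaleDate   DATE           NOT NULL,
    SaleAmount DECIMAL(12, 2) NOT NULL
) ON SalesFG;
```

The gain only appears when the files really do live on different physical disks. Several files on the same drive mostly help with manageability rather than throughput.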