I have started on a rather exciting project. The idea is that people in my organisation could drop CSV files into a load folder and then set up a loader for them in a web interface.
This doesn’t sound that special; however, the beauty is that the web app lets the user select only the columns they need extracted from the CSV. Once set up, the loader can be run on a regular basis.
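A minimal sketch of the column-selection step, assuming a comma-separated file with a header row. `extract_columns` is a hypothetical name for illustration, not part of any existing tool:

```python
import csv
import io


def extract_columns(csv_text, wanted):
    """Keep only the user-selected columns from each CSV record.

    Hypothetical loader step: assumes a comma-separated file with a header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    return [{name: record[name] for name in wanted} for record in reader]


sample = "CLIENT,SALES,COST\nMr Smith,234,45\nMr Blogs,256,35\n"
print(extract_columns(sample, ["CLIENT", "SALES"]))
# [{'CLIENT': 'Mr Smith', 'SALES': '234'}, {'CLIENT': 'Mr Blogs', 'SALES': '256'}]
```

Failing loudly on a missing column seems safer than silently loading nothing if a file's layout changes between runs.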
The data can then be transformed via a user-defined query and potentially loaded into a data warehouse.
The issue I am hitting is choosing a table structure for my staged CSV data so that I can transform it.
The CSV file structure can vary, with lots of columns or very few, e.g.:
CLIENT    SALES  COST
Mr Smith  234    45
Mr Blogs  256    35
The structure I currently have is:
ID  COLUMNID  VALUE     FILELOADDATE
1   1         Mr Smith  2012-12-25
2   2         234       2012-12-25
3   3         45        2012-12-25
4   1         Mr Blogs  2012-12-25
5   2         256       2012-12-25
6   3         35        2012-12-25
So the data has been ‘UNPIVOTED’, if you like, which allows me to store various CSV formats in the same table.
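For illustration, this is roughly how that unpivoted layout could be produced in Python. The function name `unpivot` is hypothetical, and COLUMNID is assumed to be the 1-based position of the column in the file. Note that nothing in each output tuple says which CSV record the value came from:

```python
import csv
import io
from datetime import date


def unpivot(csv_text, load_date):
    """Flatten a CSV into (ID, COLUMNID, VALUE, FILELOADDATE) tuples.

    COLUMNID is assumed to be the 1-based position of the column in the file.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # column names; only their positions are stored
    staged = []
    next_id = 1
    for record in reader:
        for column_id, value in enumerate(record, start=1):
            # Nothing here records which CSV row the value belonged to.
            staged.append((next_id, column_id, value, load_date))
            next_id += 1
    return header, staged


sample = "CLIENT,SALES,COST\nMr Smith,234,45\nMr Blogs,256,35\n"
header, staged = unpivot(sample, date(2012, 12, 25))
for row in staged:
    print(row)
```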
The issue I’m hitting is that, now the data has been transposed, I have effectively broken the link between values, so I no longer know which client a sales figure relates to.
The approach I have taken is fine if I want to aggregate, say, sales by date or cost by date.
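To show why aggregation still works, here is a sketch using an in-memory SQLite table loaded with the rows above (the table name `STAGED` is an assumption). Summing COLUMNID 2 per FILELOADDATE never needs to know which client a value belongs to:

```python
import sqlite3

# In-memory copy of the staging table above; the name STAGED is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STAGED ("
    "ID INTEGER PRIMARY KEY, COLUMNID INTEGER, VALUE TEXT, FILELOADDATE TEXT)"
)
conn.executemany(
    "INSERT INTO STAGED VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Mr Smith", "2012-12-25"),
        (2, 2, "234", "2012-12-25"),
        (3, 3, "45", "2012-12-25"),
        (4, 1, "Mr Blogs", "2012-12-25"),
        (5, 2, "256", "2012-12-25"),
        (6, 3, "35", "2012-12-25"),
    ],
)

# Total SALES (COLUMNID 2) per load date: no row link is needed for this.
sales_by_date = conn.execute(
    "SELECT FILELOADDATE, SUM(CAST(VALUE AS INTEGER)) "
    "FROM STAGED WHERE COLUMNID = 2 GROUP BY FILELOADDATE"
).fetchall()
print(sales_by_date)  # [('2012-12-25', 490)]
```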
Is there a different way I could approach this so that I don’t lose that link? Could I have another column with a row ID or something?
I think you kind of answered your own question: if you add a row number (and maybe a dataset ID, so you can tell the different CSV files apart), the link is kept. You could then also move the FILELOADDATE field into a dataset table.
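A minimal sketch of that suggestion in SQLite, assuming hypothetical table names DATASET and STAGEDVALUE and a new ROWNUM column. The MAX(CASE ...) query is one common way to pivot the values back into one row per CSV record:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per loaded file, plus ROWNUM to tie each value to its CSV record.
conn.executescript(
    """
    CREATE TABLE DATASET (
        DATASETID    INTEGER PRIMARY KEY,
        FILELOADDATE TEXT
    );
    CREATE TABLE STAGEDVALUE (
        DATASETID INTEGER REFERENCES DATASET (DATASETID),
        ROWNUM    INTEGER,  -- which record in the CSV file
        COLUMNID  INTEGER,  -- which column in the CSV file
        VALUE     TEXT,
        PRIMARY KEY (DATASETID, ROWNUM, COLUMNID)
    );
    """
)

conn.execute("INSERT INTO DATASET VALUES (1, '2012-12-25')")
records = [("Mr Smith", "234", "45"), ("Mr Blogs", "256", "35")]
conn.executemany(
    "INSERT INTO STAGEDVALUE VALUES (?, ?, ?, ?)",
    [
        (1, row_num, column_id, value)
        for row_num, record in enumerate(records, start=1)
        for column_id, value in enumerate(record, start=1)
    ],
)

# Pivot back to one row per CSV record; the link survives via ROWNUM.
rows = conn.execute(
    """
    SELECT d.FILELOADDATE,
           v.ROWNUM,
           MAX(CASE WHEN v.COLUMNID = 1 THEN v.VALUE END) AS CLIENT,
           MAX(CASE WHEN v.COLUMNID = 2 THEN v.VALUE END) AS SALES,
           MAX(CASE WHEN v.COLUMNID = 3 THEN v.VALUE END) AS COST
    FROM STAGEDVALUE AS v
    JOIN DATASET AS d ON d.DATASETID = v.DATASETID
    WHERE v.DATASETID = ?
    GROUP BY d.FILELOADDATE, v.ROWNUM
    ORDER BY v.ROWNUM
    """,
    (1,),
).fetchall()
for row in rows:
    print(row)
```

Because (DATASETID, ROWNUM) identifies a single CSV record, each SALES value can be matched back to its CLIENT. The hard-coded COLUMNID-to-name mapping in the query is only for the example; a loader would presumably generate it from the columns the user selected.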
Dataset Table: