I inherited a horribly-designed table where data is stored like this:
Period | Identifier | Value
----------------------------------
1 | AB1 | some number
1 | AB2 | some number
1 | AB3 | some number
1 | AB4 | some number
1 | AB5 | some number
1 | A1 | some number
1 | A2 | some number
1 | A3 | some number
1 | A4 | some number
1 | A5 | some number
2 | AB1 | some number
2 | AB2 | some number
2 | AB3 | some number
2 | AB4 | some number
2 | AB5 | some number
2 | A1 | some number
2 | A2 | some number
2 | A3 | some number
2 | A4 | some number
2 | A5 | some number
I’m trying to use SELECT statements that will get data into this format:
Row # | First value | Second value
1 | A1's number | AB1's number // The next 5 rows are data from period 1
2 | A2's number | AB2's number
3 | A3's number | AB3's number
4 | A4's number | AB4's number
5 | A5's number | AB5's number
6 | A1's number | AB1's number // These 5 rows are from period 2
7 | A2's number | AB2's number
8 | A3's number | AB3's number
9 | A4's number | AB4's number
10 | A5's number | AB5's number
AB% and A% are two separate ID patterns, which mildly frustrates WHERE LIKE ... clauses, I think, since LIKE 'A%' also matches the AB identifiers. I’m not entirely sure the data can be forced into the desired format, but my supervisor asked me to look into it.
My initial idea, for which I don’t know the SQL code, would be to look at the row number itself and work with that, but as I said, I’m unsure how to proceed down that route.
Right now, the data is in SQL Server, but it will be accessed from SAS using proc sql. I think its SQL dialect matches SQL Server’s for the most part, even though DECLARE isn’t supported.
And no, I don’t know whose idea it was to store the data in this fashion…
If you’re using SAS, then I’d just use PROC TRANSPOSE. First get the data to include a label variable, which determines which output variable each value will be moved to, then transpose by period.
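A minimal sketch of that approach, under some assumptions: the data set names `have`, `for_transpose` and `want`, the helper variables `target` and `row`, and the sample values are all made up for illustration; only Period, Identifier and Value come from the question. It also assumes every identifier is letters followed by digits.

```sas
/* Hypothetical sample data in the question's layout (values are invented). */
data have;
  input Period Identifier $ Value;
  datalines;
1 AB1 11
1 AB2 12
1 A1 101
1 A2 102
2 AB1 21
2 AB2 22
2 A1 201
2 A2 202
;
run;

/* Split each identifier into its letter part (the target column)
   and its digit part (the row within a period). */
data for_transpose;
  set have;
  length target $ 8;
  target = compress(Identifier, , 'd');             /* remove digits: A or AB */
  row    = input(compress(Identifier, , 'kd'), 8.); /* keep digits: 1, 2, ... */
run;

/* PROC TRANSPOSE with a BY statement expects sorted input. */
proc sort data=for_transpose;
  by Period row;
run;

/* One output row per Period/row pair; the values of TARGET
   become the names of the new columns (A and AB). */
proc transpose data=for_transpose out=want(drop=_name_);
  by Period row;
  id target;
  var Value;
run;
```

The resulting `want` should hold one row per period and suffix, with the A value and the AB value side by side. If an identifier is missing for some period, that cell simply comes out as a missing value rather than dropping the row.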
If for some reason you HAVE to do it in SQL, you are best off joining the table to itself. You want to pair the A row and the AB row that share the same period and the same numeric suffix, that is, the same value of compress(identifier,,'kd').
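A sketch of such a self join in PROC SQL, again with assumed names (`have`, `want_sql`, `A_value`, `AB_value`, `row`) and invented sample values. Note that `compress` here is the SAS function, so this is meant to run in PROC SQL rather than as native SQL Server code.

```sas
/* Hypothetical sample data in the question's layout (values are invented). */
data have;
  input Period Identifier $ Value;
  datalines;
1 AB1 11
1 AB2 12
1 A1 101
1 A2 102
2 AB1 21
2 AB2 22
2 A1 201
2 A2 202
;
run;

proc sql;
  create table want_sql as
  select a.Period,
         input(compress(a.Identifier, , 'kd'), 8.) as row, /* numeric suffix */
         a.Value  as A_value,
         ab.Value as AB_value
  from have as a
       inner join have as ab
       /* same period and same numeric suffix on both sides */
       on  a.Period = ab.Period
       and compress(a.Identifier, , 'kd') = compress(ab.Identifier, , 'kd')
  /* left side restricted to A identifiers, right side to AB identifiers */
  where compress(a.Identifier, , 'd')  = 'A'
    and compress(ab.Identifier, , 'd') = 'AB'
  order by a.Period, row;
quit;
```

Comparing the letter part for equality with 'A' and 'AB' sidesteps the problem that LIKE 'A%' would match both patterns. The inner join drops any A row without a matching AB row (and vice versa); a full join would keep them if that matters for your data.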
But the PROC TRANSPOSE option is likely to be more efficient than the self join, I’d think (and more flexible, if your data isn’t quite as pretty as you show).