I’ve got about 25 tables that I’d like to update with random data that’s picked from a subset of data. I’d like the data to be picked at random but meaningful — like changing all the first names in a database to new first names at random. So I don’t want random garbage in the fields, I’d like to pull from a temp table that’s populated ahead of time.
The only way I can think of to do this is with a loop and some dynamic sql.
- insert pick-from names into temp table with id field
- foreach table name in a list of tables:
  - build a dynamic sql that updates all first name fields to be a name picked at random from the temp table based on rand() * max(id) from temp table
  - build a dynamic sql that updates all
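For illustration, here is a rough sketch of that plan. It is written in Python against an in-memory SQLite database purely so it can be run anywhere; it is not T-SQL. The table and column names (`pick_names`, `customers`, `orders`, and so on) are made up, and it assumes the pick table's ids run from 1 to max(id) with no gaps.

```python
import random
import sqlite3

# Hypothetical list of (table, column) pairs that hold first names.
TARGETS = [("customers", "first_name"), ("orders", "contact_first_name")]


def scramble_first_names(conn, targets, rng):
    """Overwrite each target column with names picked at random from pick_names."""
    # Assumes ids in pick_names are contiguous, starting at 1.
    max_id = conn.execute("SELECT MAX(id) FROM pick_names").fetchone()[0]
    for table, column in targets:
        # Identifiers cannot be bound as parameters, so the statement text is
        # built per table (the "dynamic sql" step). Values are still bound.
        update_sql = (
            f'UPDATE "{table}" SET "{column}" = '
            "(SELECT name FROM pick_names WHERE id = ?) WHERE rowid = ?"
        )
        rowids = [r[0] for r in conn.execute(f'SELECT rowid FROM "{table}"')]
        # One random id in 1..max_id per row: the rand() * max(id) idea.
        params = [(rng.randint(1, max_id), rowid) for rowid in rowids]
        conn.executemany(update_sql, params)
    conn.commit()


def build_demo():
    """Create a tiny in-memory database with a pick table and two target tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pick_names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO pick_names (name) VALUES (?)",
                     [("Alice",), ("Bob",), ("Carol",)])
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, contact_first_name TEXT)")
    conn.executemany("INSERT INTO customers (first_name) VALUES (?)",
                     [("Real1",), ("Real2",)])
    conn.executemany("INSERT INTO orders (contact_first_name) VALUES (?)",
                     [("Real3",)])
    return conn


if __name__ == "__main__":
    demo = build_demo()
    scramble_first_names(demo, TARGETS, random.Random())
    print(demo.execute("SELECT id, first_name FROM customers").fetchall())
```

The random pick happens in the calling code rather than inside the UPDATE itself, so each row gets its own draw.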
But anytime I think “loop” in SQL I figure I’m doing something wrong.
The database in question has a lot of denormalized tables in it, so that’s why I think I’d need a loop (the first name fields are scattered across the database).
Is there a better way?
Breaking the 4th wall a bit by answering my own question.
I did try this as a SQL script. What I learned is that SQL pretty much sucks at random. The script was slow and weird: functions that referenced views that were only created for the script and couldn’t be made in tempdb.
So I made a console app.
- … to do with the Random class (just remember to only use one instance of Random).
- … names that you’d like to update via a script that looks at information_schema.
- … for all the tables that you’re going to update, if possible (and wow will it be slow if you have a large table that doesn’t have any good PKs).
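To make those points concrete, here is a minimal sketch of the same shape, not the actual console app. Python's `random.Random` and `sqlite3` stand in for whatever the app and database really were. SQLite has no information_schema, so `sqlite_master` and `PRAGMA table_info` play that role here. The function names, the `first_name` column name, and the batch size are all assumptions for the example.

```python
import random
import sqlite3

# One shared generator for the whole run, created once and reused everywhere.
RNG = random.Random()


def find_name_columns(conn, column_name="first_name"):
    """Return (table, pk_column) for tables with column_name and a single-column PK."""
    found = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        names = [c[1] for c in cols]
        pks = [c[1] for c in cols if c[5] > 0]
        if column_name in names and len(pks) == 1:
            found.append((table, pks[0]))
    return found


def update_in_batches(conn, table, pk, column, names, batch_size=500):
    """Update one row per statement, keyed on the PK, committing after each batch."""
    keys = [r[0] for r in conn.execute(f'SELECT "{pk}" FROM "{table}"')]
    sql = f'UPDATE "{table}" SET "{column}" = ? WHERE "{pk}" = ?'
    updated = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        conn.executemany(sql, [(RNG.choice(names), key) for key in batch])
        conn.commit()  # keep each transaction short
        updated += len(batch)
    return updated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, first_name TEXT)")
    conn.executemany("INSERT INTO staff (first_name) VALUES (?)",
                     [("Real",)] * 10)
    conn.commit()
    for table, pk in find_name_columns(conn):
        count = update_in_batches(conn, table, pk, "first_name",
                                  ["Ann", "Ben", "Cy"], batch_size=4)
        print(table, count)
```

Keying each UPDATE on the primary key is what keeps the many small updates cheap. Tables without a usable key are skipped in this sketch rather than handled.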
Wash, rinse, repeat. I updated about 2.2 million rows in an hour this way. Maybe it could be faster, but it was doing many small updates so it didn’t get in anyone’s way.