I want to select all the rows in a table where a particular field matches one of about a hundred thousand known strings.
I’m using R to send the query, with code that looks like this:
my_samp <- sample(unique_ids, replace=TRUE, size=10^6)
my_conn <- dbConnect(MySQL(),user='me',dbname='my_db')
query <- paste('SELECT blah1, blah2,...,blah900 FROM my_table WHERE blah1=',paste(my_samp,collapse=' OR ',sep=''),sep='')
res1 <- dbSendQuery(my_conn, query)
my_data <- fetch(res1,n=-1)
dbDisconnect(my_conn)
However, I fear that the query string won’t be processed correctly by either the R interpreter or MySQL. I also suspect that this is inefficient. Is there a cleaner solution?
I can’t read the whole table directly into R with dbReadTable as it’s too big.
I’m planning to do this several thousand times by forking the process (statistical bootstrapping).
The result needs to contain duplicate records whenever a string is repeated in my_samp.
The best way to do that is to:
Then either A) create a new join query against those tables, or B) use an IN clause with a literal list of values, as in:
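A minimal sketch of both options in R, assuming the DBI/RMySQL setup from the question. The helper names build_in_clause and fetch_by_join and the table name my_samp_ids are hypothetical, chosen only for illustration. Note that an IN list ignores repeated values, so only the join returns one row per sampled id; with IN, the repeats would have to be restored in R afterwards.

```r
# Option B: build "column IN ('a', 'b', ...)" from the distinct values only.
# Single quotes are doubled (the standard SQL escape). This assumes the ids
# contain no backslashes; with a live connection, DBI::dbQuoteString(conn, x)
# is the safer way to quote.
build_in_clause <- function(column, values) {
  vals <- unique(as.character(values))
  quoted <- paste0("'", gsub("'", "''", vals, fixed = TRUE), "'")
  paste0(column, " IN (", paste(quoted, collapse = ", "), ")")
}

# Option A: upload the sample (repeats included) and join against it, so a
# repeated id yields a repeated row. `samp_table` is a hypothetical name; each
# forked process would need its own table name to avoid collisions.
fetch_by_join <- function(conn, my_samp, samp_table = "my_samp_ids") {
  ids <- data.frame(blah1 = as.character(my_samp), stringsAsFactors = FALSE)
  DBI::dbWriteTable(conn, samp_table, ids, row.names = FALSE, overwrite = TRUE)
  DBI::dbGetQuery(conn, paste0(
    "SELECT t.* FROM my_table t JOIN ", samp_table,
    " s ON t.blah1 = s.blah1"))
}

# Building the WHERE clause needs no database connection:
where <- build_in_clause("blah1", c("id1", "id2", "id1"))
query <- paste("SELECT * FROM my_table WHERE", where)
```

If blah1 is unique per row in my_table, the repeats for option B could be restored with something like my_data[match(my_samp, my_data$blah1), ], which is again an assumption about the data rather than something stated in the question.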