I’m using sqldf to subset an enormous file. The following command gives me a data.frame of 100 rows and 42 columns.
first <- read.csv.sql("first.txt", sep = " ", header = TRUE, row.names = FALSE,
sql = "SELECT * FROM file WHERE n = '\"n63\"' AND ratio = 1 AND r_name = '\"r1\"' AND method = '\"nearest\"' AND variables = 10")
The structure of the object is
'data.frame': 100 obs. of 42 variables:
$ test_before : chr "TRUE" "TRUE" "TRUE" "TRUE" ...
$ test_after : chr "TRUE" "TRUE" "TRUE" "TRUE" ...
$ meanPSmatchRATIO : chr "1.54845330373635" "1.16857102212364" "1.25330045961256" "1.8011651466717" ...
snipped intervening normally printed columns
$ PSdiff_DIFF : chr "-0.0103938442562762" "-0.00935228868105753" "-0.00947571480267878"
snipped intervening normally printed columns
$ nUNMATCHt : chr "0" "0" "0" "0" ...
$ caliper : chr "\"no\"" "\"no\"" "\"no\"" "\"no\"" ...
$ method : chr "\"nearest\"" "\"nearest\"" "\"nearest\"" "\"nearest\"" ...
$ r_name : chr "\"r1\"" "\"r1\"" "\"r1\"" "\"r1\"" ...
$ ratio : int 1 1 1 1 1 1 1 1 1 1 ...
$ n : chr "\"n63\"" "\"n63\"" "\"n63\"" "\"n63\"" ...
$ variables : int 10 10 10 10 10 10 10 10 10 10 ...
Now, based on this you would expect that when I print the data.frame, all columns (except the int ones) would be shown as character (enclosed in “”). But you would be wrong!
test_before test_after meanPSmatchRATIO del- nUNMATCHt caliper method r_name ratio n variables
1 TRUE TRUE 1.54845330373635 eted 0 "no" "nearest" "r1" 1 "n63" 10
2 TRUE TRUE 1.16857102212364 ... 0 "no" "nearest" "r1" 1 "n63" 10
3 TRUE TRUE 1.25330045961256 ... 0 "no" "nearest" "r1" 1 "n63" 10
4 TRUE TRUE 1.8011651466717 ...t 0 "no" "nearest" "r1" 1 "n63" 10
Notice that only the last few columns are printed as “character”. I’m a bit lost as to what’s going on. Can someone explain?
Looks fine to me.
print.data.frame doesn’t usually print quotes for character columns (its default is quote = FALSE), but the values in those last few columns contain embedded double-quote characters as part of the data, so that’s why they appear “quoted” by default.
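To see this in isolation, here is a minimal sketch with a small made-up data frame. Only the column names are borrowed from the question; the values are invented for illustration. It shows that the quotes are part of the data rather than something print adds, and one possible way (an assumption about what you want, not the only option) to strip them and convert the character columns to more useful types in base R.

```r
# Toy data frame mimicking the structure in the question (values are made up).
df <- data.frame(
  test_before = c("TRUE", "TRUE"),
  method      = c("\"nearest\"", "\"nearest\""),
  ratio       = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Default printing (quote = FALSE): test_before shows no quotes, while method
# shows quotes because they are literally part of each string.
print(df)

# With quote = TRUE every character column is wrapped in quotes on output.
print(df, quote = TRUE)

# Remove the embedded double quotes from the character columns only.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(x) gsub("\"", "", x, fixed = TRUE))

# Let R pick a better type for each character column
# (for example "TRUE" becomes logical); as.is = TRUE avoids factors.
df[is_chr] <- lapply(df[is_chr], type.convert, as.is = TRUE)

str(df)
```

After the cleanup, str(df) should report test_before as logical and method as plain character without the embedded quotes; ratio is left untouched as an integer.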