I can use read.csv or read.csv2 to read data into R. But the issue I encountered is that my separator is a multiple-byte string instead of a single character. How can I deal with this?
Providing example data would help. However, you might be able to adapt the following to your needs.
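As a minimal sketch of the kind of workaround described in this answer (not necessarily the code from the original post): the sample file contents below are hypothetical, and `fixed = TRUE` is an added choice so that the separator is matched literally rather than as a regular expression.

```r
# Hypothetical sample data: fields separated by the multi-character string "sep".
path <- tempfile(fileext = ".csv")
writeLines(c("1sep2sep3", "4sep5sep6", "7sep8sep9"), path)

lines <- readLines(path)                         # read the raw lines
lines <- gsub("sep", " ", lines, fixed = TRUE)   # swap the separator for a single space

con <- textConnection(lines)                     # treat the character vector as a file
dat <- read.table(con)                           # default sep = "" splits on whitespace
close(con)                                       # text connections must be closed by hand

print(dat)
```

This only works as long as the fields themselves contain no whitespace, because the replacement separator is a space.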
I created an example data file, a plain text file whose fields are separated by the string ‘sep’, and saved it as ‘test.csv’.

I think read.csv() uses scan(), which only accepts a single character for sep. To get around it: readLines() just reads the lines in; gsub() substitutes the multi-character separation string with a single ' ', or whatever is convenient for your data; then textConnection() and read.table() read everything back in conveniently. For smaller datasets, this should be fine. If you have very large data, consider preprocessing with something like AWK to substitute the multi-character separation string. The above is from http://tolstoy.newcastle.edu.au/R/e4/help/08/04/9296.html.

Update
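Regarding your comment, if you have spaces in your data, use a different replacement separator. Suppose test.csv is changed so that some fields contain spaces. A small function that takes the file, the original separator, and a replacement separator as arguments lets you pick a replacement that does not occur in the data.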
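One way such a function could look, as a sketch under assumptions: the name `read_multi_sep`, its argument names, and the sample file contents are hypothetical and not taken from the original post.

```r
# Hypothetical helper: replace a multi-character separator with a single
# character, then let read.table() parse the result.
read_multi_sep <- function(file, sep, replace = "\t", as.is = TRUE, ...) {
  lines <- readLines(file)
  lines <- gsub(sep, replace, lines, fixed = TRUE)  # literal match, no regex
  con <- textConnection(lines)
  on.exit(close(con))                               # always release the connection
  read.table(con, sep = replace, as.is = as.is, ...)
}

# Hypothetical sample data in which some fields contain spaces.
path <- tempfile(fileext = ".csv")
writeLines(c("a bsepc dsep1", "e fsepg hsep2"), path)

dat <- read_multi_sep(path, sep = "sep", replace = "\t")
print(dat)
```

Extra arguments such as `quote` or `header` are forwarded to `read.table()` through `...`, so the helper does not need to anticipate them.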
Here, the original separator is replaced with tabs (\t). The as.is argument is passed to read.table() to prevent strings being read in as factors, but that’s your call. If you have more complicated white space within your data, you might find the quote argument in read.table() helpful, or pre-process with AWK, Perl, etc.

Something similar with crippledlambda’s strsplit() approach is most likely equivalent for moderately sized data. If performance becomes an issue, try both and see which works for you.
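For comparison, a strsplit()-based variant might look roughly like the following. This is an illustrative sketch, not the code from the other answer; the sample data is hypothetical, and it assumes every line has the same number of fields.

```r
# Hypothetical sample data, same layout as a file with "sep" between fields.
path <- tempfile(fileext = ".csv")
writeLines(c("a bsepc dsep1", "e fsepg hsep2"), path)

lines <- readLines(path)
parts <- strsplit(lines, "sep", fixed = TRUE)     # one character vector per line
mat <- do.call(rbind, parts)                      # assumes equal field counts per line
dat <- as.data.frame(mat, stringsAsFactors = FALSE)
dat[] <- lapply(dat, type.convert, as.is = TRUE)  # guess column types, keep strings as character

print(dat)
```

Note that strsplit() drops a trailing empty field, so lines ending in the separator would need extra care with this variant.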