I have a text file whose first few lines are not required, followed by a table that looks like this:
-Hyphen line-
| col1 | col2 | col3 col4 col5 |
-Hyphen line-
| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |
| 2 | 2:24:21 PM 3/22/2012 | 1 · 0 |
- col1 and col2 are separated by |, but col3, col4 and col5 are separated only by spaces.
- Data types should be preserved: col2 as a date-time and col3, col4, col5 as numbers.
- Row 2, col4 is a dot, so it should be read as NA.
- Hyphen lines start and end with – – –
Questions:
1. I can use scan, but how do I avoid reading “|” and “-”?
2. I can skip the top few lines, but how do I skip, say, the 50th line in addition to the top few lines?
You can read it as a table as-is, then split up the column and recombine.
I avoided referring to the original column names, since you said you want to skip those lines. Simply add header = FALSE and skip = 50 to the read.table call, then add whatever column names make sense afterwards.
Also, you can then strip the “.” from columns where necessary, and convert to date-time formats or numbers as required. Use colClasses in read.table if you know them up front. It makes sense to me to break this down into a number of steps, rather than trying to do it all with one read function.
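Here is a rough sketch of that step-by-step approach. It is not a definitive solution. The sample lines, the column names and the date format string are assumptions based on the table in the question. It also assumes the missing-value marker is a plain "." and that your locale understands AM/PM for %p. Rather than using skip, it filters the lines with readLines/grep first, which also drops the hyphen lines and the header.

```r
# Sample input mirroring the layout in the question (assumed).
# For a real file use: raw <- readLines("yourfile.txt")  # hypothetical file name
raw <- c(
  "some preamble line that is not needed",
  "---------------------------------------",
  "| col1 | col2 | col3 col4 col5 |",
  "---------------------------------------",
  "| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |",
  "| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |"
)

# To drop one specific line (say the 50th) you could do raw <- raw[-50] here.

# Step 1: keep only data rows, i.e. lines starting with "|" and then a digit.
# This removes the preamble, the hyphen lines and the header in one go.
rows <- grep("^\\| *[0-9]", raw, value = TRUE)

# Step 2: read the rows as a "|"-separated table, everything as character.
# The leading "|" produces an empty first column, so the data starts at column 2.
df <- read.table(text = rows, sep = "|", header = FALSE,
                 strip.white = TRUE, colClasses = "character")

# Step 3: split the space-separated block into three numeric columns,
# treating "." as NA.
nums <- read.table(text = df[[4]], header = FALSE, na.strings = ".",
                   col.names = c("col3", "col4", "col5"))

# Step 4: convert types and recombine. The format string is an assumption
# based on the sample values; %p (AM/PM) is locale-dependent.
result <- data.frame(
  col1 = as.integer(df[[2]]),
  col2 = as.POSIXct(df[[3]], format = "%I:%M:%S %p %m/%d/%Y", tz = "UTC"),
  nums
)

str(result)
```

If you prefer the skip route, the same steps 2 to 4 apply after a read.table call with header = FALSE and skip set to the number of lines to drop, although the hyphen lines would then still need to be filtered out.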