I have large, messy data files that look something like this


Asked: June 8, 2026


I have large, messy data files that look something like this:

1 2  3    4   5 6  7   8 . .
aa bb  ccc d eee     ffff gg h i jj
6      6   5 1 2 3 4 5i      734
33  44x    1234  12  1    9  888  345     12   987765

Most, but not all, lines in a data file have the same number of elements. What is the best way to read such a data file and convert it to a matrix or data frame?

I have been using readLines to read the file.

Also, I know from an answer to one of my earlier questions that an asymmetric list can be converted to a matrix using the following three lines:

R: convert asymmetric list to matrix – number of elements in each sub-list differ

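# Length of the longest sub-list
max.len <- max(sapply(my.data, length))
# Pad each shorter sub-list with NA up to that length
corrected.list <- lapply(my.data, function(x) {c(x, rep(NA, max.len - length(x)))})
# Stack the padded vectors as the rows of a matrix
mat <- do.call(rbind, corrected.list)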
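For comparison, the same padding can be written with the replacement function `length<-`, which extends an atomic vector with NA when the requested length is larger than the current one. This is a sketch, not the code from the earlier answer; the small `my.data` list is made-up sample input, and `lengths()` needs R 3.2.0 or later.

```r
# Made-up ragged list standing in for my.data
my.data <- list(c("1", "2", "3"), c("a", "b"), c("x", "y", "z"))

# lengths() returns the length of every list element at once
max.len <- max(lengths(my.data))

# `length<-`(x, max.len) pads an atomic vector x with NA up to max.len
padded <- lapply(my.data, `length<-`, max.len)

# Stack the padded vectors as matrix rows
mat <- do.call(rbind, padded)
mat
```

For a list of atomic vectors this should give the same matrix as the three-line version above.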

I was thinking maybe I could:

1. read the data file with readLines,
2. split each row of the data set into its separate elements,
3. convert the entire data set into a list, and then
4. use the three lines above to create a matrix.

However, I get stuck on step 2: I cannot figure out how to split each line into separate elements, because the number of spaces between elements varies. I also suspect this four-step strategy is inefficient.
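For step 2 specifically, one base-R option (a sketch, not taken from the thread) is to call `scan()` on each line: with its default `sep = ""`, any run of whitespace counts as a single separator, and leading or trailing blanks are ignored. The `lines` vector below reuses the sample rows and stands in for the output of `readLines()`; `split_line` is a hypothetical helper name, and the sketch assumes the data contain no quote characters (which `scan()` would treat specially).

```r
# Sample rows standing in for the result of readLines()
lines <- c("1 2  3    4   5 6  7   8 . .",
           "aa bb  ccc d eee     ffff gg h i jj",
           "6      6   5 1 2 3 4 5i      734",
           "33  44x    1234  12  1    9  888  345     12   987765")

# Hypothetical helper: read one line as a character vector of fields.
# With the default sep = "", runs of spaces/tabs act as one separator.
split_line <- function(line) scan(text = line, what = "", quiet = TRUE)

fields <- lapply(lines, split_line)

# Number of fields found on each line
lengths(fields)
```

Calling `scan()` once per line is simple, but it is not necessarily fast on very large files.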

Thank you for any help with this problem.

EDIT

Sorry, I forgot to post the desired result. Once the data are in a matrix or data frame, I would like them to look something like this:

1   2    3     4   5    6     7    8    .    .
aa  bb   ccc   d   eee  ffff  gg   h    i    jj
6   6    5     1   2    3     4    5i   734  NA
33  44x  1234  12  1    9     888  345  12   987765


Editorial Team · Answer 1 · 2026-06-08T02:40:06+00:00

You could use strsplit for step 2, splitting each line on runs of one or more whitespace characters:

test <- readLines(textConnection("1 2  3    4   5 6  7   8 . .
aa bb  ccc d eee     ffff gg h i jj
6      6   5 1 2 3 4 5i      734
33  44x    1234  12  1    9  888  345     12   987765"))

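# Split on runs of whitespace; trimws() first removes leading blanks,
# which strsplit() would otherwise turn into an empty first field
test <- strsplit(trimws(test), "[[:space:]]+")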

max.len <- max(sapply(test, length))
corrected.list <- lapply(test, function(x) {c(x, rep(NA, max.len - length(x)))})
mat <- do.call(rbind, corrected.list)

Result:

> mat
     [,1] [,2]  [,3]   [,4] [,5]  [,6]   [,7]  [,8]  [,9]  [,10]   
[1,] "1"  "2"   "3"    "4"  "5"   "6"    "7"   "8"   "."   "."     
[2,] "aa" "bb"  "ccc"  "d"  "eee" "ffff" "gg"  "h"   "i"   "jj"    
[3,] "6"  "6"   "5"    "1"  "2"   "3"    "4"   "5i"  "734" NA      
[4,] "33" "44x" "1234" "12" "1"   "9"    "888" "345" "12"  "987765"
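An alternative worth trying (not part of the original answer) is to let base R's `read.table()` do the padding itself with `fill = TRUE`, using `count.fields()` to find the longest row. A minimal sketch, assuming whitespace-separated fields with no quote or `#` characters in the data; the temporary file stands in for the real data file.

```r
# Write the sample rows to a temporary file standing in for the real data file
path <- tempfile(fileext = ".txt")
writeLines(c("1 2  3    4   5 6  7   8 . .",
             "aa bb  ccc d eee     ffff gg h i jj",
             "6      6   5 1 2 3 4 5i      734",
             "33  44x    1234  12  1    9  888  345     12   987765"),
           path)

# count.fields() returns the number of whitespace-separated fields per line;
# the maximum is the number of columns the result needs
n.cols <- max(count.fields(path))

# fill = TRUE pads short rows; giving n.cols column names makes read.table
# allocate that many columns. colClasses = "character" keeps every value as
# text, and na.strings = c("NA", "") turns the padded cells into NA.
df <- read.table(path, header = FALSE, fill = TRUE,
                 col.names = paste0("V", seq_len(n.cols)),
                 colClasses = "character", na.strings = c("NA", ""))
df
```

The explicit `col.names` matters because, without it, `read.table()` decides the number of columns from the first five lines only, so a longer row further down the file could be wrapped onto an extra row. `as.matrix(df)` turns the result into a character matrix if a matrix is preferred.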
