The Archive Base Latest Questions

Editorial Team
Asked: June 8, 2026

I have large, messy data files that look something like this:

1 2  3    4   5 6  7   8 . .
aa bb  ccc d eee     ffff gg h i jj
6      6   5 1 2 3 4 5i      734
33  44x    1234  12  1    9  888  345     12   987765

Most, but not all, lines in a data file have the same number of elements. What is the best way to read such a data file and convert it to a matrix or data frame?

I have been using readLines to read the file.

Also, I know from an answer to one of my earlier questions that an asymmetric list can be converted to a matrix using the following three lines:

R: convert asymmetric list to matrix – number of elements in each sub-list differ

max.len <- max(sapply(my.data, length))
corrected.list <- lapply(my.data, function(x) {c(x, rep(NA, max.len - length(x)))})
mat <- do.call(rbind, corrected.list)

I was thinking maybe I could:

  1. read the data file with readLines
  2. split each row in the data set into its separate elements, and then
  3. convert the entire data set into a list, and then
  4. use the three lines above to create a matrix

However, I get stuck on Step 2. I cannot figure out how to split each line into separate elements because the number of empty spaces between elements varies. Further, I suspect the proposed 4-step strategy is not efficient.

Thank you for any help with this problem.

EDIT

Sorry, I forgot to post the desired result. Once the data is in the matrix or data frame, I would like it to look something like this:

1   2    3     4   5    6     7    8    .    .
aa  bb   ccc   d   eee  ffff  gg   h    i    jj
6   6    5     1   2    3     4    5i   734  NA
33  44x  1234  12  1    9     888  345  12   987765
1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 2:40 am

    Could you use strsplit to achieve part 2?

    test <- readLines(textConnection("1 2  3    4   5 6  7   8 . .
    aa bb  ccc d eee     ffff gg h i jj
    6      6   5 1 2 3 4 5i      734
    33  44x    1234  12  1    9  888  345     12   987765"))
    
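    # Split each line on runs of one or more whitespace characters
    test <- strsplit(test,"[[:space:]]+")
    
    # Pad shorter rows with NA up to the longest row, then stack the rows into a matrix
    max.len <- max(sapply(test, length))
    corrected.list <- lapply(test, function(x) {c(x, rep(NA, max.len - length(x)))})
    mat <- do.call(rbind, corrected.list)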
    

    Result:

    > mat
         [,1] [,2]  [,3]   [,4] [,5]  [,6]   [,7]  [,8]  [,9]  [,10]   
    [1,] "1"  "2"   "3"    "4"  "5"   "6"    "7"   "8"   "."   "."     
    [2,] "aa" "bb"  "ccc"  "d"  "eee" "ffff" "gg"  "h"   "i"   "jj"    
    [3,] "6"  "6"   "5"    "1"  "2"   "3"    "4"   "5i"  "734" NA      
    [4,] "33" "44x" "1234" "12" "1"   "9"    "888" "345" "12"  "987765"
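
If a data frame is preferred over a matrix, one possible alternative is to let read.table pad the short rows itself. This is only a sketch, not part of the original answer: the vector `txt` is a hypothetical stand-in for the file contents (with a real file you would pass the file name to both functions instead). It assumes fields are separated by whitespace and that the data contains no quote or `#` characters. `count.fields` is used first because read.table with `fill = TRUE` only inspects the first few lines to decide how many columns to create.

```r
# Hypothetical sample input, copied from the question; for a real file,
# pass the file name to count.fields() and read.table() instead.
txt <- c("1 2  3    4   5 6  7   8 . .",
         "aa bb  ccc d eee     ffff gg h i jj",
         "6      6   5 1 2 3 4 5i      734",
         "33  44x    1234  12  1    9  888  345     12   987765")

# Number of whitespace-separated fields on the widest line
n.col <- max(count.fields(textConnection(txt)))

# fill = TRUE pads short rows; col.names fixes the column count up front;
# colClasses = "character" keeps every token exactly as written
df <- read.table(text = txt, header = FALSE, fill = TRUE,
                 col.names = paste0("V", seq_len(n.col)),
                 colClasses = "character")

# Make sure padded cells are NA rather than empty strings
df[!is.na(df) & df == ""] <- NA

# Character matrix version, if a matrix is needed
mat2 <- as.matrix(df)
```

Whether this is faster than the readLines and strsplit approach on large files has not been measured here; it mainly saves the manual padding step.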
    