The Archive Base

Editorial Team
Asked: June 6, 2026

I have a text data file that I likely will read with readLines.


I have a text data file that I will probably read with readLines. The beginning of each string contains a lot of gibberish, followed by the data I need; the gibberish and the data are usually separated by three dots. I would like to split each string after its last three dots, or replace those last three dots with a marker telling R to treat everything to their left as one column.
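A minimal sketch of the second idea (replacing the last three dots with a marker), using sub() with a greedy regular expression. It assumes the delimiter is always exactly three characters drawn from dot, comma or the bullet character U+2022, and that the marker "|" (an arbitrary, hypothetical choice) never occurs in the data. A textConnection stands in for the real file that would be passed to readLines(); "path/to/file.txt" is a placeholder.

```r
# Stand-in for the real file: these lines are copied from the question's
# example; with a real file you would call readLines("path/to/file.txt").
con <- textConnection(c(
  "first string of junk... 0.2 0 1",
  "year .. 2 .,.  7 6 5"
))
lines <- readLines(con)
close(con)

# The greedy ^(.*) makes the delimiter match the LAST run of three
# dot/comma/bullet characters; that run plus any following whitespace
# is replaced by the marker "|" (hypothetical choice of marker).
marked <- sub("^(.*)[.,\u2022]{3}\\s*", "\\1|", lines, perl = TRUE)

# Split at the marker: gibberish on the left, data on the right
parts <- strsplit(marked, "|", fixed = TRUE)
```

Note that the left-hand piece can keep a trailing space (for example "year .. 2 "), so trimws() is useful before using it as a column.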

Here is a similar Stack Overflow post that locates the last dot:

R: Find the last dot in a string

However, some of my data contain decimals, so locating the last dot is not enough. I also think ... has a special meaning in R, which might be complicating the issue. Another potential complication is that some of the dots are larger characters (such as •) rather than periods, and in some lines one of the three dots was replaced with a comma.
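Some background on the "..." concern, as a sketch: inside a character string, "..." has no special meaning to R itself (the ... argument only matters in function definitions and calls), but in a regular expression an unescaped "." matches any single character. The example strings below are hypothetical, and the "bigger" dot is assumed to be the bullet character U+2022, as in the sample data.

```r
x <- c("abc", "a...b")

# Unescaped, "." is a wildcard, so "..." matches ANY three characters
any_three <- grepl("...", x)
# Escaped dots, or fixed = TRUE, match three literal periods only
literal     <- grepl("\\.\\.\\.", x)
fixed_match <- grepl("...", x, fixed = TRUE)

# Inside a character class "." is literal, and the class can also list the
# comma and the bullet (U+2022, assumed to be the "bigger" dot)
delims  <- c("...", ".,.", ".\u2022.", "..")
variant <- grepl("[.,\u2022]{3}", delims)

print(any_three)    # TRUE TRUE
print(literal)      # FALSE TRUE
print(fixed_match)  # FALSE TRUE
print(variant)      # TRUE TRUE TRUE FALSE
```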

In addition to the gregexpr approach from the post above, I have tried gsub, but I cannot figure out the right pattern.

Here is an example data set and the outcome I hope to achieve:

aa = matrix(c(
'first string of junk... 0.2 0 1', 
'next string ........2 0 2', 
'%%%... ! 1959 ...  0 3 3',
'year .. 2 .,.  7 6 5',
'this_string   is . not fine .•. 4 2 3'), 
nrow=5, byrow=TRUE,
dimnames = list(NULL, c("C1")))

aa <- as.data.frame(aa, stringsAsFactors = FALSE)
aa

# desired result
#                             C1  C2 C3 C4
# 1        first string of junk  0.2  0  1
# 2            next string .....   2  0  2
# 3             %%%... ! 1959      0  3  3
# 4                 year .. 2      7  6  5
# 5 this_string   is . not fine    4  2  3

I hope this question is not considered too specific. The text data file was created using the steps outlined in my post from yesterday about reading an MS Word file in R.

Some lines contain only data, with no gibberish and no three dots. However, that complication might be better left for a follow-up post.
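A sketch for the follow-up case of lines that contain only data: grepl() flags which lines have a delimiter run, lines without one get NA in the gibberish column, and the whole line is kept as data. The two input lines are hypothetical, and the delimiter set (dot, comma, bullet U+2022) is assumed from the question's sample data. perl = TRUE is used so the greedy first group reliably ends at the last delimiter run.

```r
# Hypothetical input: one line with gibberish, one line with data only
lines <- c("first string of junk... 0.2 0 1", "3 1 4")

# Group 1: text before the LAST run of three delimiters; group 2: the data
pat <- "^(.*)[.,\u2022]{3}\\s*(.*)$"
has_delim <- grepl(pat, lines, perl = TRUE)

# Gibberish column: NA when a line has no delimiter run
junk <- ifelse(has_delim, trimws(sub(pat, "\\1", lines, perl = TRUE)), NA_character_)
# Data part: the text after the delimiter, or the whole line otherwise
values <- ifelse(has_delim, sub(pat, "\\2", lines, perl = TRUE), lines)

# read.table() splits on whitespace and converts the fields to numbers
nums <- read.table(text = values)
```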

Thank you for any advice.


1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 4:46 am

    This does the trick, though it is not especially elegant:

    options(stringsAsFactors = FALSE)
    
    # Search for the last run of three delimiter characters (dot, comma or
    # bullet), skip any whitespace after it, and capture everything that
    # follows (the part in parentheses, referenced in the replacement by \\1).
    # The greedy ^.* pushes the match to the LAST such run.
    nums <- gsub(pattern = "^.*[.,•]{3}\\s*(.*)", replacement = "\\1", x = aa$C1)
    
    # Use strsplit to break each result apart at runs of whitespace, and
    # do.call(rbind, ...) to stack the pieces into a character matrix with
    # one row per original string
    num.mat <- do.call(rbind, strsplit(nums, split = "\\s+"))
    
    # Mash it back together with your original strings
    result <- cbind(aa, num.mat)
    
    # Give it informative names
    names(result) <- c("original.string", "num1", "num2", "num3")
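The code above keeps the full original string in the first column and leaves the numbers as character. A sketch closer to the desired C1 to C4 layout in the question also keeps the text before the last delimiter run and lets read.table() convert the numbers. It assumes every line contains a delimiter run and exactly three whitespace-separated numbers after it; perl = TRUE is used so the greedy (.*) group reliably ends at the last run.

```r
# Example data from the question (same five strings; \u2022 is the bullet)
aa <- data.frame(C1 = c(
  "first string of junk... 0.2 0 1",
  "next string ........2 0 2",
  "%%%... ! 1959 ...  0 3 3",
  "year .. 2 .,.  7 6 5",
  "this_string   is . not fine .\u2022. 4 2 3"
), stringsAsFactors = FALSE)

# Group 1: everything before the LAST run of three delimiters; group 2: data
pat <- "^(.*)[.,\u2022]{3}\\s*(.*)$"

left  <- trimws(sub(pat, "\\1", aa$C1, perl = TRUE))
right <- sub(pat, "\\2", aa$C1, perl = TRUE)

# read.table() splits on whitespace and converts the fields to numbers
nums   <- read.table(text = right, col.names = c("C2", "C3", "C4"))
result <- data.frame(C1 = left, nums, stringsAsFactors = FALSE)
result
```

Because the first group is greedy, the eight-dot row keeps five of its dots in C1 ("next string ....."), matching the desired output.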
    


Related Questions

I have Text file that contains data separated with a comma , . How
I have a text file that contains a data like ID Name Path IsTrue
I have a text file that I parse each month and insert the data
I have a text file that contains a data dump from a database. This
I have a text file that contains the following data. The first line is
I have this text file that contains approximately 22 000 lines, with each line
I have a text file that contains cached data in JSON format. I'm trying
I have a method that reads data from a comma separated text file and
We have some C# code that reads data from a text file using a
I have a text file which contains data separated by '|'. I need to
