The Archive Base

Editorial Team
Asked: May 27, 2026

DATA AND REQUIREMENTS

The first table (myMatrix1) is from an old geological survey that used different region boundaries (the begin and finish columns) from the newer survey (myMatrix2).
What I want to do is match the begin and finish boundaries and then create two tables: one for the new sedimentation data and one for the new bore width data, characterised as a boolean.

myMatrix1 <- read.table("/path/to/file")
myMatrix2 <- read.table("/path/to/file")

> head(myMatrix1)  # this is the old data

    sampleIDs begin finish   
1    19990224 4     5 
2    20000224 5     6 
3    20010203 6     8 
4    20019024 29    30 
5    20020201 51    52 

> head(myMatrix2)   # this is the new data

     begin finish  sedimentation    boreWidth
1    0     10       1.002455        0.014354
2    11    367      2.094351        0.056431
3    368   920      0.450275        0.154105
4    921   1414     2.250820        1.004353
5    1415  5278     0.114109        NA

Desired output:

> head(myMatrix6)

    sampleIDs begin finish  sedimentation #myMatrix4
1    19990224 4     5       1.002455
2    20000224 5     6       1.002455
3    20010203 6     8       2.094351
4    20019024 29    30      2.094351
5    20020201 51    52      2.094351

> head(myMatrix7)

    sampleIDs begin finish  boreWidthThresh #myMatrix5
1    19990224 4     5       FALSE
2    20000224 5     6       FALSE
3    20010203 6     8       FALSE
4    20019024 29    30      FALSE
5    20020201 51    52      FALSE

CODE

The following code has taken me several hours to run on my dataset (about 5 million data points). Is there any way to change the code to make it run any faster?

# create empty matrix for sedimentation
myMatrix6 <- data.frame(NA,NA,NA,NA)[0,]
names(myMatrix6) <- letters[1:4]

# create empty matrix for bore
myMatrix7 <- data.frame(NA,NA,NA,NA)[0,]
names(myMatrix7) <- letters[1:4]

for (i in 1:nrow(myMatrix2))
{
    # rows of myMatrix1 whose begin lies strictly between
    # myMatrix2$begin[i] and myMatrix2$finish[i]
    myMatrix3 <- myMatrix1[which((myMatrix1$begin > myMatrix2$begin[i]) & (myMatrix1$begin < myMatrix2$finish[i])),]

    # repeat the sedimentation value of region i once per matched row
    myMatrix4 <- rep(myMatrix2$sedimentation[i], nrow(myMatrix3))

    if (is.na(myMatrix2$boreWidth[i])) {
        myMatrix5 <- rep(NA, nrow(myMatrix3))
    }
    else if (myMatrix2$boreWidth[i] == 0) {
        myMatrix5 <- rep(TRUE, nrow(myMatrix3))
    }
    else if (myMatrix2$boreWidth[i] > 0) {
        myMatrix5 <- rep(FALSE, nrow(myMatrix3))
    }

    myMatrix6 <- rbind(myMatrix6, cbind(myMatrix3, myMatrix4))
    myMatrix7 <- rbind(myMatrix7, cbind(myMatrix3, myMatrix5))
}

EDIT:

> dput(head(myMatrix2))

structure(list(V1 = structure(c(6L, 1L, 2L, 4L, 5L, 3L), .Label = c("0", 
"11", "1415", "368", "921", "begin"), class = "factor"), V2 = structure(c(6L, 
1L, 3L, 5L, 2L, 4L), .Label = c("10", "1414", "367", "5278", 
"920", "finish"), class = "factor"), V3 = structure(c(6L, 3L, 
4L, 2L, 5L, 1L), .Label = c("0.114109", "0.450275", "1.002455", 
"2.094351", "2.250820", "sedimentation"), class = "factor"), 
    V4 = structure(c(5L, 1L, 2L, 3L, 4L, 6L), .Label = c("0.014354", 
    "0.056431", "0.154105", "1.004353", "boreWidth", "NA"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4"), row.names = c(NA, 6L), class = "data.frame")

> dput(head(myMatrix1))

structure(list(V1 = structure(c(6L, 1L, 2L, 3L, 4L, 5L), .Label = c("19990224", 
"20000224", "20010203", "20019024", "20020201", "sampleIDs"), class = "factor"), 
    V2 = structure(c(6L, 2L, 3L, 5L, 1L, 4L), .Label = c("29", 
    "4", "5", "51", "6", "begin"), class = "factor"), V3 = structure(c(6L, 
    2L, 4L, 5L, 1L, 3L), .Label = c("30", "5", "52", "6", "8", 
    "finish"), class = "factor")), .Names = c("V1", "V2", "V3"
), row.names = c(NA, 6L), class = "data.frame")
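
A note on the dput() output above: the columns are named V1 to V4 and are stored as factors whose levels include "begin" and "finish", which suggests the header line was read as a data row. Below is a minimal sketch of the difference, assuming the files are whitespace-delimited with a single header line. The inline text is a stand-in for the real files, which are not shown, and the names txt, bad and good are illustrative.

```r
# Stand-in for the contents of the second file (assumed layout).
txt <- "begin finish sedimentation boreWidth
0 10 1.002455 0.014354
11 367 2.094351 0.056431
368 920 0.450275 0.154105"

# Without header = TRUE the header line becomes the first data row,
# so every column is read as text (a factor in R versions before 4.0.0).
bad <- read.table(text = txt)

# With header = TRUE the columns get their names and numeric types.
good <- read.table(text = txt, header = TRUE)
str(good)
```

If the real files do have a header line, comparisons such as myMatrix1$begin > myMatrix2$begin[i] only behave numerically once the columns are read this way.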
1 Answer

Editorial Team
Added an answer on May 27, 2026 at 2:51 am

    First look at these general suggestions on speeding up code: https://stackoverflow.com/a/8474941/636656

    The first thing that jumps out at me is that I’d create only one results matrix. That way you’re not duplicating the sampleIDs, begin, and finish columns, and you avoid the overhead of running the matching algorithm twice.

    Doing that, you also avoid selecting more than once (although that is trivial in terms of speed as long as you store your selection vector rather than recalculating it).
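
A minimal sketch of that idea, assuming the column names from the question and the question's own matching rule (begin strictly between the begin and finish of the new region). The selection vector is computed once per new region and reused to fill both result columns, which are preallocated instead of being grown with rbind. The toy data and the names sel and result are illustrative, not from the original post.

```r
# Toy data mirroring the question's tables.
myMatrix1 <- data.frame(sampleIDs = c(19990224, 20000224, 20010203, 20019024, 20020201),
                        begin  = c(4, 5, 6, 29, 51),
                        finish = c(5, 6, 8, 30, 52))
myMatrix2 <- data.frame(begin  = c(0, 11, 368),
                        finish = c(10, 367, 920),
                        sedimentation = c(1.002455, 2.094351, 0.450275),
                        boreWidth     = c(0.014354, 0.056431, 0.154105))

# One results table, with the new columns preallocated as NA.
result <- myMatrix1
result$sedimentation   <- NA_real_
result$boreWidthThresh <- NA

for (i in seq_len(nrow(myMatrix2))) {
  # Compute the selection once and reuse it for both columns.
  sel <- which(myMatrix1$begin > myMatrix2$begin[i] &
               myMatrix1$begin < myMatrix2$finish[i])
  result$sedimentation[sel] <- myMatrix2$sedimentation[i]
  # TRUE when boreWidth is exactly 0, NA when it is missing, FALSE otherwise.
  result$boreWidthThresh[sel] <- myMatrix2$boreWidth[i] == 0
}
print(result)
```

This still scans myMatrix1 once per row of myMatrix2, so it removes the repeated rbind copying but not the cost of the loop itself.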

    Here’s a solution using apply:

    myMatrix1 <- data.frame(sampleIDs=c(19990224,20000224),begin=c(4,5),finish=c(5,6))
    myMatrix2 <- data.frame(begin=c(0,11),finish=c(10,367),sed=c(1.002,2.01),boreWidth=c(.014,.056))
    
    glommer <- function(x,myMatrix2) {
      x[4:5] <- as.numeric(myMatrix2[ myMatrix2$begin <= x["begin"] & myMatrix2$finish >= x["finish"], c("sed","boreWidth") ])
      names(x)[4:5] <- c("sed","boreWidth")
      return( x )
    }
    
    > t(apply( myMatrix1, 1, glommer, myMatrix2=myMatrix2))
         sampleIDs begin finish   sed boreWidth
    [1,]  19990224     4      5 1.002     0.014
    [2,]  20000224     5      6 1.002     0.014
    

    I used apply and stored everything as numeric. Another approach would be to return a data.frame and store sampleIDs, begin, and finish as integers, which might avoid some problems with floating-point error.

    This solution assumes there are no boundary cases, i.e. that each begin, finish interval in myMatrix1 is entirely contained within a single begin, finish interval of myMatrix2. If your data is more complicated, just change the glommer() function. How you want to handle such cases is a substantive question.
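
For a dataset of the size mentioned in the question, a vectorised lookup may be faster than any row-by-row approach. The following is a sketch using base R's findInterval(), not the method from the answer above. It assumes the regions in myMatrix2 are sorted by begin and do not overlap. Intervals from myMatrix1 that are not entirely contained in one region get NA, which is only one possible way to handle the boundary cases. The toy data and the names idx, contained and result are illustrative.

```r
myMatrix1 <- data.frame(sampleIDs = c(19990224L, 20000224L, 20010203L, 20019024L),
                        begin  = c(4L, 5L, 9L, 29L),
                        finish = c(5L, 6L, 12L, 30L))
myMatrix2 <- data.frame(begin  = c(0L, 11L, 368L),
                        finish = c(10L, 367L, 920L),
                        sedimentation = c(1.002455, 2.094351, 0.450275),
                        boreWidth     = c(0.014354, 0, NA))

# For each old interval, the index of the last new region whose begin is
# <= the old begin (0 means it starts before the first region).
idx <- findInterval(myMatrix1$begin, myMatrix2$begin)
idx[idx == 0] <- NA

# Keep the match only if the old interval also ends inside that region.
contained <- !is.na(idx) & myMatrix1$finish <= myMatrix2$finish[idx]
idx[!contained] <- NA

# Single results table; unmatched rows carry NA in both new columns.
result <- data.frame(myMatrix1,
                     sedimentation   = myMatrix2$sedimentation[idx],
                     boreWidthThresh = myMatrix2$boreWidth[idx] == 0)
print(result)
```

In this toy data the third interval (9 to 12) straddles two regions and is therefore left unmatched. Storing the keys as integers follows the suggestion above about avoiding floating-point comparisons.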

© 2021 The Archive Base. All Rights Reserved