So this question has been bugging me for a while since I’ve been looking

Asked: June 18, 2026 (2026-06-18T08:22:32+00:00)

So this question has been bugging me for a while since I’ve been looking for an efficient way of doing it. Basically, I have a dataframe, with a data sample from an experiment in each row. I guess this should be looked at more as a log file from an experiment than the final version of the data for analyses.

The problem that I have is that, from time to time, certain events get logged in a column of the data. To make the analyses tractable, what I’d like to do is “fill in the gaps” for the empty cells between events so that each row in the data can be tied to the most recent event that has occurred. This is a bit difficult to explain but here’s an example:

[Image: screenshot from RStudio of the base dataset]

Now, I’d like to take that and turn it into this:

[Image: the same dataset with the gaps filled in]

Doing so will enable me to split the data up by the current event. In any other language I would reach for a for loop, but I know that R isn’t great with loops of that type, and I have hundreds of thousands of rows of data to work through. Can anyone suggest a speedy way of doing this?

Many thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T08:22:33+00:00

The na.locf() function in package zoo is useful here, e.g.

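library(zoo)  # provides na.locf()

dat <- data.frame(ID = 1:5, sample_value = c(34, 56, 78, 98, 234),
                  log_message = c("FIRST_EVENT", NA, "SECOND_EVENT", NA, NA))

# Carry the last logged event forward, then keep the text before the underscore.
# na.rm = FALSE keeps leading NAs so the result always has nrow(dat) elements.
dat <-
  transform(dat,
            Current_Event = sapply(strsplit(as.character(na.locf(log_message,
                                                                 na.rm = FALSE)),
                                            "_"),
                                   `[`, 1))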

Gives

> dat
  ID sample_value  log_message Current_Event
1  1           34  FIRST_EVENT         FIRST
2  2           56         <NA>         FIRST
3  3           78 SECOND_EVENT        SECOND
4  4           98         <NA>        SECOND
5  5          234         <NA>        SECOND

To explain the code:

1. The na.locf() call on log_message returns the column with each NA replaced by the previous non-NA value (the "last observation carried forward" part). The result is a factor if dat was created with stringsAsFactors = TRUE (the default before R 4.0.0) and a character vector otherwise.
2. The result of 1. is then converted to a character vector with as.character(), so the code works in either case.
3. strsplit() is run on this character vector, breaking each string apart on the underscore. strsplit() returns a list with as many components as there were elements in the character vector. In this case each component is a vector of length two, and we want the first element of each.
4. So I use sapply() to run the subsetting function `[`() and extract the 1st element from each list component.
5. The whole thing is wrapped in transform() so that i) I don't need to refer to dat$ and ii) I can add the result as a new variable directly into the data frame dat.
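
If the nested call is hard to read, the same steps can be written out one at a time. This is only a sketch of the approach described above, not a different method. The intermediate names (filled, parts, current_event, by_event) are made up for illustration, and stringsAsFactors = FALSE is set explicitly as an assumption so the result does not depend on the R version. The last line shows the split by current event that the question was after.

```r
library(zoo)  # provides na.locf()

# Same toy data as in the answer; stringsAsFactors = FALSE is an assumption
# made here so the column is character on R versions before and after 4.0.
dat <- data.frame(ID = 1:5, sample_value = c(34, 56, 78, 98, 234),
                  log_message = c("FIRST_EVENT", NA, "SECOND_EVENT", NA, NA),
                  stringsAsFactors = FALSE)

# Step 1: carry the last non-NA message forward.
# na.rm = FALSE keeps any leading NAs so the length matches nrow(dat).
filled <- na.locf(dat$log_message, na.rm = FALSE)

# Steps 2-3: coerce to character and split each string on the underscore.
parts <- strsplit(as.character(filled), "_")

# Step 4: take the first piece of every list component.
current_event <- sapply(parts, `[`, 1)

# Step 5: attach the result, then split the rows by the current event.
dat$Current_Event <- current_event
by_event <- split(dat, dat$Current_Event)
```

by_event is a list of data frames, one per event, so by_event$FIRST holds the rows logged while FIRST was the most recent event.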
