So this question has been bugging me for a while since I’ve been looking for an efficient way of doing it. Basically, I have a dataframe, with a data sample from an experiment in each row. I guess this should be looked at more as a log file from an experiment than the final version of the data for analyses.
The problem that I have is that, from time to time, certain events get logged in a column of the data. To make the analyses tractable, what I’d like to do is “fill in the gaps” for the empty cells between events so that each row in the data can be tied to the most recent event that has occurred. This is a bit difficult to explain but here’s an example:

Now, I’d like to take that and turn it into this:

Doing so will enable me to split the data up by the current event. In any other language I would jump into using a for loop to do this, but I know that R isn’t great with loops of that type, and, in this case, I have hundreds of thousands of rows of data to sort through, so am wondering if anyone can offer suggestions for a speedy way of doing this?
Many thanks.
The
na.locf()function in package zoo is useful here, e.g.Gives
To explain the code,
na.locf(log_message)returns a factor (that was how the data were created indat) with theNAs replaced by the previous non-NAvalue (the last one carried forward part).strplit()is run on this character vector, breaking it apart on the underscore.strsplit()returns a list with as many elements as there were elements in the character vector. In this case each component is a vector of length two. We want the first elements of these vectors,sapply()to run the subsetting function'['()and extract the 1st element from each list component.transform()so i) I don;t need to refer todat$and so I can add the result as a new variable directly into the datadat.