The following code does what the question asks for.
The function that does all the work is append_one. It
- Creates a vector y repeating the prefix length(x) times.
- Sets y to NA wherever x is NA, so that missing values break the runs.
- Gets the runs of vector y.
- Sets the runs’ values to the empty string "" if the runs’ lengths are less than N.
- Reverses the run-length encoding.
- Pastes this vector of prefixes with the input vector x.
Then function append_all calls this function on every column of the input data frame.
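To make the steps concrete, here is a small walk-through of the intermediate values for a single column. It is only a sketch: it uses the first column of the data set from the comment, fixes N at 3, and the names prefixes and result are made up for illustration.

```r
# Walk-through of the run-length steps on one column (N = 3 is an assumption).
x <- c(1.0, NA, 1.1, 1.2, 1.3)
N <- 3

y <- rep("D", length(x))        # "D" "D" "D" "D" "D"
is.na(y) <- is.na(x)            # "D" NA  "D" "D" "D"

r <- rle(y)                     # rle() puts every NA in its own run of length 1
r$lengths                       # 1 1 3

r$values[r$lengths < N] <- ""   # runs shorter than N lose their prefix
prefixes <- inverse.rle(r)      # ""  ""  "D" "D" "D"

result <- paste0(prefixes, x)   # "1" "NA" "D1.1" "D1.2" "D1.3"
result
```

Note that the NA in x ends up as the string "NA", because paste0() converts missing values to text.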
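append_one <- function(x, N, pref = "D"){
  # one prefix per element of x
  y <- rep(pref, length(x))
  # NA in x becomes NA in y; rle() treats each NA as its own run
  is.na(y) <- is.na(x)
  r <- rle(y)
  # runs shorter than N get no prefix
  r$values[r$lengths < N] <- ""
  y <- inverse.rle(r)
  # note: paste0() turns an NA in x into the string "NA"
  paste0(y, x)
}
append_all <- function(X, n, pref = "D"){
  Y <- X
  # Y[] keeps the data frame structure while replacing every column
  Y[] <- lapply(Y, append_one, N = n, pref = pref)
  Y
}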
N1 <- 3
append_all(df1, N1)
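If only a yes/no answer per column is needed, that is, whether a column contains at least N successive non-NA values, the same run-length idea can be applied to !is.na(x) directly. The following is a minimal sketch, not part of the code above: has_run is a hypothetical helper, and the column names a and b are made up for illustration.

```r
# Hypothetical helper: TRUE if x has at least N consecutive non-NA values.
has_run <- function(x, N) {
  r <- rle(!is.na(x))              # runs of TRUE (non-NA) and FALSE (NA)
  any(r$values & r$lengths >= N)   # is there a non-NA run that is long enough?
}

# First two columns of the data set from the comment, with made-up names.
d <- data.frame(a = c(1.0, NA, 1.1, 1.2, 1.3),
                b = c(2.0, 2.1, NA, NA, NA))

res <- vapply(d, has_run, logical(1), N = 3)
res   # a: TRUE, b: FALSE
```

vapply() is used instead of sapply() so that the result is guaranteed to be a logical vector with one value per column.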
Data.
Original data set, posted in the question.
df <- data.frame(c(1,NA,1,1,1),
                 c(2,2,NA,NA,NA),
                 c(3,3,3,3,NA),
                 c(4,4,4,4,4),
                 c(5,NA,5,NA,5))
New data set and corresponding output, posted in a comment.
df1 <- data.frame(c(1.0,NA,1.1,1.2,1.3),
                  c(2.0,2.1,NA,NA,NA),
                  c(3.0,3.1,3.2,3.3,NA),
                  c(4.0,4.1,4.2,4.3,4.4),
                  c(5.0,NA,5.1,NA,5.2))
df2 <- data.frame(c(1.0,NA,'D1.1','D1.2','D1.3'),
                  c(2.0,2.1,NA,NA,NA),
                  c('D3.0','D3.1','D3.2','D3.3',NA),
                  c('D4.0','D4.1','D4.2','D4.3','D4.4'),
                  c(5.0,NA,5.1,NA,5.2))
solved Is there an efficient method to check for 8 successive elements that are not NA (i.e. is.na()==FALSE) in each column of a large dataset?