[Solved] Is there an efficient method to check for 8 successive elements that are not NA (i.e. is.na()==FALSE) in each column of a large dataset?


The following code does what the question asks for.
The function that does all the work is append_one. It

  1. Creates a vector y repeating the prefix length(x) times, then sets y to NA wherever x is NA.
  2. Gets the runs of vector y with rle(). Each NA counts as its own run of length 1.
  3. Sets a run's value to the empty string "" if the run's length is less than N.
  4. Reverses the run-length encoding with inverse.rle().
  5. Pastes this vector of prefixes with the input vector x.
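
To see what these steps produce, here is a minimal walk-through on the first column of df1 with N = 3. It is only a sketch of the intermediate values. The comments rely on the documented behaviour of rle(), which regards a missing value as unequal to its neighbours.

```r
x <- c(1.0, NA, 1.1, 1.2, 1.3)   # first column of df1
N <- 3

# Step 1: prefix everywhere, NA where x is NA
y <- rep("D", length(x))
is.na(y) <- is.na(x)              # "D" NA "D" "D" "D"

# Step 2: run-length encoding; the NA is a run of its own
r <- rle(y)                       # lengths 1 1 3

# Step 3: runs shorter than N lose their prefix
r$values[r$lengths < N] <- ""

# Step 4: back to a full-length vector
y <- inverse.rle(r)               # "" "" "D" "D" "D"

# Step 5: paste prefixes and values
res <- paste0(y, x)               # "1" "NA" "D1.1" "D1.2" "D1.3"
```

Note that paste0() renders the missing value as the string "NA", not as a real NA.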

The function append_all then applies append_one to every column of the input data frame.

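append_one <- function(x, N, pref = "D"){
  # Start with the prefix everywhere, then set it to NA where x is missing.
  y <- rep(pref, length(x))
  is.na(y) <- is.na(x)
  # rle() treats every NA as its own run of length 1.
  r <- rle(y)
  # Drop the prefix from runs shorter than N (for N > 1 this includes the NA runs).
  r$values[r$lengths < N] <- ""
  y <- inverse.rle(r)
  out <- paste0(y, x)
  # paste0() turns NA into the string "NA"; restore real missing values.
  out[is.na(x)] <- NA
  out
}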

append_all <- function(X, n, pref = "D"){
  Y <- X
  # Y[] <- keeps the data frame structure while replacing every column.
  Y[] <- lapply(Y, append_one, N = n, pref = pref)
  Y
}

# df1 is defined in the Data section below.
N1 <- 3
append_all(df1, N1)
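
If only the yes/no check from the title is needed (does a column contain at least N successive non-NA values?), a shorter variant is possible. The sketch below is not part of the code above: has_run is a hypothetical helper name, and it applies rle() to the logical vector !is.na(x) instead of a prefix vector. The toy data frame reuses three columns of the original data set, with made-up column names.

```r
# Hypothetical helper: TRUE if x has at least N consecutive non-NA values.
has_run <- function(x, N) {
  r <- rle(!is.na(x))               # runs of TRUE (observed) and FALSE (missing)
  any(r$values & r$lengths >= N)    # is any TRUE run long enough?
}

# Three columns from the original data set; the names a, b, c are made up.
toy <- data.frame(a = c(1, NA, 1, 1, 1),
                  b = c(2, 2, NA, NA, NA),
                  c = c(5, NA, 5, NA, 5))

# One logical value per column
res <- vapply(toy, has_run, logical(1), N = 3)
```

For the question itself, N would be 8 instead of 3. Because !is.na(x) never contains NA, this variant does not depend on how rle() handles missing values.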

Data.

Original data set, posted in the question.

df <- data.frame(c(1,NA,1,1,1),
                  c(2,2,NA,NA,NA),
                  c(3,3,3,3,NA),
                  c(4,4,4,4,4),
                  c(5,NA,5,NA,5))

New data set (df1) and the corresponding expected output (df2), posted in a comment.

df1 <- data.frame(c(1.0,NA,1.1,1.2,1.3),
                  c(2.0,2.1,NA,NA,NA),
                  c(3.0,3.1,3.2,3.3,NA),
                  c(4.0,4.1,4.2,4.3,4.4),
                  c(5.0,NA,5.1,NA,5.2))

df2 <- data.frame(c(1.0,NA,'D1.1','D1.2','D1.3'),
                  c(2.0,2.1,NA,NA,NA),
                  c('D3.0','D3.1','D3.2','D3.3',NA),
                  c('D4.0','D4.1','D4.2','D4.3','D4.4'),
                  c(5.0,NA,5.1,NA,5.2))
