[Solved] Calculate sum of a column by ID based on the value of another column in R


Your desired output isn't very clear, but I think this does what you need. (Note that your data also has the Email column twice.)

library(data.table)
cols <- c("Call", "Callback", "Email") # Choose columns to modify
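To run the snippets in this answer you need a `df` to work on. The question's data is not reproduced here, so the following is only a sketch rebuilt from the printed results further down. It assumes the indicator columns are integer 0/1 and that the duplicated Email column has been reduced to a single `Email` column.

```r
# Sample input reconstructed from the printed results in this answer.
# Assumptions: indicator columns are integer 0/1, and Email appears once.
df <- data.frame(
  E_Add      = rep(c("xxxx", "yyyy"), each = 5L),
  Action     = c("Task", "Task", "Event", "Task", "Event",
                 "Task", "Task", "Task", "Task", "Event"),
  ActionType = c("Call", "Call", "Start", "Call", "Trial",
                 "Call", "Callback", "Email", "Call", "Start"),
  Call       = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L),
  Callback   = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  Email      = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)
```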

First solution (simple version)

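# Within each E_Add, start a new group at every "Event" row;
# write the group's sum on its last row and 0 everywhere else
setDT(df)[, paste0(cols, "Sum") := 
            lapply(.SD, function(x) c(rep(0L, .N - 1L), sum(x))),
            by = .(E_Add, cumsum(Action == "Event")), 
            .SDcols = cols][]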

#     E_Add   Action ActionType Call Callback   Email CallSum CallbackSum EmailSum
#  1:  xxxx   Task       Call    1        0       0       0           0          0
#  2:  xxxx   Task       Call    1        0       0       2           0          0
#  3:  xxxx  Event      Start    0        0       0       0           0          0
#  4:  xxxx   Task       Call    1        0       0       1           0          0
#  5:  xxxx  Event      Trial    0        0       0       0           0          0
#  6:  yyyy   Task       Call    1        0       0       0           0          0
#  7:  yyyy   Task   Callback    0        1       0       0           0          0
#  8:  yyyy   Task      Email    0        0       1       0           0          0
#  9:  yyyy   Task       Call    1        0       0       2           1          1
# 10:  yyyy  Event      Start    0        0       0       0           0          0
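If the grouping is not obvious, it may help to look at the two pieces in isolation. This is a small base R sketch, not part of the solution itself: the `Action` vector is copied from the output above, and `sum_on_last` is a hypothetical name for the anonymous function used inside `lapply`.

```r
# Action column as it appears in the printed output above
Action <- c("Task", "Task", "Event", "Task", "Event",
            "Task", "Task", "Task", "Task", "Event")

# The counter goes up by one at every "Event" row, so each Event opens
# a new group that also holds the Task rows that follow it.
grp <- cumsum(Action == "Event")
grp
# [1] 0 0 1 1 2 2 2 2 2 3

# Hypothetical stand-in for the function applied to each column per group:
# zeros everywhere except the last position, which gets the group's sum.
sum_on_last <- function(x) c(rep(0L, length(x) - 1L), sum(x))
sum_on_last(c(1L, 1L))
# [1] 0 2
```

Grouping by `E_Add` as well keeps row 5 (xxxx) apart from rows 6 to 9 (yyyy), even though they share the counter value 2.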

Second solution (to match your exact output)

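setDT(df)[, paste0(cols, "Sum") := 
            lapply(.SD, function(x) {
              if (any(x == 1L)) {
                indx <- max(which(x == 1L)) # last row in the group with a 1
                x[indx] <- sum(x)           # put the group total on that row
                x[-indx] <- 0L              # zero out every other row
                x
              } else 0L                     # no 1s in this group: all zeros
            }), 
            by = .(E_Add, cumsum(Action == "Event")), 
            .SDcols = cols][]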

#     E_Add   Action ActionType Call Callback   Email CallSum CallbackSum EmailSum
#  1:  xxxx   Task       Call    1        0       0       0           0          0
#  2:  xxxx   Task       Call    1        0       0       2           0          0
#  3:  xxxx  Event      Start    0        0       0       0           0          0
#  4:  xxxx   Task       Call    1        0       0       1           0          0
#  5:  xxxx  Event      Trial    0        0       0       0           0          0
#  6:  yyyy   Task       Call    1        0       0       0           0          0
#  7:  yyyy   Task   Callback    0        1       0       0           1          0
#  8:  yyyy   Task      Email    0        0       1       0           0          1
#  9:  yyyy   Task       Call    1        0       0       2           0          0
# 10:  yyyy  Event      Start    0        0       0       0           0          0
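To see why the sums now land on different rows than in the first solution, here is the per-group function pulled out and applied to plain vectors. It is only an illustration: `sum_on_last_one` is a hypothetical name, and the inputs are the Call and Callback values of the yyyy rows 6 to 9 from the output above.

```r
# Hypothetical named copy of the anonymous function from the second solution
sum_on_last_one <- function(x) {
  if (any(x == 1L)) {
    indx <- max(which(x == 1L))  # position of the last 1 in the group
    x[indx] <- sum(x)            # the total goes on that position
    x[-indx] <- 0L               # everything else becomes 0
    x
  } else {
    0L                           # length-1 result, recycled by := within the group
  }
}

sum_on_last_one(c(1L, 0L, 0L, 1L))  # Call, rows 6-9
# [1] 0 0 0 2
sum_on_last_one(c(0L, 1L, 0L, 0L))  # Callback, rows 6-9
# [1] 0 1 0 0
```

So the Call total sits on row 9 (the last Call), while the Callback total stays on row 7, which is what the output above shows.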

Edit per comment (if you want to display the sum on the Event row)

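# Shift the Event flag down one row so each Event closes its group
# instead of opening the next one (df is already a data.table here)
df[, paste0(cols, "Sum") := 
     lapply(.SD, function(x) c(rep(0L, .N - 1L), sum(x))),
     by = .(E_Add, cumsum(c(FALSE, (Action == "Event")[-length(Action)]))), 
     .SDcols = cols][]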

#     E_Add Action ActionType Call Callback Email CallSum CallbackSum EmailSum
#  1:  xxxx   Task       Call    1        0     0       0           0        0
#  2:  xxxx   Task       Call    1        0     0       0           0        0
#  3:  xxxx  Event      Start    0        0     0       2           0        0
#  4:  xxxx   Task       Call    1        0     0       0           0        0
#  5:  xxxx  Event      Trial    0        0     0       1           0        0
#  6:  yyyy   Task       Call    1        0     0       0           0        0
#  7:  yyyy   Task   Callback    0        1     0       0           0        0
#  8:  yyyy   Task      Email    0        0     1       0           0        0
#  9:  yyyy   Task       Call    1        0     0       0           0        0
# 10:  yyyy  Event      Start    0        0     0       2           1        1
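The only change from the first solution is the grouping key, which is easier to follow when computed on its own. A minimal base R sketch, with the `Action` vector copied from the output above and `is_event`, `shifted` and `grp_shifted` as illustrative names:

```r
# Action column as it appears in the printed output above
Action <- c("Task", "Task", "Event", "Task", "Event",
            "Task", "Task", "Task", "Task", "Event")

is_event <- Action == "Event"

# Drop the last flag and prepend FALSE: every flag moves down one row,
# so the counter increases on the row *after* each Event.
shifted <- c(FALSE, is_event[-length(is_event)])

grp_shifted <- cumsum(shifted)
grp_shifted
# [1] 0 0 0 1 1 2 2 2 2 2
```

Each Event row is now the last row of its group, so `c(rep(0L, .N - 1L), sum(x))` puts the total on it. Since data.table is already loaded, `shift(is_event, fill = FALSE)` should give the same lagged vector if you prefer that spelling.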
