Editing my previous wrong answer and borrowing the use of rle from @akron, you can do this. Assuming that your data is in a data.frame named “df” and your “frame classes” are in a column named “frame_class”, as in the code below, this should work:
df = data.frame(n_frame = 1:13, frame_type = "frame_type",
                frame_class = c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                                "B_frame", "I_frame", "B_frame", "P_frame", "I_frame", "P_frame", "I_frame"))
df$frame_letter = substring(df$frame_class, 1, 1) # get only the beginning letter
# Find the location of I_frames
where_i = which(df$frame_class == "I_frame")
num_i = length(where_i)
out_codes = list()
for (ind_i in seq_len(max(num_i - 1, 0))) { # cycle on "sandwiches"; no iterations if fewer than 2 I_frames
  start = where_i[ind_i]
  end = where_i[ind_i + 1]
  # Get data in a sandwich; seq_len() yields an empty selection when two I_frames are adjacent
  sub_data = df$frame_letter[start + seq_len(end - start - 1)]
  count_reps = rle(sub_data) # find repetitions pattern
  # build the codes
  out_code = "I"
  for (ind_letter in seq_along(count_reps$lengths)) {
    out_code = paste0(out_code, ifelse(count_reps$lengths[ind_letter] == 1,
                      count_reps$values[ind_letter], # If only 1 rep, don't add "1" in the string
                      paste0(count_reps$lengths[ind_letter], count_reps$values[ind_letter])))
  }
  out_codes[[ind_i]] = out_code # put in list
}
out_codes
which gives:
> out_codes
[[1]]
[1] "I2PB2PB"
[[2]]
[1] "IBP"
[[3]]
[1] "IP"
Note that this is really quick and dirty: you should at least implement some checks to make sure that the series always starts and ends with an “I_frame”, but this could put you in the right direction…
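A check like that could look roughly like the sketch below. The helper name check_i_bounds is made up for illustration (it is not part of the code above), and it assumes the frame classes are passed as a character or factor vector such as df$frame_class.

```r
# Hypothetical helper (not in the original code): stop early unless the
# series starts and ends with an "I_frame".
check_i_bounds <- function(frame_class) {
  frame_class <- as.character(frame_class)  # also accepts factor columns
  n <- length(frame_class)
  if (n == 0) stop("no frames supplied")
  if (frame_class[1] != "I_frame") stop("series does not start with an I_frame")
  if (frame_class[n] != "I_frame") stop("series does not end with an I_frame")
  invisible(TRUE)
}

# Passes silently for a well-formed series
check_i_bounds(c("I_frame", "P_frame", "B_frame", "I_frame"))
```

You would call it on df$frame_class before the loop, so that a malformed series fails loudly instead of producing truncated codes.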
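Also note that this could be slow for large datasets, since it loops over every pair of I_frames and grows each code string with paste0 in an inner loop.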
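If speed becomes a problem, one possible alternative is to avoid the explicit loops by grouping frames with cumsum and encoding each group in a single rle call. This is only a sketch, not benchmarked: the helper name encode_group is invented here, and it assumes the series starts and ends with an “I_frame”, so the last group is a lone "I" that gets dropped.

```r
# Hypothetical helper: run-length encode one group of letters,
# omitting the count when a letter appears only once in a row.
encode_group <- function(letters_in_group) {
  r <- rle(letters_in_group)
  paste0(ifelse(r$lengths == 1, "", r$lengths), r$values, collapse = "")
}

# Same first letters as df$frame_letter in the example data
frame_letter <- c("I", "P", "P", "B", "P", "P", "B", "I", "B", "P", "I", "P", "I")

group_id <- cumsum(frame_letter == "I")  # each "I" starts a new group
codes <- vapply(split(frame_letter, group_id), encode_group, character(1))
codes <- unname(codes[-length(codes)])   # drop the trailing lone "I"
codes
```

On the example data this should return the same three codes as the loop version, as a character vector instead of a list.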
Lorenzo