You are probably having problems because you are using empty strings where you should be using NAs. I would assume the following is the idiomatic code:
library(dplyr)

df <- data.frame(unique_id = c(rep(1, 3), rep(2, 3)),
                 school = c(rep("great", 3), rep("spring", 3)),
                 subject = rep(c("Math", "English", "History"), 2),
                 grade = c(88, 78, 98, 65, 72, 84),
                 sex = c(NA, NA, "male", NA, "female", NA))

# Collapse each group's unique values into one comma-separated string
r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(unique(.))))
which returns
# A tibble: 2 x 5
  unique_id school subject                grade      sex
      <dbl> <chr>  <chr>                  <chr>      <chr>
1         1 great  Math, English, History 88, 78, 98 NA, male
2         2 spring Math, English, History 65, 72, 84 NA, female
You can always
r2$sex <- sapply(stringr::str_split(r2$sex, ", "),"[",2)
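afterwards if you really want to remove those NAs, but I see them as informative. Note that this keeps the second element of each string, so it assumes the NA always comes first within a group.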
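If the original data really does hold empty strings, as suggested at the top of this answer, they can be converted to NA before summarising. This is a minimal sketch: the `raw` data frame and its values are made up for illustration, and it assumes `dplyr::na_if()` is applied to a character column.

```r
library(dplyr)

# Hypothetical input where blanks were stored as empty strings
raw <- data.frame(unique_id = c(1, 1, 2),
                  sex = c("", "male", ""),
                  stringsAsFactors = FALSE)

# na_if() replaces every value equal to "" with NA
clean <- raw %>%
  mutate(sex = na_if(sex, ""))
```

After this step the `sex` column contains real NAs, so `is.na()` based filtering works on it.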
You can write your own function to supply to summarise_each, which will allow you to take care of NAs in any column. Note that you only need to do this because unique, rightfully so, does not have an na.rm argument.
# Drop NAs first, then keep the unique values
rm_na_unique <- function(vec) {
  unique(vec[!is.na(vec)])
}

r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(rm_na_unique(.))))
This gives you the same result, but without the NAs:
# A tibble: 2 x 5
  unique_id school subject                grade      sex
      <dbl> <chr>  <chr>                  <chr>      <chr>
1         1 great  Math, English, History 88, 78, 98 male
2         2 spring Math, English, History 65, 72, 84 female
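In recent dplyr releases summarise_each() and funs() are deprecated, so the same idea can probably be written with across() instead. This is only a sketch, assuming dplyr 1.0.0 or later; the name `r3` is introduced here for illustration and is not part of the answers above.

```r
library(dplyr)

df <- data.frame(unique_id = c(rep(1, 3), rep(2, 3)),
                 school = c(rep("great", 3), rep("spring", 3)),
                 subject = rep(c("Math", "English", "History"), 2),
                 grade = c(88, 78, 98, 65, 72, 84),
                 sex = c(NA, NA, "male", NA, "female", NA))

# Same helper as above: drop NAs, then keep unique values
rm_na_unique <- function(vec) {
  unique(vec[!is.na(vec)])
}

# across(everything(), ...) applies the lambda to every non-grouping column
r3 <- df %>%
  group_by(unique_id) %>%
  summarise(across(everything(), ~ toString(rm_na_unique(.x))))
```

One thing to keep in mind: if a group contains only NAs in some column, toString() receives an empty vector and returns an empty string, so blanks can reappear in the summary.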