[Solved] combine duplicates, do not publish blanks, dplyr::distinct


Perhaps the reason you are having problems is that you are using empty strings ("") where you should be using NA. Here is what I would consider the idiomatic approach:

library(dplyr)

df <- data.frame(unique_id = c(rep(1, 3), rep(2, 3)),
                 school    = c(rep("great", 3), rep("spring", 3)),
                 subject   = rep(c("Math", "English", "History"), 2),
                 grade     = c(88, 78, 98, 65, 72, 84),
                 sex       = c(NA, NA, "male", NA, "female", NA))

# summarise_each(funs(...)) is deprecated in current dplyr; across() replaces it
r2 <- df %>%
  group_by(unique_id) %>%
  summarise(across(everything(), ~ toString(unique(.x))))

which returns

# A tibble: 2 x 5
  unique_id school                subject      grade        sex
      <dbl>  <chr>                  <chr>      <chr>      <chr>
1         1  great Math, English, History 88, 78, 98   NA, male
2         2 spring Math, English, History 65, 72, 84 NA, female

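If you really want to remove those NAs, you can always run

r2$sex <- sapply(stringr::str_split(r2$sex, ", "), "[", 2)

afterwards. Be aware that this keeps only the second value in each cell, so it works here only because NA happens to come first in both groups. Personally, I see the NAs as informative.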
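A slightly more robust variant of that post-processing step drops every "NA" token from the collapsed string wherever it appears, instead of assuming it comes first. This is a minimal sketch: drop_na_tokens() is a hypothetical helper, and the example vector only mimics the collapsed sex column.

```r
# Hypothetical helper: remove "NA" tokens from ", "-separated strings,
# regardless of their position within each string.
drop_na_tokens <- function(x) {
  parts <- strsplit(x, ", ", fixed = TRUE)
  # Re-collapse the remaining tokens; an all-NA cell becomes ""
  vapply(parts, function(p) toString(p[p != "NA"]), character(1))
}

# Example values shaped like a collapsed sex column
sex <- c("NA, male", "female, NA", "NA")
drop_na_tokens(sex)
# returns c("male", "female", "")
```

You would apply it as r2$sex <- drop_na_tokens(r2$sex). It would also drop a genuine value spelled "NA", and it would split values that themselves contain ", ", so removing the NAs before collapsing is still the cleaner option.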

Alternatively, you can write your own helper and use it in the summary, which lets you handle NAs in any column. You only need the helper because unique() (rightly) has no na.rm argument.

# Drop NAs before taking the distinct values
rm_na_unique <- function(vec) {
  unique(vec[!is.na(vec)])
}

r2 <- df %>%
  group_by(unique_id) %>%
  summarise(across(everything(), ~ toString(rm_na_unique(.x))))

This gives the same result, except that the NAs are dropped from sex:

# A tibble: 2 x 5
  unique_id school                subject      grade    sex
      <dbl>  <chr>                  <chr>      <chr>  <chr>
1         1  great Math, English, History 88, 78, 98   male
2         2 spring Math, English, History 65, 72, 84 female
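Both approaches assume the blanks are coded as NA. If your real data stores them as empty strings, a minimal sketch is to convert them with dplyr::na_if() before summarising. This assumes dplyr 1.0.0 or later (for across()) and that the blanks sit in character columns; df_blank and r3 are illustrative names, not from the question.

```r
library(dplyr)

# Illustrative data where missing sex values are stored as empty strings
df_blank <- data.frame(
  unique_id = c(1, 1, 1, 2, 2, 2),
  school    = rep(c("great", "spring"), each = 3),
  subject   = rep(c("Math", "English", "History"), 2),
  grade     = c(88, 78, 98, 65, 72, 84),
  sex       = c("", "", "male", "", "female", ""),
  stringsAsFactors = FALSE
)

r3 <- df_blank %>%
  # Turn "" into NA, but only in character columns
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  group_by(unique_id) %>%
  # Same idea as rm_na_unique(): distinct non-missing values, collapsed
  summarise(across(everything(), ~ toString(unique(.x[!is.na(.x)]))))
```

Restricting na_if() to character columns matters because newer dplyr versions cast the replacement value to the column type, so comparing a numeric column such as grade against "" can raise an error.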
