[Solved] R: Is there a function to clean factor levels? characters columnwise in a data frame? [closed]


You can reuse the internal bits from janitor::clean_names(), adapted to work on a vector rather than on data frame column names:

#' 'Clean' a character/factor vector like `janitor::clean_names()` does for data frame columns
#'
#' Most of the internals are from `janitor::clean_names()`
#'
#' @param x a character vector or a factor
#' @param refactor if `x` is a factor, return a re-factored factor?
#'        Default: `FALSE` == return character vector.
clean_vec <- function (x, refactor=FALSE) {

  require(magrittr, quietly=TRUE)

  if (!(is.character(x) || is.factor(x))) return(x)

  x_is_factor <- is.factor(x)

  old_names <- as.character(x)

  new_names <- old_names %>%
    gsub("'", "", .) %>%
    gsub("\"", "", .) %>%
    gsub("%", "percent", .) %>%
    gsub("^[ ]+", "", .) %>%
    make.names(.) %>%
    gsub("[.]+", "_", .) %>%
    gsub("[_]+", "_", .) %>%
    tolower(.) %>%
    gsub("_$", "", .)

  # seq_along() is safe for zero-length input, where 1:length(x) would yield c(1, 0)
  dupe_count <- vapply(seq_along(new_names), function(i) {
    sum(new_names[i] == new_names[1:i])
  }, integer(1))

  new_names[dupe_count > 1] <- paste(
    new_names[dupe_count > 1], dupe_count[dupe_count > 1], sep = "_"
  )

  if (x_is_factor && refactor) factor(new_names) else new_names

}
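
One thing to watch: the de-duplication step at the end of the function comes from a routine meant for column names, where every name has to be unique. On a vector of values, it also gives repeated values numeric suffixes. Here is a minimal sketch of just that step on a made-up input, so the effect is easy to see:

```r
# Made-up input with a repeated value
new_names <- c("a", "b", "a", "a")

# For each position, count how often the value has occurred so far
dupe_count <- vapply(seq_along(new_names), function(i) {
  sum(new_names[i] == new_names[1:i])
}, integer(1))

# Second and later occurrences get "_<count>" appended
new_names[dupe_count > 1] <- paste(
  new_names[dupe_count > 1], dupe_count[dupe_count > 1], sep = "_"
)

new_names
## [1] "a"   "b"   "a_2" "a_3"
```

If repeated values should stay identical after cleaning (the usual case for factor data), that step probably needs to be dropped or applied to the unique levels only.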

Example:

vec <- stringi::stri_rand_strings(10, 10, pattern = "[A-Za-z0-9\\.\\-\\?_\\,\\*\\+]")

vec
##  [1] "TzMF-iCHX6" "v-b+2cpul5" "JPMwpP35K6" "5Z3RQf50Tb" "HaPzKB5jhH"
##  [6] "3gz6P4?0uU" "ofXkhP4Q1O" "?,4NvCjw,3" "AlG9dWJ,Ze" "MrPrvuYH4*"

clean_vec(vec)
##  [1] "tzmf_ichx6"  "v_b_2cpul5"  "jpmwpp35k6"  "x5z3rqf50tb" "hapzkb5jhh" 
##  [6] "x3gz6p4_0uu" "ofxkhp4q1o"  "x_4nvcjw_3"  "alg9dwj_ze"  "mrprvuyh4"

clean_vec(factor(vec))
##  [1] "tzmf_ichx6"  "v_b_2cpul5"  "jpmwpp35k6"  "x5z3rqf50tb" "hapzkb5jhh" 
##  [6] "x3gz6p4_0uu" "ofxkhp4q1o"  "x_4nvcjw_3"  "alg9dwj_ze"  "mrprvuyh4"

clean_vec(factor(vec), TRUE)
##  [1] tzmf_ichx6  v_b_2cpul5  jpmwpp35k6  x5z3rqf50tb hapzkb5jhh 
##  [6] x3gz6p4_0uu ofxkhp4q1o  x_4nvcjw_3  alg9dwj_ze  mrprvuyh4  
## 10 Levels: alg9dwj_ze hapzkb5jhh jpmwpp35k6 mrprvuyh4 ... x5z3rqf50tb
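
For the column-wise case from the question, here is a rough sketch of one way to apply the same substitutions to every character/factor column of a data frame. It is not part of janitor. `clean_strings()` and `clean_df()` are hypothetical helper names, the data frame is made up, and the de-duplication step is left out on purpose so that repeated values stay equal. Factors are cleaned through their levels, so they remain factors.

```r
# Hypothetical helper: the same substitutions as clean_vec(), base R only,
# without the de-duplication step
clean_strings <- function(s) {
  s <- gsub("'", "", s, fixed = TRUE)
  s <- gsub("\"", "", s, fixed = TRUE)
  s <- gsub("%", "percent", s, fixed = TRUE)
  s <- gsub("^[ ]+", "", s)
  s <- make.names(s)
  s <- gsub("[.]+", "_", s)
  s <- gsub("[_]+", "_", s)
  s <- tolower(s)
  gsub("_$", "", s)
}

# Hypothetical helper: clean factor levels and character columns,
# leave every other column untouched
clean_df <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      levels(col) <- clean_strings(levels(col))
      col
    } else if (is.character(col)) {
      clean_strings(col)
    } else {
      col
    }
  })
  df
}

# Made-up example data
df <- data.frame(
  grp = factor(c("Group A", "Group B", "Group A")),
  txt = c("50% off", "it's fine", "50% off"),
  n   = 1:3,
  stringsAsFactors = FALSE
)

out <- clean_df(df)
levels(out$grp)
## [1] "group_a" "group_b"
out$txt
## [1] "x50percent_off" "its_fine"       "x50percent_off"
```

Assigning to `levels()` touches each distinct level once, and if two levels end up with the same cleaned name they are merged into one level.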
