First off, for future posts please provide sample data in a reproducible and copy&paste-able format. Screenshots are not a good idea because we can’t easily extract data from an image. For more details, please review how to provide a minimal reproducible example/attempt.
That aside, here is a tidyverse solution:
library(tidyverse)
df %>%
separate_rows(Text, sep = " ") %>%
mutate(n = 1) %>%
pivot_wider(names_from = "Text", values_from = "n", values_fill = list(n = 0))
## A tibble: 5 x 6
# ID Peanut Butter Jelly Storm Wind
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 ID-0001 1 1 1 0 0
#2 ID-0002 1 0 0 0 0
#3 ID-0003 0 1 0 0 0
#4 ID-0004 0 0 0 1 0
#5 ID-0005 0 1 0 1 1
Explanation: We use separate_rows to split the entries in Text on whitespace, which also reshapes the data into long format (one row per ID and word); we then add a count column n; finally, we reshape the data from long to wide with pivot_wider and fill missing values with 0.
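To see what each step contributes, the pipeline can be run one stage at a time. This is only a sketch: the data frame is rebuilt inline so that the block is self-contained, and the intermediate names long and wide are illustrative, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Sample data rebuilt inline (same values as the sample data in this answer)
df <- data.frame(
  ID = c("ID-0001", "ID-0002", "ID-0003", "ID-0004", "ID-0005"),
  Text = c("Peanut Butter Jelly", "Peanut", "Butter", "Storm", "Storm Wind Butter"),
  stringsAsFactors = FALSE
)

# Step 1: split Text on spaces; each word gets its own row (long format)
long <- separate_rows(df, Text, sep = " ")

# Step 2: add an indicator column that marks "this ID contains this word"
long <- mutate(long, n = 1)

# Step 3: spread the words into columns; ID/word pairs that never occur get 0
wide <- pivot_wider(long, names_from = "Text", values_from = "n",
                    values_fill = list(n = 0))

print(long)
print(wide)
```

Note that this assumes each word occurs at most once per ID; with repeated words, pivot_wider would find several values per cell and not return a plain 0/1 column.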
Or in base R using xtabs:
df2 <- transform(df, Text = strsplit(as.character(Text), " "))
xtabs(n ~ ., data.frame(
ID = with(df2, rep(ID, vapply(Text, length, 1L))),
Text = unlist(df2$Text),
n = 1))
#ID Butter Jelly Peanut Storm Wind
# ID-0001 1 1 1 0 0
# ID-0002 0 0 1 0 0
# ID-0003 1 0 0 0 0
# ID-0004 0 0 0 1 0
# ID-0005 1 0 0 1 1
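The xtabs call returns a contingency table rather than a data frame, and its cells hold counts, so a word repeated within one entry would give a value above 1. If a plain 0/1 data frame is needed, one possible follow-up is sketched below; the names words, long, tab and wide are illustrative, and the data frame is rebuilt inline so the block runs on its own.

```r
# Sample data rebuilt inline (same values as the sample data in this answer)
df <- data.frame(
  ID = c("ID-0001", "ID-0002", "ID-0003", "ID-0004", "ID-0005"),
  Text = c("Peanut Butter Jelly", "Peanut", "Butter", "Storm", "Storm Wind Butter"),
  stringsAsFactors = FALSE
)

# Split each entry into words and stack them into long format
words <- strsplit(df$Text, " ")
long <- data.frame(
  ID = rep(df$ID, lengths(words)),  # repeat each ID once per word
  Text = unlist(words),
  n = 1
)

# Cross-tabulate: rows are IDs, columns are words (sorted alphabetically)
tab <- xtabs(n ~ ID + Text, data = long)

# Convert the 2-d table to a data frame, keeping the wide layout
wide <- as.data.frame.matrix(tab)

# Clamp counts to 0/1 in case a word is repeated within one entry
wide[] <- lapply(wide, function(x) as.integer(x > 0))

# Move the IDs from the row names into a regular column
wide <- cbind(ID = rownames(wide), wide)
rownames(wide) <- NULL

print(wide)
```

Calling as.data.frame.matrix directly matters here: plain as.data.frame on a table would return the long format with a Freq column instead.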
Sample data
df <- read.table(text =
"ID Text
ID-0001 'Peanut Butter Jelly'
ID-0002 Peanut
ID-0003 Butter
ID-0004 Storm
ID-0005 'Storm Wind Butter'", header = TRUE)