[Solved] R: Splitting a string to different variables and assign 1 if string contains this word [duplicate]

Question

First off, for future posts please provide sample data in a reproducible and copy&paste-able format. Screenshots are not a good idea because we can’t easily extract data from an image. For more details, please review how to provide a minimal reproducible example/attempt.

That aside, here is a tidyverse solution

library(tidyverse)
df %>%
    separate_rows(Text, sep = " ") %>%
    mutate(n = 1) %>%
    pivot_wider(names_from = "Text", values_from = "n", values_fill = list(n = 0))
## A tibble: 5 x 6
#  ID      Peanut Butter Jelly Storm  Wind
#  <fct>    <dbl>  <dbl> <dbl> <dbl> <dbl>
#1 ID-0001      1      1     1     0     0
#2 ID-0002      1      0     0     0     0
#3 ID-0003      0      1     0     0     0
#4 ID-0004      0      0     0     1     0
#5 ID-0005      0      1     0     1     1

Explanation: We use separare_rows to split entries in Text on white spaces and reshape data into long format; we then add a count column; finally we reshape data from long to wide with pivot_wider, and fill missing values with 0.

Or in base R using xtabs

df2 <- transform(df, Text = strsplit(as.character(Text), " "))
xtabs(n ~ ., data.frame(
    ID = with(df2, rep(ID, vapply(Text, length, 1L))),
    Text = unlist(df2$Text),
    n = 1))
#ID        Butter Jelly Peanut Storm Wind
#  ID-0001      1     1      1     0    0
#  ID-0002      0     0      1     0    0
#  ID-0003      1     0      0     0    0
#  ID-0004      0     0      0     1    0
#  ID-0005      1     0      0     1    1

Sample data

df <- read.table(text =
"ID Text
ID-0001   'Peanut Butter Jelly'
ID-0002   Peanut
ID-0003   Butter
ID-0004   Storm
ID-0005   'Storm Wind Butter'", header = T)

Accepted Answer

First off, for future posts please provide sample data in a reproducible and copy&paste-able format. Screenshots are not a good idea because we can’t easily extract data from an image. For more details, please review how to provide a minimal reproducible example/attempt.

That aside, here is a tidyverse solution

library(tidyverse)
df %>%
    separate_rows(Text, sep = " ") %>%
    mutate(n = 1) %>%
    pivot_wider(names_from = "Text", values_from = "n", values_fill = list(n = 0))
## A tibble: 5 x 6
#  ID      Peanut Butter Jelly Storm  Wind
#  <fct>    <dbl>  <dbl> <dbl> <dbl> <dbl>
#1 ID-0001      1      1     1     0     0
#2 ID-0002      1      0     0     0     0
#3 ID-0003      0      1     0     0     0
#4 ID-0004      0      0     0     1     0
#5 ID-0005      0      1     0     1     1

Explanation: We use separare_rows to split entries in Text on white spaces and reshape data into long format; we then add a count column; finally we reshape data from long to wide with pivot_wider, and fill missing values with 0.

Or in base R using xtabs

df2 <- transform(df, Text = strsplit(as.character(Text), " "))
xtabs(n ~ ., data.frame(
    ID = with(df2, rep(ID, vapply(Text, length, 1L))),
    Text = unlist(df2$Text),
    n = 1))
#ID        Butter Jelly Peanut Storm Wind
#  ID-0001      1     1      1     0    0
#  ID-0002      0     0      1     0    0
#  ID-0003      1     0      0     0    0
#  ID-0004      0     0      0     1    0
#  ID-0005      1     0      0     1    1

Sample data

df <- read.table(text =
"ID Text
ID-0001   'Peanut Butter Jelly'
ID-0002   Peanut
ID-0003   Butter
ID-0004   Storm
ID-0005   'Storm Wind Butter'", header = T)