[Solved] Comparing two versions of the same string

Accepted Answer

Here’s a tidyverse approach:

library(dplyr)
library(tidyr)

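# put data in a tibble (`data` is the list of strings from the question);
# tibble() replaces the deprecated data_frame()
tibble(string = unlist(data)) %>% 
    # add ID column so we can recombine later
    # (rownames_to_column() replaces the deprecated add_rownames())
    tibble::rownames_to_column('id') %>% 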
    # add a lagged column to compare against
    mutate(string2 = lag(string)) %>% 
    # break strings into words
    separate_rows(string) %>% 
    # evaluate the following calls rowwise (until regrouped)
    rowwise() %>% 
    # chop to rows with a string to compare against,
    filter(!is.na(string2), 
           # where the word, used as a case-insensitive regex pattern,
           # does not match anywhere in the comparison string
           !grepl(string, string2, ignore.case = TRUE)) %>% 
    # regroup by ID
    group_by(id) %>%
    # reassemble strings
    summarise(string = paste(string, collapse=" "))

## # A tibble: 2 x 2
##      id                  string
##   <chr>                   <chr>
## 1     2                    Very
## 2     3 and only one sentences.
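
One caveat: grepl() treats each word as a regular expression and matches it anywhere in the previous string, so a short word such as "a" counts as present if the earlier version merely contains that letter, and the "." in "sentences." matches any character. If exact whole-word comparison is what you want, here is a minimal base R sketch. The `versions` vector is made-up sample data (the original `data` isn't shown here), and `new_words()` is a hypothetical helper name.

```r
# Made-up sample data standing in for the original `data`
versions <- c("This is a sentence.", "This is a very long sentence.")

# Split each version into words on single spaces
words <- strsplit(versions, " ", fixed = TRUE)

# Hypothetical helper: keep words of `current` that do not appear,
# ignoring case, as whole words in `previous`
new_words <- function(current, previous) {
  current[!tolower(current) %in% tolower(previous)]
}

# Compare each version with the one before it
added <- mapply(new_words, words[-1], words[-length(words)], SIMPLIFY = FALSE)

# Reassemble the new words into one string per comparison
added_text <- vapply(added, paste, character(1), collapse = " ")
added_text
```

For this sample, `added_text` is "very long". Punctuation stays attached to the words here, so "sentence" and "sentence." would count as different words; strip punctuation first if that matters for your data.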

If you’d like just a character vector, extract the string column by appending

 ...
    %>% `[[`('string')

## [1] "Very"                    "and only one sentences."
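
In current dplyr, pull() does the same extraction and reads a bit more naturally at the end of a pipeline. A small sketch, assuming the summarised result is stored in a tibble; the name `result` is only for illustration, and its values are copied from the printed output above:

```r
library(dplyr)

# Stand-in for the summarised tibble printed above
result <- tibble(
  id     = c("2", "3"),
  string = c("Very", "and only one sentences.")
)

# pull() returns the column as a plain vector, like `[[`('string')
string_vec <- result %>% pull(string)
string_vec
```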