[Solved] Removing different words from a document using R console

A simple way would be to use gsub(paste0('\\b', YOURVECTOROFWORDSTOREMOVE, '\\b', collapse="|"), " ", YOURSTRING), which replaces every occurrence of a word in the vector that is delimited by word boundaries (\\b) with a single space. If you have many files like this, you might want to look at the tm package and work with a corpus object. …
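
For comparison, the same word-boundary idea can be sketched outside R. The Python snippet below is only an illustration, not the answer's code: the function name remove_words and the sample strings are made up for this example. It builds one alternation pattern wrapped in \b anchors and substitutes a single space for each match, so a word embedded in a longer word (such as "the" in "theory") is left alone.

```python
import re


def remove_words(text, words):
    """Replace every whole-word occurrence of the given words with a space.

    `remove_words` is a hypothetical helper name used for illustration.
    """
    # Escape each word so regex metacharacters are matched literally,
    # then join them into a single alternation between word boundaries.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, " ", text)


cleaned = remove_words("the cat sat on the mat", ["the", "on"])
# Collapse the runs of spaces left behind by the substitution.
print(" ".join(cleaned.split()))
```

Because each removed word becomes a space, runs of spaces remain in the result; collapsing them afterwards, as in the last line, is optional.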

[Solved] Try to answer some boolean queries using Term-Document-Incidence-Matrix [closed]

Your problem is in this line of the ProcessFiles function: String[] termsCollection = RemoveStopsWords(file.ToUpper().Split(' ')); You are splitting the name of the file, not its content, which is why you get no search results. You should do something like this instead: String[] termsCollection = RemoveStopsWords(File.ReadAllText(file).ToUpper().Split(' ')); Now change your TermDocMatrix constructor: public TermDocMatrix(string IndexPath, string FileName) { if (!Directory.Exists(IndexPath)) …
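
The point of the fix, tokenizing what the file contains rather than its path, can be illustrated with a small sketch. This is not the asker's C# code: it is a Python approximation under stated assumptions, the names load_terms, build_incidence and query_and are hypothetical, and stop-word removal is left out for brevity.

```python
from pathlib import Path


def load_terms(path):
    # Read the file's content first; splitting `path` itself would only
    # tokenize the file name, which is the bug described above.
    text = Path(path).read_text(encoding="utf-8")
    return set(text.upper().split())


def build_incidence(paths):
    # Map each term to the set of documents that contain it.
    incidence = {}
    for path in paths:
        for term in load_terms(path):
            incidence.setdefault(term, set()).add(str(path))
    return incidence


def query_and(incidence, *terms):
    # Boolean AND: documents that contain every requested term.
    sets = [incidence.get(term.upper(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()
```

A dictionary of sets stands in for the incidence matrix here; a term that appears in no document simply maps to an empty set, so an AND query involving it returns no results.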