It splits a word into two parts: stem and end. There are three cases:
- The word ends with `ss` (or even more `s`): `stem <- word` and `end <- ""`
- The word ends with a single `s`: `stem <- word without "s"` and `end <- "s"`
- The word does not end with `s`: `stem <- word` and `end <- ""`

This is done by a regular expression that matches the full word (because of the anchors `^...$`). The first part (the stem) is either as much as possible ending in `ss` (`.*ss`, which is greedy) or, if that is not possible, as little as possible (`.*?`, which is lazy). After that, an optional trailing `s` is taken as the end part.
Note that in the first case (as much as possible ending in `ss`) there can never be an additional `s` for the end part: the greedy `.*ss` already consumes every trailing `s`, so nothing is left over.
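
To make the three cases concrete, here is a minimal sketch in Python. The pattern `^(.*ss|.*?)(s)?$` is an assumption reconstructed from the description above (the exact expression is not quoted in this answer), and `split_word` is a hypothetical helper name used only for illustration.

```python
import re

# Assumed pattern, reconstructed from the explanation:
#   group 1 (stem): greedy ".*ss" if possible, otherwise lazy ".*?"
#   group 2 (end):  an optional trailing "s"
PATTERN = re.compile(r"^(.*ss|.*?)(s)?$")


def split_word(word):
    """Return (stem, end) for a single word (hypothetical helper)."""
    match = PATTERN.match(word)
    if match is None:
        # Only happens for input containing a newline, since "." does not match it.
        raise ValueError(f"not a single word: {word!r}")
    # An unmatched optional group is None; normalise it to "".
    return match.group(1), match.group(2) or ""


for word in ["glass", "classs", "cats", "cat"]:
    print(word, split_word(word))
```

With this pattern, `glass` and `classs` keep the whole word as the stem, `cats` is split into `cat` and `s`, and `cat` is returned unchanged with an empty end part.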