A simple solution is by trying all possible substrings from left to right (i.e. starting from indices i=0
to n-k
), and comparing each to the next substrings (i.e. starting from indices j=i+1
to n-k
).
For every i
-substring, you count the number of occurrences, and keep a trace of the most frequentso far.
As a string comparison costs at worst k character comparisons, and you will be performing (n-k-1)(n-k)/2
such comparisons and the total cost is of order O(k(n-k)²)
. [In fact the cost can be lower because some of the string comparisons may terminate early, but I an not able to perform the evaluation.]
This solution is simple but probably not efficient.
In theory you can reduce the cost using a more efficient string matching algorithm, such as Knuth-Morris-Pratt, resulting in O((n-k)(n+k))
operations.
1
solved Most frequent substring of fixed length – simple solution needed [closed]