[Solved] fastest way to compare string elements with each other


The amount of downvotes is crazy but oh well… I found the reason for my performance issue / bottleneck thanks to the comments.

The second for loop inside StartSimilarityCheck() iterates over all entries, which in itself is no problem but when viewed under performance issues and efficient, is bad. The solution is to only check strings which are in the neighborhood but how do we know if they are?

First, we get a list which is ordered by ascension. That gives us a rough overview of similar strings. Now we define a threshold of Levenshtein score (smaller score is higher similarity between two strings). If the score is higher than the threshold it means they are not too similar, thus we can break out of the inner loop. That saves time and the program can finish really fast. Notice that that way is not bullet proof, IMHO because if the first string is 0Directory it will be at the beginning part of the list but a string like zDirectory will be further down and could be missed. Correct me if I am wrong..

    private static void StartSimilarityCheck(List<string> whiteList)
    {
        var watch = Stopwatch.StartNew();
        for (int j = 0; j < whiteList.Count - 1; j++)
        {
            string dirName = whiteList[j];
            bool insertDirName = true;
            int threshold = 2;
            if (!IsBlackList(dirName))
            {
                // start the next element
                for (int i = j + 1; i < whiteList.Count; i++)
                {
                    // end of index reached
                    if (i == whiteList.Count)
                    {
                        break;
                    }

                    int similiarity = LevenshteinDistance.Compute(dirName, whiteList[i]);
                    if (similiarity >= threshold)
                    {
                        break;
                    }
                    // low score means high similarity
                    if (similiarity <= threshold)
                    {
                        if (insertDirName)
                        {
                            AddSimilarEntry(dirName);
                            AddSimilarEntry(whiteList[i]);
                            AddBlackListEntry(whiteList[i]);
                            insertDirName = false;
                        }
                        else
                        {
                            AddBlackListEntry(whiteList[i]);
                        }
                    }
                }
            }
            Console.WriteLine(j);
        }
        watch.Stop();
        Console.WriteLine("Ms: " + watch.ElapsedMilliseconds);
        Console.WriteLine("Similar entries: " + similar.Count);
    }

solved fastest way to compare string elements with each other