[Solved] With an array of strings of equal length, get the nth character of each string in its own array using LINQ


A quick test, taking @jdweng’s purely LINQ solution as the 1X baseline, shows that a solution that somewhat abuses LINQ to do indexing runs 7.4X as fast, and a solution using nested for loops runs 28.4X as fast.

(Ab)using LINQ with indexing:

// Join() here is the IEnumerable<char> extension method defined further down in this answer
var c = Enumerable.Range(0, source[0].Length)
                  .Select(chpos => Enumerable.Range(0, source.Length)
                                             .Select(si => source[si][chpos])
                                             .Join())
                  .ToArray();

Using nested for loops:

var sourcelen = source.Length;
var strlen = source[0].Length;
var c = new String[strlen];
var sb = new StringBuilder(sourcelen); // each output string holds one character per source string
for (int chpos = 0; chpos < strlen; ++chpos) {
    for (int si = 0; si < sourcelen; ++si)
        sb.Append(source[si][chpos]);
    c[chpos] = sb.ToString();
    sb.Length = 0;
}
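
To make the expected result concrete, here is a small self-contained sketch that runs the two approaches above on a three-string sample and prints what each produces. The class name TransposeDemo and the method names WithLinqIndexing and WithNestedFor are made up for this illustration; the bodies follow the snippets above, and the Join helper is the same idea as the extension method shown later in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class TransposeDemo
{
    // Concatenates a sequence of chars into a string (same idea as the answer's Join helper).
    public static string Join(this IEnumerable<char> src)
    {
        var sb = new StringBuilder();
        foreach (var c in src)
            sb.Append(c);
        return sb.ToString();
    }

    // LINQ used purely as an index generator: outer range = character position,
    // inner range = index of the source string.
    public static string[] WithLinqIndexing(string[] source) =>
        Enumerable.Range(0, source[0].Length)
                  .Select(chpos => Enumerable.Range(0, source.Length)
                                             .Select(si => source[si][chpos])
                                             .Join())
                  .ToArray();

    // Nested for loops reusing a single StringBuilder.
    public static string[] WithNestedFor(string[] source)
    {
        var sourcelen = source.Length;
        var strlen = source[0].Length;
        var c = new string[strlen];
        var sb = new StringBuilder(sourcelen);
        for (int chpos = 0; chpos < strlen; ++chpos)
        {
            for (int si = 0; si < sourcelen; ++si)
                sb.Append(source[si][chpos]);
            c[chpos] = sb.ToString();
            sb.Length = 0; // reset the builder for the next character position
        }
        return c;
    }

    public static void Main()
    {
        var source = new[] { "abc", "def", "ghi" };
        Console.WriteLine(string.Join(",", WithLinqIndexing(source))); // adg,beh,cfi
        Console.WriteLine(string.Join(",", WithNestedFor(source)));    // adg,beh,cfi
    }
}
```

Both methods assume, as the question does, that source is non-empty and that all strings have the same length.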

Running some further tests: Append is a little faster than indexing into the StringBuilder until there are a lot of long strings, and building all the strings side by side (but sequentially, without threads) isn’t faster. As the strings increase in length, a Parallel.For version catches up to the fastest nested for version and then surpasses it, provided you have a large number (10,000) of strings. Interestingly, Append is much slower than indexing inside Parallel.For.

Here is the parallel version that builds each string by indexing:

    var sourcelen = source.Length;
    var strlen = source[0].Length;
    var c = new String[strlen];
    Parallel.For(0, strlen, chpos => {
        var sb = new StringBuilder(sourcelen);
        sb.Length = sourcelen;
        for (int si = 0; si < sourcelen; ++si)
            sb[si] = source[si][chpos];
        c[chpos] = sb.ToString();
    });
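
The Append-based parallel variant mentioned above is not shown in this answer. A plausible reconstruction, assuming it differs from the indexed version only in how the StringBuilder is filled, might look like the sketch below. The class name ParallelAppendSketch and the method name Transpose are hypothetical, and this is not necessarily the exact code that was timed.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

public static class ParallelAppendSketch
{
    // Assumed variant: one StringBuilder per character position, filled with Append
    // instead of presizing the builder and assigning by index.
    public static string[] Transpose(string[] source)
    {
        var sourcelen = source.Length;
        var strlen = source[0].Length;
        var c = new string[strlen];
        Parallel.For(0, strlen, chpos =>
        {
            var sb = new StringBuilder(sourcelen);
            for (int si = 0; si < sourcelen; ++si)
                sb.Append(source[si][chpos]);
            c[chpos] = sb.ToString(); // each iteration writes a distinct slot, so no locking is needed
        });
        return c;
    }

    public static void Main()
    {
        var result = Transpose(new[] { "abc", "def", "ghi" });
        Console.WriteLine(string.Join(",", result)); // adg,beh,cfi
    }
}
```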

Using this Join extension method in place of String.Join(String.Empty, ...) or String.Join("", ...) greatly speeds up the solutions that rely on those calls:

public static string Join(this IEnumerable<char> src) {
    var sb = new StringBuilder();
    foreach (var c in src)
        sb.Append(c);
    return sb.ToString();
}

Some notes on timings: Using LINQPad, I wrote some code to generate a sample source:

var lens = 10000;
var source = new string[10000];
//
{
    var s = Enumerable.Range(0,26).Select(letternum => new String(Convert.ToChar('a'+letternum), lens)).ToArray();
    for (int j1 = 0; j1 < source.Length; ++j1)
        source[j1] = s[j1 % 26].ToString();
}

Then, using LINQPad’s Util.ElapsedTime, I measured the time to process source with various implementations:

TimeSpan basetime;
//
{
    var start = Util.ElapsedTime;
    var c = source.Select(x => x.Select((y, i) => new { chr = y, index = i }))
                  .SelectMany(x => x)
                  .GroupBy(x => x.index)
                  .Select(x => x.Select(y => y.chr).Join())
                  .ToArray();
    basetime = Util.ElapsedTime - start;
    basetime.Dump("Elapsed LINQ My Join 1X");
    //c.Dump();
}
//
{
    var start = Util.ElapsedTime;
    var sourcelen = source.Length;
    var strlen = source[0].Length;
    var c = new String[strlen];
    Parallel.For(0, strlen, chpos => {
        var sb = new StringBuilder(sourcelen);
        sb.Length = sourcelen;
        for (int si = 0; si < sourcelen; ++si)
            sb[si] = source[si][chpos];
        c[chpos] = sb.ToString();
    });
    var myt = Util.ElapsedTime - start;
    myt.Dump($"Elapsed build each indexed parallel {basetime.TotalSeconds / myt.TotalSeconds:0.0}X");
    c.Dump();
}
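
Util.ElapsedTime and Dump are LINQPad-specific. Outside LINQPad, a similar measurement could be sketched with System.Diagnostics.Stopwatch, as below. The class name StopwatchTiming and the helpers Time and NestedFor are made up for this illustration, and the sample is smaller than the 10,000 x 10,000 one above so that it runs quickly. A single run also includes JIT compilation, so treat the number as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

public static class StopwatchTiming
{
    // Hypothetical helper: runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Nested for loop version from this answer, wrapped in a method.
    public static string[] NestedFor(string[] source)
    {
        var sourcelen = source.Length;
        var strlen = source[0].Length;
        var c = new string[strlen];
        var sb = new StringBuilder(sourcelen);
        for (int chpos = 0; chpos < strlen; ++chpos)
        {
            for (int si = 0; si < sourcelen; ++si)
                sb.Append(source[si][chpos]);
            c[chpos] = sb.ToString();
            sb.Length = 0;
        }
        return c;
    }

    public static void Main()
    {
        // 1,000 strings of 1,000 identical letters each, cycling through a..z.
        var lens = 1000;
        var source = Enumerable.Range(0, 1000)
                               .Select(j => new string((char)('a' + j % 26), lens))
                               .ToArray();

        string[] result = null;
        var elapsed = Time(() => { result = NestedFor(source); });
        Console.WriteLine($"Elapsed nested for: {elapsed.TotalMilliseconds:0.0} ms ({result.Length} strings built)");
    }
}
```

To reproduce the relative "X" figures, time the baseline the same way and divide its TotalSeconds by the TotalSeconds of the version being compared, as the LINQPad code above does.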
