[Solved] High-performance merging of ordered sets


This is not going to be the fastest data structure & algorithm for your particular purpose I guess, but it may be fast enough. Test it yourself.

Note that a std::forward_list or even a std::vector might be faster depending on the actual scenario (-> constant factors in big-O-notation).

tmyklebu mentioned another approach in the comments: depending on the scenario, it might be faster to merge on-demand, e.g. storing all data sets individually and merging them into a vector to pass to the event handler, or even using a “merging” iterator (whose increment gets the next element of the individual data sets).

Further performance improvements may be achieved by using a custom memory pool -> custom allocator.

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

// inserts a sorted range into the `to` container
template < typename To, typename InputIt >
void insert_new_sorted(To& to,
                       InputIt beg_old, InputIt end_old,
                       InputIt beg_new, InputIt end_new)
{
    auto const& comp = to.value_comp();
    typename To::iterator i = to.begin();

    // might improve performance: don't remove elements which are in both
    // ranges (old and new)
    while(beg_old != end_old && beg_new != end_new)
    {
        if(comp(*beg_old, *beg_new))
        {
            // remove old element
            i = to.find(*beg_old);  // "slow", no hint :(
            i = to.erase(i);
            ++beg_old;
        }else if(comp(*beg_new, *beg_old))
        {
            // insert new element
            // using the hint to achieve better performance
            i = to.insert(i, *beg_new);
            ++beg_new;
        }else
        {
            // both equal, do nothing
            ++beg_new;
            ++beg_old;
        }
    }

    // remove remaining old elements
    for(; beg_old != end_old; ++beg_old)
    {
        to.erase(to.find(*beg_old));  // "slow", no hint :(
    }

    // insert remaining new elements
    for(; beg_new != end_new; ++beg_new)
    {
        i = to.insert(i, *beg_new);
    }

    std::copy(to.begin(), to.end(),
        std::ostream_iterator<typename To::value_type>(std::cout, ", "));
    std::cout << std::endl;
}

int main()
{
    using set_t = std::multiset<double>;

    set_t const A = {1, 3, 4, 6};
    set_t const B = {1, 2, 3};
    set_t const C = {2, 3, 5};
    set_t const A2 = {2, 4, 7, 8};

    set_t result;
    insert_new_sorted(result, A.end(), A.end(), A.begin(), A.end());
    insert_new_sorted(result, B.end(), B.end(), B.begin(), B.end());
    insert_new_sorted(result, C.end(), C.end(), C.begin(), C.end());
    insert_new_sorted(result, A.begin(), A.end(), A2.begin(), A2.end());
}

Output:

1, 3, 4, 6,
1, 1, 2, 3, 3, 4, 6,
1, 1, 2, 2, 3, 3, 3, 4, 5, 6,
1, 2, 2, 2, 3, 3, 4, 5, 7, 8,


A different approach: store the iterators of the inserted elements, to speed up erasing.

6

solved High-performance merging of ordered sets